RE: Building genetic patent databases reference

From: Robert Austin <robert.austin_at_fiz-k.com>
Date: Wed, 05 Oct 2005 12:40:43 -0400

Alex and PIUG colleagues,

FIZ Karlsruhe continues to capture WIPO sequences for the PCTGEN database on
STN, entirely regardless of the original WIPO file format, within 24 hours of
publication. We have been doing that since PCTGEN was released on STN in
2003.

PCTGEN database summary sheet:
http://info.cas.org/ONLINE/DBSS/pctgenss.html

PCTGEN Workshop Manual:
http://www.stn-international.com/training_center/bioseq/pctgen_wm.pdf

Regards

Rob

Robert Austin
FIZ Karlsruhe
Tel: 609 333 1466
www.fiz-k.com


-----Original Message-----
From: owner-piug-l_at_derwent.co.uk [mailto:owner-piug-l_at_derwent.co.uk] On Behalf
Of Richard Rouse
Sent: Wednesday, October 05, 2005 11:46 AM
To: Alex Thurgood
Cc: Piug-discussion_at_piug.org; PIUG-L_at_derwent.co.uk
Subject: Re: Building genetic patent databases reference


Alex and PIUG colleagues,

The .doc and .pdf files come from the WIPO server. These are examples of
the many problems with this system.

Basically PatentInformatics can build a database that contains patent
full text from the patent publications that have been accounted for in
http://patentinformatics.fdns.net:81/stats/patent.php

Additionally, we can get the extra sequences from the publications that the
MIT group has identified (Kyle Jenson et al.). Check out the upcoming Oct 14
issue of Science.

We can separate the sequences from the text. In this case the sequences
will be extracted from the claims as well, so you will be able to
separate the sequences from the claims.
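
As a rough illustration of what separating sequences from text can look like,
here is a minimal Python sketch. It is not the PatentInformatics parser. It
assumes the listing uses the WIPO ST.25 numeric identifiers (<210> sequence
number, <212> molecule type, <400> sequence data), that it holds nucleotide
sequences only, and that it sits at the end of the document. The function
name split_st25 is hypothetical.

```python
import re


def split_st25(text):
    """Split a document into prose and ST.25-style sequence entries.

    Assumptions (illustrative only): the listing starts at the first
    <210> tag, runs to the end of the text, and contains nucleotide
    sequences written as letters with position counts at line ends.
    """
    first = re.search(r"<210>", text)
    if first is None:
        return text.strip(), []

    prose = text[:first.start()].strip()
    listing = text[first.start():]

    sequences = []
    # Each entry begins at a <210> tag; split just before every tag.
    for entry in re.split(r"(?=<210>)", listing):
        if not entry.strip():
            continue
        seq_id = re.search(r"<210>\s*(\d+)", entry)
        mol_type = re.search(r"<212>\s*(\w+)", entry)
        body = re.search(r"<400>\s*\d+\s*(.*)", entry, re.DOTALL)
        if seq_id is None or body is None:
            continue  # incomplete entry, skip rather than guess
        # Drop spaces, line breaks and position counts; keep letters.
        residues = re.sub(r"[^A-Za-z]", "", body.group(1))
        sequences.append({
            "id": int(seq_id.group(1)),
            "type": mol_type.group(1) if mol_type else None,
            "residues": residues,
        })
    return prose, sequences


sample = (
    "Claim 1. An isolated nucleic acid of SEQ ID NO: 1.\n"
    "<210> 1\n<211> 12\n<212> DNA\n<213> Homo sapiens\n"
    "<400> 1\natggcgtacg tt                12\n"
    "<210> 2\n<211> 6\n<212> DNA\n<213> Homo sapiens\n"
    "<400> 2\nggatcc        6\n"
)
prose, sequences = split_st25(sample)
```

With the sample above, prose holds the claim text and sequences holds one
record per <210> entry. Files that arrive as .doc or .pdf would first need
conversion to plain text, which this sketch does not attempt.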

Services like parsing data and DB schemas are cool business models, but
hoarding data is just being greedy.

This is one reason why it takes about 27 months to get a patent issued
at the USPTO: many patent applicants don't even bother to pay for a
search, given the costs of accessing proprietary data.

Richard

Alex Thurgood wrote:

>On Tuesday, 04 October 2005 at 09:09 -0700, Richard Rouse wrote:
>
>Hi Richard,
>
>
>
>
>>Basically if you read this, you can see that WIPO does a pretty messy job.
>>
>>
>>
>
>Which is superbly ironic given that patent sequence listings are
>supposed to conform to WIPO standard ST25, and that the submissions are
>supposed to be made as ASCII text files. Where did the PDFs or the DOCs
>come from ? I lay my money on the electronic filing process ;-)
>
>Anyway, you're right, it is a right mess. I downloaded all of the files
>from the WIPO server once, in the naïve hope that the zips would just be
>zipped ASCII WIPO ST25 sequence listings, but soon gave up when I saw
>what they had included.
>
>
>
>
>>Here is the info:
>>
>>http://patentinformatics.fdns.net:81/stats/
>>
>>
>>
>
>Interesting indeed, and I imagine a great deal of work went into
>extracting the data in the downloads into your database.
>
>
>
>
>>PatGen DB is a free service. The DB schema is cool but it is not as
>>complete as what is being displayed here. Of course, as a service we are
>>prepared to get all of these sequences and keep them updated.
>>
>>
>
>I haven't looked in depth, but it would certainly be something I would
>mention to my bioscience clients.
>
>Regards,
>
>Alex Thurgood
>Cabinet Michel Richebourg
>
>
>
>
>


Received on Wed Oct 05 2005 - 19:00:28
