Hello all again,
Sorry to keep bombarding you all on this issue, but I need to make a
clarification:
From the PCT collection of PatGenDB, we get:
892,018 amino acid sequences
3,452,396 nucleic acid sequences.
So it's a bit more than what I said in my last email. I just haven't
bothered to get the OPS and full-text data; I'm too busy right now
responding to a tedious patent office action.
This data is an accumulation of the following sources:
WIPO
DDBJ
NCBI
EBI
Now it's straight. I hope this clears up some of the confusion.
Richard
Richard Rouse wrote:
> Hello Robert,
>
> Thanks for responding.
>
> PCTGEN is a great service, no doubt. I am not knocking it.
>
> But what actually is PCTGEN? Where do you get the data from?
> I have been trying to figure this out for 3 years now. Isn't
> FIZ Karlsruhe a non-profit? So it's OK to say, right? Also, why not
> consolidate it with DGENE?
>
> Also, from what I gathered at the NE PIUG meeting:
>
> PCTGEN has 0.419 million amino acid sequences and 2.53 million nucleic acid sequences.
>
> PatGen DB has 0.853620 million amino acid sequences and 4.673988 million nucleic acid sequences!
>
> Of course, if anyone wants to pay us, we can get those crazy publications like
> WO05-068657.zip
> <ftp://ftp.wipo.int/pub/published_pct_sequences/28.07.2005/WO05-068657.zip>.
> I just don't want to use up 7 GB on this data, but I would be happy to
> get it for you. Then you would have far more data than PatGenDB.
>
> The WIPO site we get it from is
> http://www.wipo.int/pct/en/sequences/listing.htm. It is a pain to download,
> but we have now automated the whole process. This is actually a
> product that we sell through our PatLAMP system.
>
> By the way, since we sell the service of accessing the data, the customer
> gets code to build their own databases. It's secure, too!
> Remember the story about the Micropatent case:
> http://patentinformatics.fdns.net/PDF/digital_thugs.pdf.
>
> Regards,
> Richard
>
> Robert Austin wrote:
>
>> Alex and PIUG colleagues,
>>
>> FIZ Karlsruhe continues to capture WIPO sequences for the PCTGEN
>> database on STN, entirely regardless of the original WIPO file format,
>> within 24 hours of publication. We have been doing this since PCTGEN
>> was released on STN in 2003.
>>
>> PCTGEN database summary sheet:
>> http://info.cas.org/ONLINE/DBSS/pctgenss.html
>>
>> PCTGEN Workshop Manual:
>> http://www.stn-international.com/training_center/bioseq/pctgen_wm.pdf
>>
>> Regards
>>
>> Rob
>>
>> Robert Austin
>> FIZ Karlsruhe
>> Tel: 609 333 1466
>> www.fiz-k.com
>>
>>
>> -----Original Message-----
>> From: owner-piug-l_at_derwent.co.uk [mailto:owner-piug-l_at_derwent.co.uk]
>> On Behalf Of Richard Rouse
>> Sent: Wednesday, October 05, 2005 11:46 AM
>> To: Alex Thurgood
>> Cc: Piug-discussion_at_piug.org; PIUG-L_at_derwent.co.uk
>> Subject: Re: Building genetic patent databases reference
>>
>>
>> Alex and PIUG colleagues,
>>
>> The .doc and .pdf files come from the WIPO server. These are examples
>> of the many problems with this system.
>>
>> Basically, PatentInformatics can build a database containing the full
>> text of the patent publications accounted for in
>> http://patentinformatics.fdns.net:81/stats/patent.php
>>
>> Additionally, we can get the extra sequences from the publications that
>> the MIT group (Kyle Jensen et al.) has identified. Check out the
>> upcoming Oct 14 issue of Science.
>>
>> We can separate the sequences from the text. In this case, the
>> sequences will be extracted from the claims as well, so you will be
>> able to separate the sequences from the claims.
>>
>> Services like parsing data and DB schemas are cool business models,
>> but hoarding data is just being greedy.
>>
>> This is one reason it takes about 27 months to get a patent
>> issued at the USPTO: many patent applicants don't even bother to
>> pay for a search, given the cost of accessing proprietary data.
>>
>> Richard
>>
>> Alex Thurgood wrote:
>>
>>> On Tuesday, 04 October 2005 at 09:09 -0700, Richard Rouse wrote:
>>>
>>> Hi Richard,
>>>
>>>> Basically if you read this, you can see that WIPO does a pretty
>>>> messy job.
>>>>
>>>
>>> That is superbly ironic, given that patent sequence listings are
>>> supposed to conform to WIPO Standard ST.25, and that submissions are
>>> supposed to be made as ASCII text files. Where did the PDFs or the DOCs
>>> come from? I lay my money on the electronic filing process ;-)
>>>
>>> Anyway, you're right, it is a right mess. I downloaded all of the files
>>> from the WIPO server once, in the naïve hope that the zips would just be
>>> zipped ASCII WIPO ST.25 sequence listings, but soon gave up when I saw
>>> what they had included.
>>>
>>>> Here is the info:
>>>>
>>>> http://patentinformatics.fdns.net:81/stats/
>>>>
>>>
>>> Interesting indeed, and I imagine a great deal of work went into
>>> extracting the data in the downloads into your database.
>>>
>>>> PatGen DB is a free service. The DB schema is cool, but it is not as
>>>> complete as what is being displayed here. Of course, as a service,
>>>> we are prepared to get all of these sequences and keep them updated.
>>>>
>>>
>>> I haven't looked in depth, but it would certainly be something I would
>>> mention to my bioscience clients.
>>>
>>> Regards,
>>>
>>> Alex Thurgood
>>> Cabinet Michel Richebourg
Received on Thu Oct 06 2005 - 06:15:31
This archive was generated by hypermail 2.2.0 : Sat Nov 21 2009 - 07:01:07