Re: Building genetic patent databases reference

From: Richard Rouse <rjdrouse_at_patentinformatics.com>
Date: Wed, 05 Oct 2005 21:13:49 -0700

Hi all,

Sorry again, just to clarify.

The amino acid count is 892018 and the nucleic acid count is 3452396.
These are the current correct numbers.

The initial count (853620 aa and 4411594 na) came from a bad SQL query.

So it's a bit less on the nucleic acid side, but more on the amino acids.
Anyway, we are open to an independent audit if anyone is interested.

Richard

Richard Rouse wrote:

> Hello all again,
>
> Sorry to keep bombarding you all on this issue. But I need to make a
> clarification:
>
> From the PCT collection of PatGenDB, we get:
>
> 892018 amino acids
> 3452396 nucleic acids.
>
> So it's a bit more than what I said in the last email. I just haven't
> bothered to get the OPS and full text data. Too busy right now
> responding to a tedious patent office action.
>
> This data is an accumulation of the following sources:
>
> WIPO
> DDBJ
> NCBI
> EBI
>
>
> Now it's straight. I hope this clears up some of the confusion.
>
> Richard
>
>
> Richard Rouse wrote:
>
>> Hello Robert,
>>
>> Thanks for responding.
>>
>> PCTGEN is a great service no doubt. I am not knocking it.
>>
>> But what actually is PCTGEN? Where do you actually get the data
>> from? I have been trying to figure this out for 3 years now. Isn't
>> FIZ Karlsruhe a non-profit? So it's OK to say, right? Also, why not
>> consolidate it with DGENE?
>>
>> Also what I gather from the NE PIUG meeting is that
>>
>> PCTGEN has 0.419 million amino acids and 2.53 million nucleic acids.
>>
>> PatGen DB has 0.853620 million amino acids and 4.673988 million nucleic acids!
>>
>> Of course if anyone wants to pay us, we can get those crazy pubs like
>> WO05-068657.zip
>> <ftp://ftp.wipo.int/pub/published_pct_sequences/28.07.2005/WO05-068657.zip>.
>> I just don't want to use up 7 gigs for this data. Would be happy to
>> get it for you though. Then you would have way more data than PatGenDB.
>>
>> The WIPO site where we get it from is at:
>> http://www.wipo.int/pct/en/sequences/listing.htm. It is a pain to
>> get, but now we have automated the whole process. This actually is a
>> product that we sell through our PatLAMP system.
>>
>> By the way, since we sell the service of accessing data, the customer
>> gets code to use to build their own databases. It's secure too!
>> Remember the story about the Micropatent case -
>> http://patentinformatics.fdns.net/PDF/digital_thugs.pdf.
>>
>> Regards,
>> Richard
>>
>> Robert Austin wrote:
>>
>>> Alex and PIUG colleagues,
>>>
>>> FIZ Karlsruhe continues to capture WIPO sequences for the PCTGEN
>>> database on STN, entirely regardless of the original WIPO file
>>> format, within 24 hours of publication. We have been doing that since
>>> PCTGEN was released on STN in 2003.
>>>
>>> PCTGEN database summary sheet:
>>> http://info.cas.org/ONLINE/DBSS/pctgenss.html
>>>
>>> PCTGEN Workshop Manual:
>>> http://www.stn-international.com/training_center/bioseq/pctgen_wm.pdf
>>>
>>> Regards
>>>
>>> Rob
>>>
>>> Robert Austin
>>> FIZ Karlsruhe
>>> Tel: 609 333 1466
>>> www.fiz-k.com
>>>
>>>
>>> -----Original Message-----
>>> From: owner-piug-l_at_derwent.co.uk [mailto:owner-piug-l_at_derwent.co.uk] On Behalf Of Richard Rouse
>>> Sent: Wednesday, October 05, 2005 11:46 AM
>>> To: Alex Thurgood
>>> Cc: Piug-discussion_at_piug.org; PIUG-L_at_derwent.co.uk
>>> Subject: Re: Building genetic patent databases reference
>>>
>>>
>>> Alex and PIUG colleagues,
>>>
>>> The .doc and .pdf files come from the WIPO server. These are
>>> examples of the many problems with this system.
>>>
>>> Basically PatentInformatics can build a database that contains
>>> patent full text from the patent publications that have been
>>> accounted for in http://patentinformatics.fdns.net:81/stats/patent.php
>>>
>>> Additionally, we can get the extra sequences from the publications
>>> that the MIT group has identified (Kyle Jensen, et al). Check out the
>>> Oct 14 issue of Science that will be coming out.
>>>
>>> We can separate the sequences from the text. In this case the
>>> sequences will be extracted from the claims as well, so you will be
>>> able to separate the sequences from the claims.
>>>
>>> Services like parsing data and DB schemas are cool business models,
>>> but hoarding data is just being greedy.
>>>
>>> This is one reason why it takes about 27 months to get a patent
>>> issued at the USPTO, since many patent applicants don't even bother
>>> to pay for a search given the costs of accessing proprietary data.
>>>
>>> Richard
>>>
>>> Alex Thurgood wrote:
>>>
>>>> On Tuesday, 04 October 2005 at 09:09 -0700, Richard Rouse wrote:
>>>>
>>>> Hi Richard,
>>>>
>>>>> Basically if you read this, you can see that WIPO does a pretty
>>>>> messy job.
>>>>
>>>> Which is superbly ironic given that patent sequence listings are
>>>> supposed to conform to WIPO standard ST25, and that the submissions
>>>> are supposed to be made as ASCII text files. Where did the PDFs or
>>>> the DOCs come from? I lay my money on the electronic filing process ;-)
>>>>
>>>> Anyway, you're right, it is a right mess. I downloaded all of the
>>>> files from the WIPO server once, in the naïve hope that the zips
>>>> would just be zipped ASCII WIPO ST25 sequence listings, but soon gave
>>>> up when I saw what they had included.
>>>>
>>>>> Here is the info:
>>>>>
>>>>> http://patentinformatics.fdns.net:81/stats/
>>>>
>>>> Interesting indeed, and I imagine a great deal of work went into
>>>> extracting the data in the downloads into your database.
>>>>
>>>>
>>>>> PatGen DB is a free service. The DB schema is cool but it is not
>>>>> as complete as what is being displayed here. Of course, as a
>>>>> service we are prepared to get all of these sequences and keep it
>>>>> updated.
>>>>
>>>> I haven't looked in depth, but it would certainly be something I would
>>>> mention to my bioscience clients.
>>>>
>>>> Regards,
>>>>
>>>> Alex Thurgood
>>>> Cabinet Michel Richebourg


Received on Thu Oct 06 2005 - 06:30:31