A puzzle in Boolean

From: Stephen Adams (stevea@magister.co.uk)
Date: Wed Mar 01 2000 - 06:30:58 EST


Dear all:
For those of you with a wet Friday afternoon to while away (!), I'd like
to propose a puzzle for your edification.

I have encountered situations in the past where I need a novelty search
on a particular substructure, which is likely to result in large numbers
of hit compounds. Under these circumstances, my customers have
sometimes requested that I provide ONLY the patent references for any
compounds which have been cited in both patents and non-patents. The
assumption is that the patent is likely to be the earliest disclosure
and there is no point in wading through large numbers of non-patent
references if you already have the patent(s).
The problem arises if you get a non-patent reference which cites both a
well-known compound and a new one. You can't use a simple logic like
"and"-ing with P/DT because you'd lose the relevant non-patent
reference, which may be the only disclosure of the new compound.
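
To make the problem concrete, here is a minimal sketch in Python rather
than STN. The reference IDs ("PAT1", "J1"), the compound names and the
dictionary layout are all invented for illustration; only the logic
mirrors the situation described above.

```python
# Hypothetical toy data: reference id -> (is_patent, compounds cited).
refs = {
    "PAT1": (True, {"known"}),          # a patent citing the well-known compound
    "J1": (False, {"known", "new"}),    # a journal paper citing both compounds
}
hits = {"known", "new"}                 # the substructure hit set

# Naive approach: keep only the patent references (the "AND P/DT" step).
patent_refs = {ref for ref, (is_patent, _) in refs.items() if is_patent}

# Which hit compounds are still covered by the references we kept?
covered = set().union(*(refs[ref][1] for ref in patent_refs)) & hits
lost = hits - covered

print(sorted(patent_refs))
print(sorted(lost))
```

In this toy case the only reference kept is the patent, and the new
compound drops out of the answer altogether.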

My question is - has anyone found a way of doing this easily? The
following is an annotated general scheme based on STN which does work,
but I'm not convinced that it's the most elegant solution.

Aim : Given a hit set L1 in the Registry file, split this into 3 parts.
Part (a) comprises compounds only cited in patents ("patent uniques").
Part (b) comprises compounds only cited in non-patents ("non-patent
uniques"). Part (c) comprises compounds cited in both the patent and
non-patent literature.
End result : Three sets of bibliographic references, consisting of (i)
the patent references citing set (a) compounds, (ii) the non-patent
references citing set (b) compounds and (iii) the patent references only
for set (c) compounds.

Strategy:

L1 = set of compounds in Reg from sub-structure search.
File CAplus
S L1 (= L2)
S L2 AND P/DT (= L3)
S L2 NOT L3 (= L4)
S HIT RN L4 1-x
File REG
S E1-Ex (= L5) - compounds from L1 which are cited in one or more
non-patent references, but which may also be cited in one or more patent
references.
File CAplus
S L5 (=L6) - all references citing L1 compounds which appear in one or
more non-patents
S L6 AND P/DT (= L7) - patents citing compounds which appear also in one
or more non-patents
S HIT RN L7 1-y - adds new E# terms from (x+1) upwards
File REG
S E(x+1)-E(end) (= L8) - compounds cited in one or more non-patents AND
in one or more patents, i.e. the intended set (c) defined above
S L1 NOT L8 (= L9) - compounds cited in one or more non-patents OR one or
more patents BUT NOT in both, i.e. a mixture of sets (a) and (b) defined
above.
File CAplus
S L9 (= L10)
S L10 AND P/DT (= L11) - patent references citing compounds which appear
only in patents (i.e. set (i) references for patent uniques)
S L10 NOT L11 (= L12) - non-patent references citing compounds which
appear only in non-patents (i.e. set (ii) references for non-patent
uniques)
S L8 AND P/DT (= L13) - patent references only for set (c) compounds
i.e. set (iii) references.
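
For anyone who finds set notation easier to follow than L-numbers, the
same split can be sketched in Python. This is not STN syntax and not a
macro: the function name split_hits, the data layout and the sample
reference IDs are assumptions made up for the example, and the sets are
computed directly rather than by crossing between files.

```python
def split_hits(refs, hits):
    """Split references into sets (i), (ii) and (iii).

    refs: dict of reference id -> (is_patent, set of compounds cited)
    hits: the compound hit set (L1 in the scheme above)
    """
    in_patent, in_nonpatent = set(), set()
    for is_patent, compounds in refs.values():
        cited = compounds & hits
        if is_patent:
            in_patent |= cited
        else:
            in_nonpatent |= cited

    both = in_patent & in_nonpatent            # set (c), i.e. L8
    patent_only = in_patent - in_nonpatent     # set (a)
    nonpatent_only = in_nonpatent - in_patent  # set (b)

    def citing(compounds, want_patent):
        # References of the wanted type that cite at least one compound.
        return {
            ref
            for ref, (is_patent, cited) in refs.items()
            if is_patent == want_patent and cited & compounds
        }

    return {
        "i": citing(patent_only, True),       # like L11
        "ii": citing(nonpatent_only, False),  # like L12
        "iii": citing(both, True),            # like L13
    }


# Hypothetical sample: A is a patent unique, B a non-patent unique,
# C is cited in both kinds of literature.
sample_refs = {
    "PAT1": (True, {"A", "C"}),
    "PAT2": (True, {"C"}),
    "J1": (False, {"B", "C"}),
    "J2": (False, {"C"}),
}
result = split_hits(sample_refs, {"A", "B", "C"})
print(result)
```

Note that J2 appears in none of the three sets: it cites only compound
C, which already has patent references, so it is exactly the kind of
non-patent reference the customer does not want to wade through.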

As a partial check, do a S HIT RN on set L12, re-search those E-numbers
in the CAplus file and "AND" the result with P/DT. The answer is zero,
which is what you would expect if every hit compound in L12 is a
non-patent unique.
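
The same check can be sketched in Python on invented data. The names
l9 and l12 are borrowed from the scheme above purely as labels; the
reference IDs and compounds are hypothetical.

```python
# Hypothetical toy data: reference id -> (is_patent, compounds cited).
refs = {
    "PAT1": (True, {"A", "C"}),
    "PAT2": (True, {"C"}),
    "J1": (False, {"B", "C"}),
    "J2": (False, {"C"}),
}
l9 = {"A", "B"}   # compounds cited in patents or non-patents, but not both
l12 = {"J1"}      # non-patent references citing L9 compounds

# "HIT RN" step: the L9 compounds that actually occur in the L12 references.
hit_rns = set().union(*(refs[ref][1] for ref in l12)) & l9

# Re-search those compounds and "AND" with P/DT: patents citing any of them.
patents_citing_hits = {
    ref for ref, (is_patent, cited) in refs.items() if is_patent and cited & hit_rns
}

print(sorted(hit_rns))
print(len(patents_citing_hits))
```

An empty result here only shows that the non-patent uniques really have
no patent references; like the STN version, it is a consistency check
rather than a full proof of the scheme.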

Do any of you super-searchers out there have any comments? (Any
particularly embarrassing goofs to be notified to my private e-mail only,
please!!)
Second question - if this does work, would STN consider writing us a
macro to do it more easily?

--
Stephen Adams
Magister Ltd.
Crown House, 231 Kings Road, Reading, RG1 4LS, GB
Tel: +44 (0)118 929 9515
Fax: +44 (0)118 929 9516
e-mail:  stevea@magister.co.uk

Registered in England and Wales. Company No. 3407685 Registered address : Canada House, 272 Field End Road, Eastcote, Ruislip, Middlesex HA4 9NA.


