II-SDV 2014 Analysing Patent Full Text – Comparison against analysis of abstract and bibliographic data, and lessons learned (Richard Gynn - LexisNexis, UK)
2. Analyzing Patent Full-Text
A Study
2 April 7, 2014
Agenda
1) Full Text Availability
2) Analyzing full text
- Discussion/considerations
- Big picture analysis
- Detailed analysis - Study
3) Conclusions
Full Text content available from vendors has evolved to a point
where most of the top publishing authorities are readily available.
4. Full Text Availability – Top 10 Publishing Authorities (available from most big vendors)
China, Korea, Japan are not
the big deal they used to be!
Text can be available to analyse in English
5. Full Text Availability – Authorities available from at least one vendor
6. Full Text Availability by volume- > 100k publications
[Bar chart: publication volume by authority, in millions (axis 0 to 25). Authorities in chart order: JP, US, CN, DE, EP, KR, GB, FR, WO, CA, AU, TW, SU, ES, AT, SE, IT, RU, CH, NL, BE, FI, BR, DK, IN, NO, PL, IL, DD, ZA, MX, HU, PT, CS, AR, IE, NZ, CZ, GR]
7. Full Text Availability by volume- > 100k publications
[Bar chart repeated with annotations: publication volume in millions (axis 0 to 25) for the 39 authorities from JP to GR]
31 of these 39 are currently
available from vendors
These account for the vast majority of total volume
8. Full Text Availability by volume - < 100k publications
[Bar chart: publication volume by authority (axis 0 to 100,000). Authorities in chart order: HK, YU, RO, SG, TR, MY, LU, BG, PH, UA, TH, CL, EA, ID, HR, SK, CO, SI, VN, PE, UY, OA, EG, IS, EC]
9. Full Text Availability by volume - < 100k publications
[Bar chart repeated with annotations: publication volume (axis 0 to 100,000) for the 25 authorities from HK to EC]
Much smaller amounts are currently available from vendors (~300,000 publications)
If all were to become available, they would add about 1.5% to the full text that is currently available, e.g. about the equivalent of adding Spain or Taiwan
10. Full Text Availability by volume - < 10k publications
[Bar chart: publication volume by authority (axis 0 to 10,000). Authorities in chart order: MA, AP, VE, EE, LV, GT, CU, LT, MD, CR, PA, CY, DO, MC, ZM, ZW, SV, SM, JO, PY, GE, DZ, KE, MT, HN, MW, NI, ME, TJ, GC, BO, MN, BA, KZ, BY, TT]
11. Full Text Availability by volume - < 10k publications
[Bar chart repeated with annotations: publication volume (axis 0 to 10,000) for the 36 authorities from MA to TT]
One is currently available from vendors
In total these would add about 0.1% to the full text that is currently available
12. Analyzing Patent Full-Text
• Are we nearly there yet?
• There’s a lot of full text available to make use of
• Most vendors have significant volumes
• Rapidly diminishing returns for each authority added
Full Text Availability
Bringing You The World
• We are already in a good place
• In terms of % availability at least
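The "% availability" point can be checked with quick arithmetic on the figures quoted on the earlier slides. The sketch below is only a rough illustration: it treats the full text currently available as 100 units and adds the quoted 1.5% (authorities with < 100k publications) and 0.1% (authorities with < 10k publications). It ignores the 8 of 39 larger authorities that are not yet available, because the slides give no volume for them, so the result is an upper bound. The function name is invented for this example.

```python
# Figures quoted on the earlier slides, relative to the full text
# currently available from vendors (taken as 100 units).
EXTRA_UNDER_100K_PCT = 1.5  # if all authorities with < 100k publications were added
EXTRA_UNDER_10K_PCT = 0.1   # if all authorities with < 10k publications were added


def coverage_pct(extras, current=100.0):
    """Share of (current + missing) that is already available, in percent.

    `extras` lists the additional volume, expressed as a percentage of the
    current volume. Assumption: the missing large authorities are ignored.
    """
    return 100.0 * current / (current + sum(extras))


coverage = coverage_pct([EXTRA_UNDER_100K_PCT, EXTRA_UNDER_10K_PCT])
print(f"Already available (upper bound): {coverage:.1f}%")
```

On these numbers, the two smaller tiers together would add less than 2% to what is already available, which is the diminishing-returns point made above.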
14. Analyzing Patent Full-Text
Full Text – What Is It?
• Everything of course?!
― …will concentrate on:
15. Considerations
There’s clearly a lot out
there, so why don’t we see
so much analysis of patent
full text?
16. Analyzing Patent Full-Text
Considerations - Language
• Can only compare like for like in same language
…non-Latin character issues too
• Noise – Patent full-text likes to state things like
…the complete opposite of what it’s about!
How I might introduce myself
…If I was a patent!
나는 사람들이 밥, 앤드류, 데이브 앨런 같은
이름이, 이름이. 나는 밥, 앤드류, 데이브 나
앨런 아니에요. 내 이름은 리처드입니다
I have a name, people have names like
Bob, Andrew, Dave and Alan. I’m not
Bob, Andrew, Dave or Alan.
My name is Richard
私は人々がボブ、アンドリュー、デイブとアランのような名前を持っている、名前を持っています。
私はボブ、アンドリュー、デイブかアランないよ。
私の名前はリチャードです
17. Considerations
Other Considerations:
• Massive amounts of data
– Time?
– How to deal with it?
• Will it contain anything useful?
Will the benefit outweigh the effort?
• Tools
– Big picture?
– Details?
18. Big Picture - Landscape Analysis
Big picture, topographic mapping (Discussion)
Here more full text could provide:
• Broader country analysis (often full-text not available)
• More consistency across authorities – e.g. more claims
― Compare like for like, e.g. not claims, title & abstract against title
• Full text more useful for details
• Themes/commonalities easier to
find using claims, title, abstract
• Whilst useful, vast majority of
landscape analysis done elsewhere,
…i.e. details rather than big picture
20. The Details - Study
Detailed analysis – looking for what?
• New/emerging, different
• Competitive/market comparisons
• Strength, weakness, opportunity, threat
What can I find using the full
text that I couldn’t using title,
abstract and bibliography?
21. The Details - The Technology
Terahertz analysis, e.g. imaging, spectroscopy?
Terahertz radiation - between Infra-red and microwave
22. The Details - The Search
• Broad Strategy
― Analysis IPCs + Terahertz Radiation Synonyms
― Keyword Terahertz Imaging & Spectroscopy
Result: 5,955 documents / 3,365 families
23. Study - PatentOptimizer
Analysis Details:
• Small/emerging areas of 6-7 families
• Look at terms & phrases, parts, claim
elements (all numbers represent families)
PatentOptimizer™ Analysis of EP, PCT & US results
• English Translations
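As a rough illustration of the "small/emerging areas" filter described above, the sketch below counts the number of distinct patent families each term appears in and keeps the terms found in 6 to 7 families. It is not PatentOptimizer's method. The record layout (`family_id`, `terms`) and the function name are assumptions made for this example, and the toy data is invented.

```python
from collections import defaultdict


def terms_by_family_count(records, min_families=6, max_families=7):
    """Return {term: family_count} for terms seen in min..max distinct families.

    `records` is an iterable of dicts with hypothetical keys:
      family_id: identifier shared by all documents of one patent family
      terms:     terms or phrases extracted from that document
    """
    families_per_term = defaultdict(set)
    for record in records:
        for term in record["terms"]:
            # Normalise case so "Tattoo" and "tattoo" count as one term.
            families_per_term[term.lower()].add(record["family_id"])
    return {
        term: len(families)
        for term, families in families_per_term.items()
        if min_families <= len(families) <= max_families
    }


# Toy data (invented): the first two documents belong to the same family.
records = [
    {"family_id": "F1", "terms": ["terahertz", "tattoo"]},
    {"family_id": "F1", "terms": ["terahertz"]},
    {"family_id": "F2", "terms": ["Terahertz", "glucose"]},
    {"family_id": "F3", "terms": ["terahertz", "tattoo"]},
]
# Use a window of exactly 2 families so the toy data yields a result.
small_areas = terms_by_family_count(records, min_families=2, max_families=2)
print(small_areas)
```

Counting families rather than documents matches the note that all numbers represent families, and stops one invention filed in several authorities from inflating a term's count.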
24. PatentOptimizer – Terms & Phrases
Diagnosis - General
25. PatentOptimizer – Terms & Phrases
Not found in Title, Abstract (or claims) –
All From Spectral Image Inc
Learned – Something seemingly unique to them
SAME DOCUMENTS
26. PatentOptimizer – Terms & Phrases
Not found in Title, Abstract (or claims) – All
monitoring vitamin K concentration in blood
Learned – A more recent (emerging?) use
Diagnosis - General
27. PatentOptimizer – Parts
Remote monitoring, e.g. of Bluetooth® headset user
Learned – Interesting, but not massively relevant result, would like to
investigate applications further
Diagnosis - general
28. PatentOptimizer – Claim Elements
Looking for infiltration or extravasation
during intravenous infusion
Learned – New possibly interesting area, seemingly
dominated by one organisation
Diagnosis – general
A61M – introducing remedies
29. Study - VantagePoint
Analysis Details:
• Data Statistics
• Terms uniquely appearing in full text
• Highly occurring terms used in small
numbers of documents
• Investigate terms unique to 2013
priority onward
Vantage Point Analysis of TotalPatent full text results
• English Translations
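Two of the steps listed above, terms appearing only in the full text and terms unique to a 2013-onward priority, can be sketched with plain set operations. This is a minimal illustration, not how VantagePoint works internally. The field names (`title`, `abstract`, `full_text`, `priority_year`), the simple tokenizer and the toy documents are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lower-case word tokens of two or more characters.

    A deliberately simple stand-in for a real term extractor.
    """
    return set(re.findall(r"[a-z][a-z0-9-]+", text.lower()))


def full_text_only_terms(doc):
    """Terms in the full text that are absent from title and abstract."""
    return tokenize(doc["full_text"]) - tokenize(doc["title"] + " " + doc["abstract"])


def terms_new_since(docs, year=2013):
    """Full-text-only terms from documents with priority >= year that
    occur nowhere (title, abstract or full text) in earlier documents."""
    old, new = set(), set()
    for doc in docs:
        if doc["priority_year"] >= year:
            new |= full_text_only_terms(doc)
        else:
            old |= tokenize(" ".join((doc["title"], doc["abstract"], doc["full_text"])))
    return new - old


# Toy documents (invented for illustration).
docs = [
    {"title": "Terahertz imaging device",
     "abstract": "A device for terahertz imaging.",
     "full_text": "The device may detect tetracycline residues in milk.",
     "priority_year": 2013},
    {"title": "Terahertz spectrometer",
     "abstract": "A spectrometer.",
     "full_text": "The spectrometer analyses milk samples.",
     "priority_year": 2009},
]
emerging = terms_new_since(docs)
print(sorted(emerging))
```

A real run would also need stop-word removal and phrase extraction. Here common words such as "may" and "in" survive alongside "tetracycline", which is the kind of noise that needs manual review.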
30. Vantage Point - Statistics
A very low percentage of the terms and words available for analysis are actually in the title and abstract
Title & Abstract
• 42,614 words & phrases
• 16,251 words
Claims
• ~132k words and phrases not in Title or Abstract
• ~44k words in Title or Abstract
Full-text
• ~1.3M unique words & phrases
• ~650k unique words
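To put a number on "very low percent", the counts above can be turned into shares. The sketch assumes the approximate full-text totals (~1.3M and ~650k) include the title and abstract vocabulary. The slide does not say so explicitly, so the percentages are indicative only.

```python
# Counts taken from the slide; the full-text figures are approximate ("~").
TITLE_ABSTRACT_TERMS = 42_614   # words & phrases in title and abstract
TITLE_ABSTRACT_WORDS = 16_251   # single words in title and abstract
FULL_TEXT_TERMS = 1_300_000     # ~1.3M unique words & phrases in full text
FULL_TEXT_WORDS = 650_000       # ~650k unique words in full text


def share_pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


terms_pct = share_pct(TITLE_ABSTRACT_TERMS, FULL_TEXT_TERMS)
words_pct = share_pct(TITLE_ABSTRACT_WORDS, FULL_TEXT_WORDS)
print(f"Words & phrases found in title/abstract: {terms_pct:.1f}%")
print(f"Words found in title/abstract: {words_pct:.1f}%")
```

On these figures, roughly 3% of the words and phrases, and about 2.5% of the single words, are visible to an analysis limited to title and abstract.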
31. Vantage Point – Terms only appearing in full text 2013 onwards
32. Vantage Point – Terms only appearing in full text 2013 onwards
Detection of tetracycline drug –
concern in resistance to antibiotics
Learned – New area (clearer language in full-text)
optical investigation
33. Vantage Point – Terms only appearing in full text 2013 onwards
Looking for gas hydrates (fracking)
Learned – New area (uncovered by more consistent
repetition in full text)
general investigation,
sampling
35. Findings
• Full text useful
• Claims less so (in this case)
Most words and phrases in the “full text” did not appear in Abstract & Title
• Text mined wasn’t necessarily applications, but pointed towards them
• More consistent repetition in full text
Helped mainly find new/niche applications
• Probably wouldn’t have found these in other ways
Interesting companies & technologies to
look at further
36. Conclusions
Conclusions (Noise and huge amounts of info):
• Background did not really come in as an issue
• Used English translations to avoid language issues
• Most noise was from search results
• My judgement – about 50% proved somewhat
interesting upon further investigation
• Can this be automated/put into a process?
• 4/5+ family groupings seem to be about the
sweet spot
37. What More?
What more?
Further this:
• Life Sciences
• Define processes
Dedicated machine?
• Detailed full-text analysis
Study analysis of parts
• Sellers, inventors, manufacturers etc.
Easier than expected
More possible & better timescales
40. PatentOptimizer – Terms & Phrases
2 of 6 have tattoo in Abstract OR Title
(same if include claims)
Learned – THz radiation can be used for tattoo removal
Diagnosis, surgery - General
41. PatentOptimizer – Terms & Phrases
Not found in Abstract & Title
(One claimed -Optical Diagnostics)
Determining microorganism
presence/kind
42. PatentOptimizer – Claim Elements
SAME DOCUMENTS
Identifying/determining antimicrobial
resistance of Burkholderia cepacia
Learned – Smaller more niche areas?
43. PatentOptimizer – Terms & Phrases
Not found in Title, Abstract (or claims) – All Some
detectors, some looking for heavy metal contamination
Learned – Some areas to investigate further?
44. PatentOptimizer – Claim Elements
Glucose Monitoring – Far-IR (5/7 have in Abstract & Title)
Learned – Not much more than from Title & Abstract
Blood measurement