Babies and Bathwater
Keeping linguistics alongside machine
learning in patent search
David Woolls – CFL Software Limited, UK
Matter
• Therefore, we cannot think that matter is made of points
without extension, because no matter how many of these we
manage to put together, we never obtain something with an
extended dimension.
Carlo Rovelli, Reality Is Not What It Seems (2016, p. 12)
• Quindi non si può pensare che la materia sia fatta di punti
senza estensione, perché, per quanti ne mettessimo
insieme, non otterremmo mai qualcosa con una dimensione
estesa.
• What is the matter with this sentence? Does this matter? As
a matter of fact it does. That’s another matter.
• What does ‘matter’ mean on this page?
Imagined Readers – Text differences
"It was a dark and stormy night, the rain
came down in torrents, there were brigands on
the mountains, and wolves, and the chief of the
brigands said to Antonio, 'I'm bored - tell us a
story!’”
Janet and Allan Ahlberg
From “Paul Clifford”
LSTM and linguistics
• But there are also cases where we need more
context.
• Consider trying to predict the last word in the text “I
grew up in France… I speak fluent French.”
Humans usually provide linguistic assistance in the form of function words (grammar):
I grew up in France so I speak fluent … → Definitely French
I grew up in France and I speak fluent … → Possibly French but maybe another
I grew up in France but I speak fluent … → Definitely not French
I grew up in France but I also speak fluent … → Very definitely not French
I grew up in France but I don’t speak fluent … → Definitely French
I grew up in France so I don’t speak fluent … → Definitely not French
Babies, bathwater,
stems, lemmas and function words
Original → Becomes
I think Christoph is brilliant → Think Christoph brilli
I thought Christoph was brilliant → Think Christoph brilli
I thought Christoph was brilliant but now I’m not so sure. → Think Christoph brilli sure
Hearing Christoph’s brilliance I asked him to speak. → Hear Christoph brilli ask speak
I wouldn’t do that if I were you! → !
This is called telegraphic language and is spoken by children between 18
months and three years old during language acquisition. Perhaps not ideal for
computers and comprehension.
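The reduction shown on this slide can be sketched in a few lines of Python. This is only an illustration: the function-word set and the stem table are hypothetical toy resources sized to cover the slide's examples, not a real stop list or stemmer, and the name `telegraphic` is invented here.

```python
import re

# Hypothetical toy resources, sized only to cover the examples on this slide.
FUNCTION_WORDS = {"i", "is", "was", "but", "now", "not", "so", "him", "to"}
STEMS = {
    "thought": "think",
    "brilliant": "brilli",
    "brilliance": "brilli",
    "hearing": "hear",
    "asked": "ask",
}

def telegraphic(sentence: str) -> list[str]:
    """Drop function words and reduce the remaining words to stems."""
    kept = []
    for token in re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?", sentence):
        # Keep only the part before an apostrophe: "I’m" -> "i".
        word = re.split(r"['’]", token.lower())[0]
        if word not in FUNCTION_WORDS:
            kept.append(STEMS.get(word, word))
    return kept

print(telegraphic("I think Christoph is brilliant"))
print(telegraphic("I thought Christoph was brilliant but now I’m not so sure."))
```

The second sentence keeps "sure" but loses "but", "now" and "not", so the reversal of opinion carried by the function words is no longer recoverable from the output.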
Linguistic LSTM with real sentences.
• It is a truth universally acknowledged, [6]
• that a single man [4]
• in possession of a good fortune, [6]
• must be in want of a wife. [7]
• [23/4] ≈ 6 (mean words per segment)
The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for
Processing Information
by George A. Miller
originally published in The Psychological Review, 1956, vol. 63, pp. 81-97
http://www.musanim.com/miller1956/
It is a truth universally acknowledged, that a single man in possession
of a good fortune, must be in want of a wife.
LSTM
• However little known the feelings or views of such a man may be
on his first entering a neighbourhood, this truth is so well fixed in
the minds of the surrounding families, that he is considered the
rightful property of some one or other of their daughters.
• However little known the feelings or views [7]
• of such a man may be [6]
• on his first entering a neighbourhood, [6]
• this truth is so well fixed [6]
• in the minds of the surrounding families, [7]
• that he is considered the rightful property [7]
• of some one or other of their daughters. [8]
• [47/7] ≈ 7 (mean words per segment)
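The segment arithmetic on these two slides can be checked with a short Python sketch. It assumes the segments are supplied by hand exactly as on the slides (no automatic chunking is attempted) and counts whitespace-separated words; the helper name `words_per_segment` is invented for this illustration.

```python
from statistics import mean

def words_per_segment(segments):
    """Count whitespace-separated words in each hand-made segment."""
    return [len(segment.split()) for segment in segments]

first_sentence = [
    "It is a truth universally acknowledged,",
    "that a single man",
    "in possession of a good fortune,",
    "must be in want of a wife.",
]
second_sentence = [
    "However little known the feelings or views",
    "of such a man may be",
    "on his first entering a neighbourhood,",
    "this truth is so well fixed",
    "in the minds of the surrounding families,",
    "that he is considered the rightful property",
    "of some one or other of their daughters.",
]

for segments in (first_sentence, second_sentence):
    counts = words_per_segment(segments)
    # Total words, number of segments, and the mean rounded to a whole word.
    print(counts, sum(counts), len(counts), round(mean(counts)))
```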
Why linguistics?
• Patents are communicative documents, written in many languages.
• Communication is achieved by context which can be close or distant.
• Boolean searching gives results by document; range searching needs to be done
by claim.
• There are distractor numbers in a claim (e.g. Claim numbers, temperatures,
lengths).
• There are potential data quality or format problems introduced by OCR, machine
translation or extraction from a database.
• All these and others need to be taken into account to find only relevant
material.
ICIC 2017 8
Why linguistics for ranges?
• Range information is in the unstructured text
– The location and referent of ranges are signalled by linguistic structures and forms:
• Range then element, element then range, or both: 0,80 < Si < 1,20
• Elements by symbol (Si) or in full (Silicon or silicon)
• Implicit or explicit marking: 1-5 or between 1 and 5
• Symbolic or lexical marking: <2.5 or less than 2.5; ≥ .76 or greater than or equal to 0.76
• Variation in proximity of additional markings: 0.5%, 0.5wt%, 0.5 wt %
– There can be mixtures of these forms in a single claim.
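To make the variety of surface forms concrete, here is a minimal Python sketch that maps several of the forms listed above onto a single (low, high) pair. It is an illustration under stated assumptions, not SpanMatch's method: the regular expressions, the name `parse_range` and the choice to ignore inclusive versus exclusive bounds are all invented here, and element pairing and unit markers such as wt % are left out.

```python
import re
from typing import Optional, Tuple

NUM = r"\d*[.,]\d+|\d+"  # accepts 0,80  1.20  .76  5

# Ordered from most to least specific; each entry says which bounds it yields.
PATTERNS = [
    (re.compile(rf"({NUM})\s*[<≤]\s*[A-Za-z]+\s*[<≤]\s*({NUM})"), "both"),
    (re.compile(rf"between\s+({NUM})\s+and\s+({NUM})", re.I), "both"),
    (re.compile(rf"({NUM})\s*[-–~]\s*({NUM})"), "both"),
    (re.compile(rf"(?:[<≤]|less than(?: or equal to)?)\s*({NUM})", re.I), "upper"),
    (re.compile(rf"(?:[>≥]|greater than(?: or equal to)?)\s*({NUM})", re.I), "lower"),
]

def to_float(text: str) -> float:
    """Accept both decimal comma and decimal point."""
    return float(text.replace(",", "."))

def parse_range(text: str) -> Optional[Tuple[Optional[float], Optional[float]]]:
    """Return (low, high); None marks an open bound or no range found."""
    for pattern, kind in PATTERNS:
        match = pattern.search(text)
        if not match:
            continue
        values = [to_float(group) for group in match.groups()]
        if kind == "both":
            return (values[0], values[1])
        if kind == "upper":
            return (None, values[0])
        return (values[0], None)
    return None

for form in ["0,80 < Si < 1,20", "1-5", "between 1 and 5",
             "<2.5", "less than 2.5", "≥ .76",
             "greater than or equal to 0.76"]:
    print(form, "->", parse_range(form))
```

Symbolic and lexical variants of the same range produce the same pair, which is what lets a search specification be compared against a claim regardless of how the range was written.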
Reading
The program is a linear text reader because we need to:
1. Identify claims
2. Identify pairs of elements and ranges in each claim.
So each line in the file is read word by word just once in the
same sequence as a human reader.
Reading
• Items are identified as numbers, range indicators or elements in sequence.
• As each element/range pair is identified, the relationship with the specification is
calculated.
• Following calculation, the element and the range are colour-coded and the claim
is built for potential display.
• At the conclusion of each claim the total found is compared with the total
specification.
• If the claim meets the overall specification requirement it is added to the list for
display.
• At the conclusion of the reading process, all the results are ranked and
displayed.
• The program can process the full claims of around 300 patents per second.
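The reading steps above can be sketched as a single pass over the tokens of one claim. This is a simplified illustration, not the actual program: the element set, the range indicators, the three relationship labels and the function names are assumptions made for the example, and colour-coding, ranking and display are omitted.

```python
import re

ELEMENTS = {"C", "Si", "Mn", "Ni", "Cr"}   # hypothetical subset of symbols
RANGE_INDICATORS = {"-", "~"}              # hypothetical subset of indicators
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[-~]")

def relation(found, wanted):
    """Relate a range found in a claim to the range in the search specification."""
    low, high = found
    wanted_low, wanted_high = wanted
    if high < wanted_low or low > wanted_high:
        return "outside"
    if wanted_low <= low and high <= wanted_high:
        return "within"
    return "overlaps"

def read_claim(claim, spec):
    """Read the claim once, left to right, pairing each element with its range."""
    element, low, indicator_seen = None, None, False
    results = {}
    for token in TOKEN.findall(claim):
        if token in ELEMENTS:
            element, low, indicator_seen = token, None, False
        elif token in RANGE_INDICATORS:
            indicator_seen = low is not None
        elif token[0].isdigit() and element is not None:
            value = float(token)
            if low is None:
                low = value
            elif indicator_seen:
                # Element/range pair complete: relate it to the specification.
                if element in spec:
                    results[element] = relation((low, value), spec[element])
                element, low, indicator_seen = None, None, False
    return results

def meets_spec(results, spec):
    """A claim qualifies only if every specified element was found in range."""
    return all(results.get(name) in ("within", "overlaps") for name in spec)

claim = "C: 0.14-0.30, Si: 0.15-0.80, Mn: 20.00-27.00"
spec = {"Si": (0.10, 1.00), "Mn": (25.0, 30.0)}
results = read_claim(claim, spec)
print(results, meets_spec(results, spec))
```

Because each token is visited once and the state is just the current element and its lower bound, the cost per claim is linear in its length, which is consistent with a reader that has to handle whole patent collections.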
Native languages v Machine Translation
Here is the problem from the PatBase collection.
<Claims><![CDATA[<CLA_MT><XXC1> <p> CN 1. A non-magnetic alloy of high strength and toughness,
characterized in that the chemical composition in weight percent of: C:.. 0 14 ~0 30 percent, Si:.. 0 15 ~0 80
percent,.. Mn: 20 00 ~27 00 percent; Ni:.. 0 60 ~2 00 percent; Cr:.. 12 50 ~19 00 percent;
</CLA_MT><CLA_CN><XXC1><p>CN 1. 一种高强度韧性无磁合金,其特征在于,化学成分重量百分数为: C
:0. 14 〜0. 30%, Si :0. 15 〜0. 80%, Mn :20. 00 〜27. 00% ; Ni :0. 60 〜2. 00% ; Cr :12. 50 〜19. 00%
;
You can see that the English MT version is appalling!
You can also see that the original claim will be understandable by the program because the presentation is clear.
Detailed example (continued)
It is not practicable to write a program that takes account of all the things that might go wrong, without also
introducing potential errors to data that is actually ok. But it is possible for SpanMatch to recognise the original
as correct as you see here.
So, given clean data or cleaning the data up as best we can, we can do this in all the languages. Once you have an
indication of potential interest you can use a good MT program to translate just the claims of interest.
This is Google Translate translating the claim, and you can see that it is struggling, but is better than the PatBase
one.
CN is a high strength toughness nonmagnetic alloy characterized in that the chemical composition is in a weight
percentage of C: 0.14 to 0. 30 Si: 0.015 to 80 Mn: 20 to 0000. Ni: 0.60 ~ 2.00; Cr: 12. 50 ~ 1900; Mo or W
elements of one or two: 0. 60 ~ 2.50 ;; 0.8 ~ [0. LXMn (% - 0.5); 0 20 to 0.50; Ca, rare earth elements of one or
two: 0. 003 ~ 0.05;: 彡 0.03:: 彡 0.03; Fe: balance.
Use of CN, JP, KR originals - rationale
• Machine translation is often hard to understand and sometimes incomprehensible
• Using native language patents ensures data quality
• Limited inbuilt knowledge required for numerical searching
– Searching for elements requires only that a program has the CJK equivalents
for full element names; international symbols are identical.
– Searching for ranges requires knowledge of potential CJK equivalent codes
for digits
– Searching for range indicators requires language-specific identification of hyphen, <, > and words.
• Accurate identification of the search specification with display of the claims means only
those claims of interest need translation by machine or human
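The limited knowledge needed for numerical searching in the originals can be illustrated with a short Python sketch applied to the Chinese claim fragment shown earlier. It assumes that Unicode NFKC normalisation is an acceptable way to fold full-width digits and punctuation to ASCII, maps the wave dash 〜 (U+301C) to ~ explicitly, and closes the gap after the decimal point seen in the extracted text. The function names and patterns are invented for this example and do not describe SpanMatch itself.

```python
import re
import unicodedata

# Wave dash (U+301C) and full-width tilde (U+FF5E) both act as range indicators.
WAVE_DASHES = str.maketrans({"\u301c": "~", "\uff5e": "~"})
PAIR = re.compile(r"([A-Z][a-z]?)\s*:\s*(\d+(?:\.\d+)?)\s*~\s*(\d+(?:\.\d+)?)")

def normalise(text: str) -> str:
    """Fold full-width forms to ASCII and repair '0. 14' style decimals."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(WAVE_DASHES)
    return re.sub(r"(\d)\.\s+(\d)", r"\1.\2", text)

def element_ranges(text: str) -> dict:
    """Map each element symbol to the (low, high) range that follows it."""
    return {symbol: (float(low), float(high))
            for symbol, low, high in PAIR.findall(normalise(text))}

original = ("C :0. 14 〜0. 30%, Si :0. 15 〜0. 80%, Mn :20. 00 〜27. 00% ; "
            "Ni :0. 60 〜2. 00% ; Cr :12. 50 〜19. 00%")
print(element_ranges(original))
print(element_ranges("Ｓｉ：０．１５～０．８０％"))
```

The element symbols need no translation at all, so the only language-specific knowledge in this sketch is the set of digit and range-indicator variants.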
Thank you
Contact: d.woolls@cflsoftware.com
Website: www.cflsoftware.com

Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 

ICIC 2017: Babies and bathwater: Keeping linguistics alongside machine learning in patent search

  • 1. Babies and Bathwater Keeping linguistics alongside machine learning in patent search David Woolls – CFL Software Limited, UK
  • 2. Matter
      • Therefore, we cannot think that matter is made of points without extension, because no matter how many of these we manage to put together, we never obtain something with an extended dimension. Carlo Rovelli, Reality is not what it seems (2016, p. 12)
      • Italian original: Quindi non si può pensare che la materia sia fatta di punti senza estensione, perché, per quanti ne mettessimo insieme, non otterremmo mai qualcosa con una dimensione estesa.
      • What is the matter with this sentence? Does this matter? As a matter of fact it does. That’s another matter.
      • What does ‘matter’ mean on this page?
  • 3. Imagined Readers – Text differences
      • "It was a dark and stormy night, the rain came down in torrents, there were brigands on the mountains, and wolves, and the chief of the brigands said to Antonio, 'I'm bored - tell us a story!'"
      • Janet and Allan Ahlberg
      • From “Paul Clifford”
  • 4. LSTM and linguistics
      • But there are also cases where we need more context.
      • Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.”
      • Humans usually provide linguistic assistance in the form of function words (grammar):
          • I grew up in France so I speak fluent … → Definitely French
          • I grew up in France and I speak fluent … → Possibly French but maybe another
          • I grew up in France but I speak fluent … → Definitely not French
          • I grew up in France but I also speak fluent … → Very definitely not French
          • I grew up in France but I don’t speak fluent … → Definitely French
          • I grew up in France so I don’t speak fluent … → Definitely not French
  • 5. Babies, bathwater, stems, lemmas and function words
      • Each sentence, and what it becomes:
          • I think Christoph is brilliant → Think Christoph brilli
          • I thought Christoph was brilliant → Think Christoph brilli
          • I thought Christoph was brilliant but now I’m not so sure. → Think Christoph brilli sure
          • Hearing Christoph’s brilliance I asked him to speak. → Hear Christoph brilli ask speak
          • I wouldn’t do that if I were you! → !
      • This is called telegraphic language and is spoken by children between 18 months and three years old during language acquisition. Perhaps not ideal for computers and comprehension.
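As a rough illustration of the point made on slides 4 and 5, the sketch below strips function words from the six "I grew up in France" sentences. The word list `FUNCTION_WORDS` and the helper `content_words` are hypothetical and chosen only for this example; they are not taken from any particular stemmer or stop-word list.

```python
import re

# Hypothetical, hand-picked list of function words and particles for this example only.
FUNCTION_WORDS = {"i", "up", "in", "so", "and", "but", "also", "don't"}

def content_words(sentence):
    """Lower-case the sentence and keep only the words not in FUNCTION_WORDS."""
    tokens = re.findall(r"[a-z']+", sentence.lower().replace("’", "'"))
    return [t for t in tokens if t not in FUNCTION_WORDS]

SENTENCES = [
    "I grew up in France so I speak fluent",
    "I grew up in France and I speak fluent",
    "I grew up in France but I speak fluent",
    "I grew up in France but I also speak fluent",
    "I grew up in France but I don't speak fluent",
    "I grew up in France so I don't speak fluent",
]

# Collect the distinct reduced forms of all six sentences.
reduced = {tuple(content_words(s)) for s in SENTENCES}
print(reduced)
```

All six sentences collapse to the same four content words, so the cues that told a human reader whether the next word is "French" are exactly what has been thrown away.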
  • 6. Linguistic LSTM with real sentences
      • It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
      • Words per chunk:
          • It is a truth universally acknowledged, [6]
          • that a single man [4]
          • in possession of a good fortune, [6]
          • must be in want of a wife. [7]
          • Mean: 23/4 ≈ 6
      • George A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”, originally published in The Psychological Review, 1956, vol. 63, pp. 81-97. http://www.musanim.com/miller1956/
  • 7. LSTM
      • However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.
      • Words per chunk:
          • However little known the feelings or views [7]
          • of such a man may be [6]
          • on his first entering a neighbourhood, [6]
          • this truth is so well fixed [6]
          • in the minds of the surrounding families, [7]
          • that he is considered the rightful property [7]
          • of some one or other of their daughters. [8]
          • Mean: 47/7 ≈ 7
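The chunk arithmetic on these two slides can be checked with a few lines of code. This is only a sketch: the chunk boundaries are those given on slide 6, a "word" is assumed to be any whitespace-separated token, and `chunk_stats` is a hypothetical helper name.

```python
def chunk_stats(chunks):
    """Return the word count of each chunk, the total, and the mean words per chunk."""
    counts = [len(chunk.split()) for chunk in chunks]
    total = sum(counts)
    return counts, total, total / len(counts)

# Chunks of the opening sentence as divided on slide 6.
OPENING = [
    "It is a truth universally acknowledged,",
    "that a single man",
    "in possession of a good fortune,",
    "must be in want of a wife.",
]

counts, total, mean = chunk_stats(OPENING)
print(counts, total, round(mean))
```

The same function can be applied to the chunks of the second sentence to check the figures on slide 7.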
  • 8. Why linguistics?
      • Patents are communicative documents, written in many languages.
      • Communication is achieved by context, which can be close or distant.
      • Boolean searching gives results by document; range searching needs to be done by claim.
      • There are distractor numbers in a claim (e.g. claim numbers, temperatures, lengths).
      • There are potential data quality or format problems introduced by OCR, machine translation or extraction from a database.
      • All these and others need to be taken into account to find only relevant material.
  • 9. Why linguistics for ranges?
      • Range information is in the unstructured text.
          • The location and referent of ranges are signalled by linguistic structures and forms:
              • Range then element, element then range, or both: 0,80 < Si < 1,20
              • Elements by symbol (Si) or in full (Silicon or silicon)
              • Implicit or explicit marking: 1-5 or between 1 and 5
              • Symbolic or lexical marking: <2.5 or less than 2.5; ≥ .76 or greater than or equal to 0.76
              • Variation in proximity of additional markings: 0.5%, 0.5wt%, 0.5 wt %
          • There can be mixtures of these forms in a single claim.
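To make the variety of forms concrete, here is a minimal sketch of how a bare range expression in several of the notations listed on this slide might be reduced to a (low, high) pair. The names `NUM`, `WORDS`, `to_float` and `parse_range` are hypothetical and the code is not taken from SpanMatch. It ignores the difference between strict and inclusive bounds and does not handle expressions that contain the element itself, such as 0,80 < Si < 1,20.

```python
import re

# A number with an optional decimal point or decimal comma, e.g. 1, 2.5, .76, 0,80
NUM = r"(\d*[.,]?\d+)"

# Lexical markers mapped to symbols; longer phrases come first so they win.
WORDS = [
    ("greater than or equal to", "≥"),
    ("less than or equal to", "≤"),
    ("greater than", ">"),
    ("less than", "<"),
]

def to_float(text):
    """Convert a number written with either a decimal point or a decimal comma."""
    return float(text.replace(",", "."))

def parse_range(text):
    """Return (low, high) for a bare range expression; None marks an open end."""
    t = text.lower().strip()
    t = re.sub(r"\s*(?:wt)?\s*%$", "", t)  # drop a trailing %, wt% or wt %
    for phrase, symbol in WORDS:
        t = t.replace(phrase, symbol)
    m = re.fullmatch(rf"between\s+{NUM}\s+and\s+{NUM}", t)
    if m is None:
        m = re.fullmatch(rf"{NUM}\s*[-~]\s*{NUM}", t)
    if m:
        return to_float(m.group(1)), to_float(m.group(2))
    m = re.fullmatch(rf"([<≤>≥])\s*{NUM}", t)
    if m:
        value = to_float(m.group(2))
        return (None, value) if m.group(1) in "<≤" else (value, None)
    return None  # not a form this sketch recognises

for example in ["1-5", "between 1 and 5", "<2.5", "less than 2.5",
                "≥ .76", "greater than or equal to 0.76", "1-5 wt %"]:
    print(example, "->", parse_range(example))
```

Each pair of equivalent notations on the slide yields the same tuple, which is the property a range search needs before any comparison with a specification can be made.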
  • 10. Reading
      • The program is a linear text reader because we need to:
          1. Identify claims.
          2. Identify pairs of elements and ranges in each claim.
      • So each line in the file is read word by word, just once, in the same sequence as a human reader would follow.
  • 11. Reading
      • Items are identified as numbers, range indicators or elements in sequence.
      • As each element/range pair is identified, the relationship with the specification is calculated.
      • Following calculation, the element and the range are colour-coded and the claim is built for potential display.
      • At the conclusion of each claim, the total found is compared with the total specification.
      • If the claim meets the overall specification requirement, it is added to the list for display.
      • At the conclusion of the reading process, all the results are ranked and displayed.
      • The program can process the full claims of around 300 patents per second.
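The reading process described on slides 10 and 11 can be sketched as a single left-to-right pass over the tokens of a claim. This is an illustration under stated assumptions, not the SpanMatch implementation: it handles only the "element then low-high" form, treats overlap with the requested range as the test against the specification, and leaves out colour-coding and ranking. `TOKEN`, `read_claim`, `overlaps` and `claim_matches` are hypothetical names.

```python
import re

# Words, plain decimal numbers, and the two range marks handled in this sketch.
TOKEN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|[-~]")

def read_claim(text, elements):
    """Read the claim once, left to right, pairing each element with the range after it."""
    pairs = {}
    element, low, expect_high = None, None, False
    for tok in TOKEN.findall(text):
        if tok in elements:
            element, low, expect_high = tok, None, False
        elif element is None:
            continue  # distractor: a number or word with no element pending
        elif tok in ("-", "~"):
            expect_high = low is not None
        elif tok[0].isdigit():
            if expect_high:
                pairs[element] = (low, float(tok))
                element, low, expect_high = None, None, False
            else:
                low = float(tok)
    return pairs

def overlaps(found, wanted):
    """Assumed relationship test: the two closed intervals share at least one value."""
    return found[0] <= wanted[1] and wanted[0] <= found[1]

def claim_matches(text, spec):
    """Compare the number of satisfied elements with the size of the specification."""
    pairs = read_claim(text, set(spec))
    hits = sum(1 for el, rng in spec.items() if el in pairs and overlaps(pairs[el], rng))
    return hits == len(spec)

CLAIM = ("1. A non-magnetic alloy, characterized in that the composition in weight "
         "percent is: C: 0.14-0.30%, Si: 0.15-0.80%, Mn: 20.00-27.00%.")

print(read_claim(CLAIM, {"C", "Si", "Mn"}))
print(claim_matches(CLAIM, {"C": (0.2, 0.4), "Mn": (25.0, 30.0)}))
```

Note that the claim number "1" and the hyphen in "non-magnetic" are skipped because no element is pending when they are read, which is one simple way of dealing with the distractors mentioned on slide 8.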
  • 12. Native languages v Machine Translation
      • Here is the problem from the PatBase collection.
      • Machine translation: <Claims><![CDATA[<CLA_MT><XXC1> <p> CN 1. A non-magnetic alloy of high strength and toughness, characterized in that the chemical composition in weight percent of: C:.. 0 14 ~0 30 percent, Si:.. 0 15 ~0 80 percent,.. Mn: 20 00 ~27 00 percent; Ni:.. 0 60 ~2 00 percent; Cr:.. 12 50 ~19 00 percent; </CLA_MT>
      • Chinese original: <CLA_CN><XXC1><p>CN 1. 一种高强度韧性无磁合金,其特征在于,化学成分重量百分数为: C :0. 14 〜0. 30%, Si :0. 15 〜0. 80%, Mn :20. 00 〜27. 00% ; Ni :0. 60 〜2. 00% ; Cr :12. 50 〜19. 00% ;
      • You can see that the MT version into English is appalling! You can also see that the original claim will be understandable by the program because the presentation is clear.
  • 13. Detailed example (continued)
      • It is not practicable to write a program that takes account of all the things that might go wrong without also introducing potential errors into data that is actually OK.
      • But it is possible for SpanMatch to recognise the original as correct, as you see here.
      • So, given clean data, or cleaning the data up as best we can, we can do this in all the languages.
      • Once you have an indication of potential interest, you can use a good MT program to translate just the claims of interest.
      • This is Google Translate translating the claim, and you can see that it is struggling, but is better than the PatBase one:
          • CN is a high strength toughness nonmagnetic alloy characterized in that the chemical composition is in a weight percentage of C: 0.14 to 0. 30 Si: 0.015 to 80 Mn: 20 to 0000. Ni: 0.60 ~ 2.00; Cr: 12. 50 ~ 1900; Mo or W elements of one or two: 0. 60 ~ 2.50 ;; 0.8 ~ [0. LXMn (% - 0.5); 0 20 to 0.50; Ca, rare earth elements of one or two: 0. 003 ~ 0.05;: 彡 0.03:: 彡 0.03; Fe: balance.
  • 14. Use of CN, JP, KR originals - rationale
      • Machine translation is often hard to understand and sometimes incomprehensible.
      • Using native language patents ensures data quality.
      • Limited inbuilt knowledge is required for numerical searching:
          • Searching for elements requires only that a program has the CJK equivalents for full element names; international symbols are identical.
          • Searching for ranges requires knowledge of potential CJK equivalent codes for digits.
          • Searching for range indicators requires language-specific identification of hyphen, <, > and words.
      • Accurate identification of the search specification, with display of the claims, means only those claims of interest need translation by machine or human.
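The "CJK equivalent codes for digits" and the language-specific range marks mentioned on this slide can be folded to ASCII before any numerical reading. The sketch below is one possible approach, not the method used by SpanMatch: it relies on Unicode NFKC normalisation for full-width digits and punctuation, maps the wave dash seen in the Chinese original on slide 12 to a tilde, and assumes that a space after a decimal point (as in "0. 14") is an extraction artefact that can be closed up. `RANGE_MARKS` and `normalise_cjk_numbers` are hypothetical names.

```python
import re
import unicodedata

# The wave dash (U+301C) has no compatibility mapping, so it is translated explicitly.
RANGE_MARKS = str.maketrans({"〜": "~"})

def normalise_cjk_numbers(text):
    """Fold full-width characters to ASCII, unify the range mark, rejoin split decimals."""
    text = unicodedata.normalize("NFKC", text)   # e.g. full-width digits become 0-9
    text = text.translate(RANGE_MARKS)
    # Assumption: "0. 14" is a damaged "0.14"; a digit, a point, spaces, then a digit.
    return re.sub(r"(\d)\.\s+(\d)", r"\1.\2", text)

print(normalise_cjk_numbers("C :0. 14 〜0. 30%"))
print(normalise_cjk_numbers("Ｓｉ：０．１５〜０．８０％"))
```

After this step the Chinese original presents element symbols and ranges in a form that a simple left-to-right reader can handle, which is the rationale given above for working from the native-language text.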