Domain Knowledge Powers AI with Context

Domain Knowledge Makes Artificial Intelligence Smart
Linda Andersson
AI-SDV 2022
© Artificial Researcher, 2022

Linda Andersson
CEO/CTO Artificial Researcher IT GmbH
Awards
• 2018 Commercial Viability award by
the Austrian Angel Investors
Association.
• 2017 PhD high potential R&D award,
i2c Award, Vienna University of
Technology
• 1999 Finalist in a Venture Cup held at
Mälardalens University College.
Research fields
• Natural Language Understanding
• Natural Language Processing
• Information Extraction
• Information Retrieval
Academic merits
• Information design, 1998-2001
• General Linguistics, 2001-2003
• Computational Linguistics, 2006-2009
• Language Engineering, 2004
• Library & Information Science, 2004-2006
• Computer Science, 2009-2019

“There is little you can do about prior art which
wasn’t found in the search that you made before
filing the patent application.”
Barry Franks Hynell, “Opposition before the EPO”, preface to
Winning with IP: managing intellectual property today, ISBN 978-1-9998329-6-4

Information Need "Prior Art"
Patent No. 37435: Benz Patent-Motorwagen (1886)
Search for:
• Problem and Solution (A)
• Scope of the invention (A, B, C)
• Specific technical details (B, C)
[Figure: drawing of the Motorwagen annotated with "Steering mechanism", "Engine function" and "The transport vehicle", and with the markers A, B and C]

What makes an AI tool technically successful?
• A successful text mining solution does not focus only on developing the
technology or the best deep learning algorithm; it is much more complex.
The technology design reflects:
• Task Complexity
  • Information needs: retrieve relevant paragraphs, not just documents
• Language Complexity
  • The formation of new words is particularly important for the patent text genre
• Domain text-genre Complexity
  • Multi-word terms

Different Information Needs
• Known item search
• Explorative search
• Find relevant paragraphs
• Scientific Summarization
• Comparative Summarization
[Figure: requests, output example, reading text]
Task Complexity

Next generation Search Technology
Our search software generates the query
in the specific domain
Retrieves relevant paragraphs
Graph Search & Text Box Search
We have developed text mining technologies by combining traditional
Natural Language Processing and Deep Learning technologies.
Task Complexity

Example of automatic query generation
Automatic query expansion terms
Automatic Query formulation
- From text segment to automatic Boolean query
Task Complexity

Automatic Query formulation
- From terms to concept term graphs
https://graph-demo.artificialresearcher.com/
Term: environment climate fuel biofuel
(fuel OR gas OR engine OR vehicle OR oil OR coal OR energy OR hydrocarbon OR steam OR
liquid OR fluid OR furnace OR lubricant OR hydrogen OR feed) AND ("diesel fuel"~4 OR "liquid
fuel"~4 OR "fuel gas"~4 OR "fuel oil"~4 OR "fuel material"~4 OR "hydrogen gas"~4 OR "diesel
oil"~4 OR "liquid hydrocarbon"~4 OR "burning gas"~4 OR "synthetic gas"~4 OR "suitable
hydrocarbon"~4 OR "mixed gas"~4 OR "tail gas"~4)
With context similarity of 0.85
Task Complexity
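A minimal sketch of how a query like the one above could be assembled. The function names `select_expansions` and `build_boolean_query`, the candidate scores, and the reading of 0.85 as a cut-off on candidate expansion terms are assumptions for illustration, not the Artificial Researcher implementation. Single-word terms are OR-ed together, and multi-word terms become proximity phrases using the same `"..."~4` syntax as the example query.

```python
def select_expansions(candidates, threshold=0.85):
    """Keep candidate terms whose similarity score reaches the threshold.

    `candidates` is a list of (term, score) pairs; the scores are assumed
    to come from some context similarity model (hypothetical here).
    """
    return [term for term, score in candidates if score >= threshold]


def build_boolean_query(single_terms, multi_word_terms, slop=4):
    """Combine single words and multi-word terms into one Boolean query."""
    if not single_terms or not multi_word_terms:
        raise ValueError("both term lists must be non-empty")
    single_clause = " OR ".join(single_terms)
    # Each multi-word term becomes a proximity phrase, e.g. "diesel fuel"~4
    phrase_clause = " OR ".join(f'"{term}"~{slop}' for term in multi_word_terms)
    return f"({single_clause}) AND ({phrase_clause})"


# Hypothetical scored candidates; only those at or above 0.85 survive.
single = select_expansions([("fuel", 0.97), ("gas", 0.91), ("water", 0.40)])
phrases = select_expansions(
    [("diesel fuel", 0.93), ("fuel oil", 0.88), ("fruit oil", 0.52)]
)
query = build_boolean_query(single, phrases)
print(query)
```

Keeping term selection separate from query formatting means the threshold can be tuned without touching the query syntax.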

Artificial Researcher-Search Solutions
Relation graph for terms extracted from patent and scientific publications
• Direct access
• Relevant paragraphs (passages)
• Abstracts
• Bibliographic information
• Links to full text documents
• Download
• Paragraphs including all bibliographic data, including
identified technical terms, subject classification terms,
scientific classifications
• Standard Index Collections
• Open Access scientific articles
• access to 204,582,649 – CORE.uk
• Patents
• EP full-text data for text analytics
• Medical collections
• COVID-19 Open Research Dataset
Task Complexity

How indices and ontologies are generated
• Technology and data flow overview
• Language and Domain Complexity

From Text to Structured Semantic Representation
of Patents and Scientific Publications

[Figure: data pipeline diagram. Labels: any data collections (input); text normalization and harmonization; NLP pipeline, PoS tagger and chunker, lexico-semantic patterns; external language resources, gazetteers; metadata; external information resources; Artificial Researcher NLP-Toolkit with word similarity model, context similarity model and task model; ontology (OWL, JSON); ontology and index repositories; API (output); search options; graph search. The markers 1 to 6 in the diagram refer to the numbered steps in the list that follows.]
Documents are processed through several text analysis modules:
1. Text Normalization and harmonization (including PDF converter)
2. NLP and relation identification
3. Artificial Researcher NLP-Toolkit - Technical Terms
4. Packaged as enhanced XML/JSON
5. Software: AR Ontology Services and AR Passage Retrieval
6. End-user applications: REST APIs, desktop script, GUI
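The first steps of the list above can be sketched as a chain of small functions. This is only an illustrative stand-in under stated assumptions: the function names, the gazetteer lookup used in place of the real NLP and term-identification modules, and the JSON field names are all hypothetical and do not describe the Artificial Researcher software.

```python
import json
import re
import unicodedata


def normalize(text):
    """Step 1 (stand-in): Unicode normalization and whitespace harmonization."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def find_technical_terms(text, gazetteer):
    """Steps 2-3 (stand-in): look up known terms instead of running a real
    NLP pipeline with PoS tagging, chunking and relation identification."""
    lowered = text.lower()
    return sorted(term for term in gazetteer if term in lowered)


def package(doc_id, text, terms):
    """Step 4 (stand-in): package the enriched document as JSON."""
    record = {"id": doc_id, "text": text, "technical_terms": terms}
    return json.dumps(record, ensure_ascii=False)


def run_pipeline(doc_id, raw_text, gazetteer):
    """Run the stand-in steps in order and return the JSON record."""
    text = normalize(raw_text)
    terms = find_technical_terms(text, gazetteer)
    return package(doc_id, text, terms)


# Hypothetical document and gazetteer.
output = run_pipeline(
    "doc-1", "A  brake\npedal   device", {"brake pedal", "steering gear"}
)
print(output)
```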
Cloud platforms (Google, Azure 2023)
Internal Information Resources (analyses are processed with SaaS, so no data are stored in the cloud)
Artificial Researcher Data Pipeline Solution
Assembly line for creating Indices and Ontologies

Combining NLP & Distributional
Semantics
$$\mathrm{JoinedSimilarity} = \sum_{\substack{i,j=1,\dots,n \\ i \neq j,\; i<j}}^{N} \frac{\cos(w_i, w_j)}{N}$$
• cos(w_i, w_j) is the cosine similarity of each pair of word vectors w_i, w_j in a multi-word term (MWT)
• N is the number of words in the MWT (Andersson 2013-2017)
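A small sketch of this score under one reading of the formula: sum the cosine similarity over every unordered pair of word vectors in the multi-word term (i < j) and divide by the number of words N. The toy vectors and the function names are assumptions for illustration; real vectors would come from a trained embedding model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def joined_similarity(word_vectors):
    """Sum cos(w_i, w_j) over all pairs with i < j, divided by N words."""
    n_words = len(word_vectors)
    if n_words < 2:
        return 0.0  # a single word has no pairs to compare
    pair_sum = sum(cosine(u, v) for u, v in combinations(word_vectors, 2))
    return pair_sum / n_words


# Toy 2-dimensional vectors standing in for the words of a two-word term.
identical_pair = joined_similarity([[1.0, 0.0], [1.0, 0.0]])
orthogonal_pair = joined_similarity([[1.0, 0.0], [0.0, 1.0]])
print(identical_pair, orthogonal_pair)
```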
Embeddings identify similarities between different words; the standard reference
threshold for word2vec (w2v) is 0.60
• Underwear similar to underpants , undergarment, panties, underclothes
• Strength similar to strengths, strength, toughness, stronger, sfrength
Technical semantic relations are a mixture of single words and
phrases
• synthetic fibers synonym to polyester fibers
• thrips hypernym to bulb fly larvae
Language and Domain Complexity

What is a Technical Term?
Depends on who you are asking!

BERT – Technical Term Prediction
• Task: detect domain term spans in sentences
Tokens: A  network  synchronization  method  allows
Tags:   O  I        I                I       O
(I = inside a domain term, O = outside)
• Using pre-trained SciBERT Model
• With additional training data for Technical Term detection
• 22 randomly picked patents with different International Patent Classification codes
• 10,337 sentences; technical term dictionary of 5,099 entries
Fink T., Andersson L., Hanbury A. (2021). Detecting multi-word terms in patents the same way as entities. World Patent Information, 67, 102078, 1-6.
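To make the tagging scheme concrete, the sketch below turns a sequence of I/O tags back into term spans. It only illustrates the decoding step; the function name is hypothetical, and the tags are hard-coded where a fine-tuned model would normally predict them.

```python
def io_tags_to_spans(tokens, tags):
    """Collect maximal runs of tokens tagged "I" into domain term strings."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "I":
            current.append(token)
        elif current:
            # An "O" tag closes the term that was being collected.
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))  # term that runs to the sentence end
    return spans


tokens = "A network synchronization method allows".split()
tags = ["O", "I", "I", "I", "O"]
terms = io_tags_to_spans(tokens, tags)
print(terms)
```

With plain I/O tags, two terms that directly follow each other merge into one span; a B/I/O scheme would be needed to keep them apart.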

Technical term identification
- Evaluation using WIPO PEARL Terminology DB
Recall and precision assessment
WIPO concept recall          81%
All domain term recall       96%
All domain term precision    85%
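For reference, recall and precision against a terminology list can be computed as set overlaps. This is a generic sketch with made-up terms; it assumes exact string matching and does not reproduce how the figures above were obtained.

```python
def recall(predicted, gold):
    """Share of gold terms that were found."""
    return len(predicted & gold) / len(gold) if gold else 0.0


def precision(predicted, gold):
    """Share of predicted terms that are correct."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


# Hypothetical gold terminology and system output.
gold_terms = {"brake pedal", "fuel oil", "steering gear", "diesel fuel"}
predicted_terms = {"brake pedal", "fuel oil", "steering gear", "pair pedals"}

term_recall = recall(predicted_terms, gold_terms)
term_precision = precision(predicted_terms, gold_terms)
print(term_recall, term_precision)
```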

What are Semantic Lexical Relations?
Hyponym: a narrower term,
e.g. horse is a hyponym of farm animals
Hypernym: a broader term,
e.g. farm animals is a hypernym of pig, cow, sheep, horse
Hyponymy relations:
• part of
• kind of
• member of
• type of
• domain specific relation
Example: "... farm animals such as pigs, sheep, cows and horses"
Language and Domain complexity
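A lexico-semantic pattern such as "X such as A, B and C" can be sketched with a regular expression. This is a deliberately simplified illustration: the pattern and function name are assumptions, and it treats everything before "such as" as the hypernym, whereas a real system would use a chunker to isolate the noun phrase.

```python
import re

# Hypernym candidate, the cue phrase "such as", then a list of hyponyms.
SUCH_AS = re.compile(
    r"(?P<hypernym>[A-Za-z][A-Za-z ]*?)\s+such as\s+(?P<items>[A-Za-z ,]+)"
)


def extract_hyponym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the "such as" pattern."""
    match = SUCH_AS.search(text)
    if not match:
        return []
    hypernym = match.group("hypernym").strip()
    # Split the enumeration on commas and on the conjunctions "and" / "or".
    items = re.split(r",\s*|\s+and\s+|\s+or\s+", match.group("items"))
    return [(item.strip(), hypernym) for item in items if item.strip()]


pairs = extract_hyponym_pairs("farm animals such as pigs, sheep, cows and horses")
print(pairs)
```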

Better than Distributional Semantics
Artificial Researcher's text mining technology
Related concepts for "brake pedal":
• vehicle operating pedal
• conventional hydraulic brake system
• pedal devices
• position brake actuating member
• brake actuating member
• hydraulically-assisted rack pinion steering gear
• brake operating member
• conventional braking system
• pair pedals
• accelerator pedal
• case pedal device
• pedal device

Extraction using pre-trained contextual models only
Technical terms
brake pedal                                        PatBERT  SciBERT
conventional hydraulic brake system                0.69     0.89
hydraulically-assisted rack pinion steering gear   0.49     0.80
conventional braking system                        0.66     0.84

Semantically enriched data can retrieve up to 60% more related concepts than traditional pre-trained
contextual models (i.e. neural network-based methods of the BERT family).

Proof-of-concept
- Passage retrieval performance on the CLEF-IP 2013 text collection.
Method                      PRES@100  Recall@100  MAP@100  Prec(D)
(Albarede et al., 2021)     0.461     0.603       0.173    0.231
(Andersson et al., 2016)    0.444     0.560       0.187    0.282
(Andersson et al., 2017)    0.558     0.647       0.269    0.207
(Luo and Yang, 2013)        0.433     0.540       0.191    0.213
Artificial Researcher Passage Retrieval Service™
Artificial Researcher Graph Search Service™
Albarede L., Mulhem P., Goeuriot L., Le Pape-Gardeux C., Sylvain M., Trindad C. (2021). Passage retrieval in context: Experiments on Patents.
Andersson L., Lupu M., Palotti J., Hanbury A., Rauber A. (2016). When is the time ripe for Natural Language Processing for patent passage retrieval? In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016).
Andersson L., Rekabsaz N., Hanbury A. (2017). Automatic query expansion for patent passage retrieval using paradigmatic and syntagmatic information. The first WiNLP Workshop, co-located with the Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver.
Luo J., Yang H. (2013). Query formulation for prior art search: Georgetown University at CLEF-IP 2013. In Proc. of CLEF.
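Two of the measures in the table can be sketched for a single query as follows. The definitions are common textbook forms and may differ in detail from the CLEF-IP evaluation scripts; in particular, normalizing average precision by the total number of relevant passages is an assumption. MAP@100 is the mean of the per-query average precision, and PRES is not covered here. The passage identifiers are made up.

```python
def recall_at_k(ranked, relevant, k=100):
    """Fraction of relevant passages found in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for passage in ranked[:k] if passage in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k=100):
    """Average of precision values at each rank where a relevant passage occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, passage in enumerate(ranked[:k], start=1):
        if passage in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


# Hypothetical ranking for one query; p9 is relevant but never retrieved.
ranked_passages = ["p3", "p1", "p7", "p2"]
relevant_passages = {"p1", "p2", "p9"}

r_at_k = recall_at_k(ranked_passages, relevant_passages)
ap_at_k = average_precision_at_k(ranked_passages, relevant_passages)
print(r_at_k, ap_at_k)
```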

New REST API service
Text classification and semantic relation extraction
Output
Automatic classification according to
taxonomies such as IPC/CPC, MeSH,
PLoS, WIPO Scientific Subject fields
Input

Artificial Researcher - Services & Products
Test access: API key b10225791d2a478c9ff3d8cd106ad65a
• Artificial Researcher Passage Retrieval Service
• Access: REST API, Desktop App Script (Python)
• https://swagger.artificialresearcher.com/ (developer)
• https://passageretrieval.artificialresearcher.com/ (demo)
• SaaS or on-premises software (Docker)
• Artificial Researcher Ontology Service
• Access: REST API, Desktop App Script (Python)
• https://swagger.artificialresearcher.com/?urls.primaryName=onto-api (developer)
• Data format OWL or JSON
• Ontology generator – only SaaS
• Artificial Researcher NLP-toolkit Services
• Access: REST API
• https://swagger.artificialresearcher.com/?urls.primaryName=unified-nlp-server (developer)
• Artificial Researcher Graph Search Service
• UX: https://graph-demo.artificialresearcher.com/ (demo)

Thank you for your attention
artificialresearcher.com
linda.andersson@artificialresearcher.com
