SlideShare une entreprise Scribd logo

AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Andersson (Artificial Researcher IT GmbH, Austria)

In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In processing technical English texts, a multi-word unit method is often deployed as a word-formation strategy to expand the working vocabulary, i.e., introducing a new concept without the invention of an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools utilizing supervised machine learning algorithms due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduction in performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed the mentioned text mining tools. For the final outline, we will also demonstrate how these tools can be integrated in a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state-of-the-art within the patent text mining research community in 2022.

1  sur  23
Télécharger pour lire hors ligne
Domain Knowledge makes Artificial
Intelligence Smart
Linda Andersson
AI-SDV 2022
© Artificial Researcher,2022 1
Linda Andersson
CEO/CTO Artificial Researcher IT GmbH
Awards
• 2018 Commercial Viability award by
the Austrian Angel Investors
Association.
• 2017 PhD high potential R&D award,
i2c Award, Vienna University of
Technology
• 1999 Finalist in a Venture Cup held at
Mälardalens University College.
Research fields
• Natural Language Understanding
• Natural Language Processing
• Information Extraction
• Information Retrieval
Academic merits
• Information design
1998-2001
• General Linguistics
2001-2003
• Computational Linguistics
2006-2009
• Language Engineering
2004
• Library & Information Science
2004-2006
• Computer Science
2009-2019
© Artificial Researcher,2022 2
“There is little you can do about prior art which
wasn’t found in the search that you made before
filing the patent application.”
Barry Franks Hynell, “Opposition before the EPO”, preface to
Winning with IP: managing intellectual property today, ISBN 978-1-9998329-6-4
© Artificial Researcher,2022 3
Information Need ”Prior Art”
No 37435 Benz Patent – Motorwagen (1886 )
Steering mechanism
Search for:
• Problem and Solution (A)
• Scope of the invention (A,B,C)
• Specific technical details (B,C)
Engine function
A
C
C
The transport vehicle
B
© Artificial Researcher,2022 4
What make a AI Tool technical successful?
• A successful text mining solution does not only focus on developing the
technology or the best deep learning algorithm, it is much more complex.
The technology design reflects:
• Task Complexity
• Information needs, retrieve relevant paragraphs and not just documents
• Language Complexity
• Word formation of new words are particular important for the patent text
genre.
• Domain text-genre Complexity
• Multi word terms
© Artificial Researcher,2022 5
Known item search Explorative search
Find relevant paragraphs
Scientific Summarization
ComparativeSummarization
Requests
Output example
Reading text
Different Information Needs
Task Complexity
© Artificial Researcher,2022 6
Publicité

Recommandé

FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...Enrico Daga
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...João Rocha da Silva
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 

Contenu connexe

Similaire à AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Andersson (Artificial Researcher IT GmbH, Austria)

SFScon21 - Simone Tritini - The Environmental Data Platform web portal
SFScon21 - Simone Tritini - The Environmental Data Platform web portalSFScon21 - Simone Tritini - The Environmental Data Platform web portal
SFScon21 - Simone Tritini - The Environmental Data Platform web portalSouth Tyrol Free Software Conference
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Gautier Poupeau
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaGiorgia Lodi
 
[ENCORE webinar] Artificial Intelligence for mapping skills of the future
[ENCORE webinar] Artificial Intelligence for mapping skills of the future[ENCORE webinar] Artificial Intelligence for mapping skills of the future
[ENCORE webinar] Artificial Intelligence for mapping skills of the futureEADTU
 
Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...
Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...
Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...EADTU
 
SiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdfSiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdfSiddhartha Mitra
 
8th International Conference on Natural Language Computing (NATL 2022)
8th International Conference on Natural Language Computing (NATL 2022)8th International Conference on Natural Language Computing (NATL 2022)
8th International Conference on Natural Language Computing (NATL 2022)ijcsity
 
Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...ijistjournal
 
Call for Paper - 3rd International Conference on NLP & Information Retrieval ...
Call for Paper - 3rd International Conference on NLP & Information Retrieval ...Call for Paper - 3rd International Conference on NLP & Information Retrieval ...
Call for Paper - 3rd International Conference on NLP & Information Retrieval ...mlaij
 
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRM
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRMBéatrice Markhoff - Semantic mediation ArSol and CIDOC CRM
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRMariadnenetwork
 
2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decaysHenry Schreiner
 
3rd International Conference on NLP & Information Retrieval (NLPI 2022)
3rd International Conference on NLP & Information Retrieval (NLPI 2022)3rd International Conference on NLP & Information Retrieval (NLPI 2022)
3rd International Conference on NLP & Information Retrieval (NLPI 2022)ijccsa
 
8 th International Conference on Natural Language Computing (NATL 2022)
8 th International Conference on Natural Language Computing (NATL 2022)8 th International Conference on Natural Language Computing (NATL 2022)
8 th International Conference on Natural Language Computing (NATL 2022)ijscai
 
Smart cities no ai without ia
Smart cities   no ai without iaSmart cities   no ai without ia
Smart cities no ai without iaFredric Landqvist
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8plan4all
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!EDINA, University of Edinburgh
 
Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...IJNSA Journal
 
NATL 2022 CFP (14).pdf
NATL 2022 CFP (14).pdfNATL 2022 CFP (14).pdf
NATL 2022 CFP (14).pdfijcsity
 

Similaire à AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Andersson (Artificial Researcher IT GmbH, Austria) (20)

SFScon21 - Simone Tritini - The Environmental Data Platform web portal
SFScon21 - Simone Tritini - The Environmental Data Platform web portalSFScon21 - Simone Tritini - The Environmental Data Platform web portal
SFScon21 - Simone Tritini - The Environmental Data Platform web portal
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
[ENCORE webinar] Artificial Intelligence for mapping skills of the future
[ENCORE webinar] Artificial Intelligence for mapping skills of the future[ENCORE webinar] Artificial Intelligence for mapping skills of the future
[ENCORE webinar] Artificial Intelligence for mapping skills of the future
 
Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...
Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...
Artificial Intelligence and Human Expertise to Foresee Green, Digital and Ent...
 
SiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdfSiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdf
 
8th International Conference on Natural Language Computing (NATL 2022)
8th International Conference on Natural Language Computing (NATL 2022)8th International Conference on Natural Language Computing (NATL 2022)
8th International Conference on Natural Language Computing (NATL 2022)
 
Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...
 
Call for Paper - 3rd International Conference on NLP & Information Retrieval ...
Call for Paper - 3rd International Conference on NLP & Information Retrieval ...Call for Paper - 3rd International Conference on NLP & Information Retrieval ...
Call for Paper - 3rd International Conference on NLP & Information Retrieval ...
 
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRM
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRMBéatrice Markhoff - Semantic mediation ArSol and CIDOC CRM
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRM
 
2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays
 
3rd International Conference on NLP & Information Retrieval (NLPI 2022)
3rd International Conference on NLP & Information Retrieval (NLPI 2022)3rd International Conference on NLP & Information Retrieval (NLPI 2022)
3rd International Conference on NLP & Information Retrieval (NLPI 2022)
 
8 th International Conference on Natural Language Computing (NATL 2022)
8 th International Conference on Natural Language Computing (NATL 2022)8 th International Conference on Natural Language Computing (NATL 2022)
8 th International Conference on Natural Language Computing (NATL 2022)
 
Resume_Karthick
Resume_KarthickResume_Karthick
Resume_Karthick
 
Resume (2)
Resume (2)Resume (2)
Resume (2)
 
Smart cities no ai without ia
Smart cities   no ai without iaSmart cities   no ai without ia
Smart cities no ai without ia
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!
 
Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...Call for Papers - 8th International Conference on Natural Language Computing ...
Call for Papers - 8th International Conference on Natural Language Computing ...
 
NATL 2022 CFP (14).pdf
NATL 2022 CFP (14).pdfNATL 2022 CFP (14).pdf
NATL 2022 CFP (14).pdf
 

Plus de Dr. Haxel Consult

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterDr. Haxel Consult
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCDr. Haxel Consult
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
 
The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...
The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...
The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...Dr. Haxel Consult
 
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...Dr. Haxel Consult
 

Plus de Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 
The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...
The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...
The Artificial Intelligence Conference on Search, Data and Text Mining, Analy...
 
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
 

Dernier

Augmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical ProfessionalsAugmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical Professionalsthirdeyegen65
 
Augmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & DefenseAugmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & Defensethirdeyegen65
 
Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...ssuser7b7f4e
 
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxUGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxRitesh Sahu
 
Biometrics Technology Intresting PPT
Biometrics Technology Intresting PPTBiometrics Technology Intresting PPT
Biometrics Technology Intresting PPTPraveenKumarThota7
 
Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Damar Juniarto
 
Red shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's CyberspaceRed shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's Cyberspacesttyk
 
Model Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfModel Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfgalfinprihardiputra0
 
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS  Clarify, Feature Store, Hyper parameter TuningAWS Overview of AWS  Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS Clarify, Feature Store, Hyper parameter TuningVarun Garg
 

Dernier (9)

Augmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical ProfessionalsAugmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical Professionals
 
Augmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & DefenseAugmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & Defense
 
Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...
 
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxUGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
 
Biometrics Technology Intresting PPT
Biometrics Technology Intresting PPTBiometrics Technology Intresting PPT
Biometrics Technology Intresting PPT
 
Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023
 
Red shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's CyberspaceRed shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's Cyberspace
 
Model Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfModel Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdf
 
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS  Clarify, Feature Store, Hyper parameter TuningAWS Overview of AWS  Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
 

AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Andersson (Artificial Researcher IT GmbH, Austria)

  • 1. Domain Knowledge makes Artificial Intelligence Smart Linda Andersson AI-SDV 2022 © Artificial Researcher,2022 1
  • 2. Linda Andersson CEO/CTO Artificial Researcher IT GmbH Awards • 2018 Commercial Viability award by the Austrian Angel Investors Association. • 2017 PhD high potential R&D award, i2c Award, Vienna University of Technology • 1999 Finalist in a Venture Cup held at Mälardalens University College. Research fields • Natural Language Understanding • Natural Language Processing • Information Extraction • Information Retrieval Academic merits • Information design 1998-2001 • General Linguistics 2001-2003 • Computational Linguistics 2006-2009 • Language Engineering 2004 • Library & Information Science 2004-2006 • Computer Science 2009-2019 © Artificial Researcher,2022 2
  • 3. “There is little you can do about prior art which wasn’t found in the search that you made before filing the patent application.” Barry Franks Hynell, “Opposition before the EPO”, preface to Winning with IP: managing intellectual property today, ISBN 978-1-9998329-6-4 © Artificial Researcher,2022 3
  • 4. Information Need ”Prior Art” No 37435 Benz Patent – Motorwagen (1886 ) Steering mechanism Search for: • Problem and Solution (A) • Scope of the invention (A,B,C) • Specific technical details (B,C) Engine function A C C The transport vehicle B © Artificial Researcher,2022 4
  • 5. What make a AI Tool technical successful? • A successful text mining solution does not only focus on developing the technology or the best deep learning algorithm, it is much more complex. The technology design reflects: • Task Complexity • Information needs, retrieve relevant paragraphs and not just documents • Language Complexity • Word formation of new words are particular important for the patent text genre. • Domain text-genre Complexity • Multi word terms © Artificial Researcher,2022 5
  • 6. Known item search Explorative search Find relevant paragraphs Scientific Summarization ComparativeSummarization Requests Output example Reading text Different Information Needs Task Complexity © Artificial Researcher,2022 6
  • 7. Next generation Search Technology Our search software generate the query in the specific domain Retrieves relevant paragraphs Graph Search & Text Box Search We have developed text mining technologies by combining traditional Natural Language Processing and Deep Learning technologies. © Artificial Researcher,2022 7 Task Complexity
  • 8. Example of automatic query generation Automatic query expansion terms © Artificial Researcher,2022 8 Automatic Query formulation - From text segment to automatic Boolean query Task Complexity
  • 9. Automatic Query formulation - From terms to concept term graphs https://graph-demo.artificialresearcher.com/ Term: environment climate fuel biofuel (fuel OR gas OR engine OR vehicle OR oil OR coal OR energy OR hydrocarbon OR steam OR liquid OR fluid OR furnace OR lubricant OR hydrogen OR feed) AND ("diesel fuel"~4 OR "liquid fuel"~4 OR "fuel gas"~4 OR "fuel oil"~4 OR "fuel material"~4 OR "hydrogen gas"~4 OR "diesel oil"~4 OR "liquid hydrocarbon"~4 OR "burning gas"~4 OR "synthetic gas"~4 OR "suitable hydrocarbon"~4 OR "mixed gas"~4 OR "tail gas"~4) With context similarity of 0.85 © Artificial Researcher,2022 9 Task Complexity
  • 10. Artificial Researcher-Search Solutions Relation graph for terms extracted from patent and scientific publications • Direct access • Relevant paragraphs (passages) • Abstracts • Bibliographic information • Links to full text documents • Download • Paragraphs including all bibliographic data, including identified technical terms, subject classification terms, scientific classifications • Standard Index Collections • Open Access scientific articles • access to 204,582,649 – CORE.uk • Patents • EP full-text data for text analytics • Medical collections • COVID-19 Open Research Dataset © Artificial Researcher,2022 10 Task Complexity
  • 11. How indices and ontologies are generated • Technology and data flow overview • Language and Domain Complexity © Artificial Researcher,2022 11
  • 12. From Text to Structured Semantic Representation of Patents and Scientific Publications © Artificial Researcher,2022 12
  • 13. Any Data collections NLP pipeline, PoS & Chunker, Lexico-Semantic Patterns Harmonization & Text Normalization External Language Resources, Gazetteers Ontology (OWL, JSON) Ontology & Index Repositories Meta Data External Information Resources API + output input + Search options Graph Search Artificial Researcher NLP-Toolkit Word Similarity Model Context Similarity Model Task Model 1 2 3 4 5 6 Documents are processed through several text analysis modules: 1. Text Normalization and harmonization (including PDF converter) 2. NLP and relation identification 3. Artificial Researcher NLP-Toolkit - Technical Terms 4. Packaged as enhanced XML/JSON 5. Software AR Ontology Services and AR Passage Retrieval 6. End user applications rest APIs, desktop script, GUI Cloud platforms (Google, AZURE2023) Internal Information Resources (analyses are processed with SaaS – so no data are stored the cloud) Artificial Research Data Pipeline solution Assembly line for creating Indices and Ontologies © Artificial Researcher,2022 13
  • 14. Combining NLP & Distributional Semantics 𝐽𝑜𝑖𝑛𝑒𝑑𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 = ෍ 𝑖,𝑗=1,𝑛 𝑖≠𝑗 𝑖<𝑗 𝑁 cos 𝑤𝑖 , 𝑤𝑗 𝑁 • wi, wj represent each word vector pair cosine similarity of a MWT • N is the number of words for a MWT (Andersson 2013-2017) Embedding identifies similarities between different words, and the standard reference threshold for w2v 0.60 • Underwear similar to underpants , undergarment, panties, underclothes • Strength similar to strengths, strength, toughness, stronger, sfrength Technical semantic relations are a mixture of single words and phrases • synthetic fibers synonym to polyester fibers • thrips hypernym to bulb fly larvae © Artificial Researcher,2022 14 Language and Domain Complexity
  • 15. What is Technical Term? Depends on who you are asking! © Artificial Researcher,2022 15 Language and Domain Complexity
  • 16. BERT – Technical Term Prediction • Task: detect domain term spans in sentences A network synchronization method allows Domain Term O I I I O • Using pre-trained SciBERT Model • With additional training data for Technical Term detection • Randomly picked 22 patents with different International Patent Classification codes • Sentences 10,337, Technical Term dictionary of size 5,099 Fink T., Andersson L., Hanbury A. (2021) Detecting Multi Word Terms in patents the same way as entities / World Patent Information, 67 102078; 1 - 6 © Artificial Researcher,2022 16 Language and Domain Complexity
  • 17. Technical term identification - Evaluation using WIPO PEARL Terminology DB RECALL AND PRECISION ASSESSMENT WIPO CONCEPT RECALL 81% ALL DOMAIN TERM RECALL 96% ALL DOMAIN TERM PRECISION 85% © Artificial Researcher,2022 17 Language and Domain Complexity
  • 18. What is Semantic Lexical Relations Hyponym, a narrow term e.g. horse is a hyponym of farm animals Hypernym, a broad term, e.g. farm animals a hypernym of pig, cow, sheep, horse Hyponymy relations: • part of • kind of • member of • type of • domain specific relation …. farm animals such as pigs, sheep, cows and horses © Artificial Researcher,2022 18 Language and Domain complexity
  • 19. Better than Distributional Semantics brake pedal: vehicle operating pedal, conventional hydraulic brake system pedal devices position brake actuating member brake actuating member hydraulically-assisted rack pinion steering gear brake operating member conventional braking system pair pedals accelerator pedal case pedal device pedal device Related concept Artificial Researcher’s text mining technology Extraction using pre-trained contextual models only brake pedal PatBERT SciBERT conventional hydraulic brake system 0.69 0.89 hydraulically-assisted rack pinion steering gear 0.49 0.80 conventional braking system 0.66 0.84 Technical terms Using semantically enriched data can retrieve up to 60% more related concepts than traditional pre-trained contextual models (a.k.a. Neural Network-based methods, BERT-family). © Artificial Researcher,2022 19 Language and Domain complexity
  • 20. Proof-of-concept -Passage retrieval performance on CLEF-IP 2013 text collection. Method PRES@ 100 Recall@ 100 MAP@ 100 Prec(D) (Albarede et. al. 2021) 0.461 0.603 0.173 0.231 (Andersson et. al., 2016) 0.444 0.560 0.187 0.282 (Andersson et. al., 2017) 0.558 0.647 0.269 0.207 (Luo J and Yang H. 2013) 0.433 0.540 0.191 0.213 Artificial Researcher Passage Retrieval ServiceTM Artificial Researcher Graph Search ServiceTM Albarede L., & Mulhem P., Goeuriot L., Le Pape-Gardeux C., Sylvain M., Trindad C. (2021). Passage retrieval in context: Experiments on Patents. Andersson L, Lupu M, Palotti J., Hanbury A., Rauber A. (2016) When is the time Ripe for Natural Language Processing for Patent Passage Retrieval? In Proceedings of the 25th ACM International Conference on Conference on Information and Knowledge Management (CIKM 2016) Andersson L, Rekabsaz N., Hanbury A. (2017) Automatic query expansion for patent passage retrieval using paradigmatic and syntagmatic information The first WiNLP Workshop co-located with the Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver Luo J and Yang H. (2013). Query formulation for prior art search-Georgetown university at clef-ip 2013. In Proc. of CLEF. © Artificial Researcher,2022 20 Language and Domain complexity
  • 21. New REST API service Text classification and semantic relation extraction Output Automatic classification according to taxonomies such as IPC/CPC, MeSH, PLoS, WIPO Scientific Subject fields Input © Artificial Researcher,2022 21
  • 22. Artificial Researcher - Services & Products Test access: API key b10225791d2a478c9ff3d8cd106ad65a • Artificial Researcher Passage Retrieval Service • Access: rest API, Desktop App Script (Python) • https://swagger.artificialresearcher.com/ (developer) • https://passageretrieval.artificialresearcher.com/ (demo) • SaaS or on-premises software (Docker) • Artificial Researcher Ontology Service • Access: rest API, Desktop App Script (Python) • https://swagger.artificialresearcher.com/?urls.primaryName=onto-api (developer) • Data format OWL or JSON • SaaS or on-premises software (Docker) • Ontology generator – only SaaS • Artificial Researcher NLP-toolkit Services • Access: rest API • https://swagger.artificialresearcher.com/?urls.primaryName=unified-nlp-server (developer) • SaaS or on-premises software (Docker) • Artificial Researcher Graph Search Service • UX: https://graph-demo.artificialresearcher.com/ (demo) © Artificial Researcher,2022 22
  • 23. Thank you for your attention artificialresearcher.com linda.andersson@artificialresearcher.com © Artificial Researcher,2022 23