Bootstrapping Machine Learning using Knowledge Graphs
Semantic AI for Named Entity Recognition
Sebastian Gabler
Director of Sales, Semantic Web Company
Robert David
CTO, Semantic Web Company
© Semantic Web Company 2020
Introducing Semantic Web Company
▸ Software Engineers & Consultants for NLP, Semantics and Machine Learning
▸ Founded in 2004
▸ Based in Vienna
▸ Privately held
▸ Developer & Vendor of PoolParty Semantic Suite
▸ Participating in projects with €2.5 million funding for R&D
▸ SWC named to KMWorld’s ‘100 Companies That Matter in Knowledge Management’ from 2016 until 2019
▸ 60+ FTE
▸ Revenue growth/year: ~25%
▸ ISO 27001:2013 certified
Towards self-optimizing machines
[Figure: three eras of computing.
1. Programmatical Approach: Raw data, Rules, Programs, Answers
2. Machine Learning: Raw data, Answers, Machine Learning, Predictions, Rules
3. Knowledge Graph: Raw data, Machine Learning, Knowledge Graph, Knowledge / Answers, Explanations]
Based on: Knowledge Graphs - The Third Era of Computing by Dan McCreary
Components and Features
Make Sense - not Terms
Austria’s capital, lies in the country’s east on the Danube River. Its artistic and intellectual legacy was shaped by residents including Mozart, Beethoven and Sigmund Freud. The city is also known for its Imperial palaces, including Schönbrunn, the Habsburgs’ summer residence. In the MuseumsQuartier district, historic and contemporary buildings display works by Egon Schiele, Gustav Klimt and other artists.
Schaut man vom Kahlenberg auf die Donau hinunter, kann man Wien mit allen Sinnen spüren. Weinberge sind da zu sehen, dahinter glänzt das bauliche Erbe der mitteleuropäischen Metropole. Ein halbes Jahrtausend wurde hier Weltgeschichte geschrieben. Kunstgeschichte sowieso.
(English translation of the German sample: Looking down from the Kahlenberg onto the Danube, one can sense Vienna with all the senses. Vineyards can be seen there, and behind them gleams the architectural heritage of the Central European metropolis. For half a millennium, world history was written here. Art history too, of course.)
Extraction of related terms and concepts
Document Corpus
▸ Websites
▸ PDF, Word, …
▸ Abstracts from DBpedia
▸ RSS Feeds
[Figure: graph of extracted terms, Term 1 to Term 10]
▸ Relevant terms and phrases
▸ Relevancy of terms
▸ Co-occurrence between terms and concepts
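To make "co-occurrence between terms and concepts" concrete, here is a minimal sketch that counts how often two vocabulary terms show up in the same document. It is an illustration only, not the PoolParty implementation: the function name `cooccurrence`, the sample documents and the plain substring matching are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs, vocabulary):
    """Count, per pair of terms, the documents in which both terms occur."""
    pairs = Counter()
    for text in docs:
        lowered = text.lower()
        # Naive substring test; a real extractor would tokenize and lemmatize.
        present = sorted(t for t in vocabulary if t.lower() in lowered)
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical mini corpus, loosely based on the Vienna sample text.
docs = [
    "Vienna lies on the Danube.",
    "Mozart lived in Vienna.",
    "The Danube flows through Vienna and Budapest.",
]
counts = cooccurrence(docs, {"Vienna", "Danube", "Mozart"})
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for related terms; weighting by term relevancy would be a separate step that this sketch does not attempt.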
Existing Work in the Vicinity
▸ Using Graph Embeddings for Fast Named Entity Disambiguation
▸ Character-Based Neural Networks for NLP in the Real World
The NER Challenge
▸ Training an entity recognizer requires a large annotated corpus.
▸ Machine learning uses manually annotated training data:
▹ Available for common entity types like Persons, Organisations, Places
▹ Not available for many specific knowledge domains
▹ We need a domain expert to do the annotations
Complementary Approach
▸ The PoolParty Extractor allows thesaurus-based entity recognition
▹ limited to the known vocabulary
▹ text context-independent
▸ Statistical NER models allow text-based entity recognition
▹ provide open-vocabulary entity recognition
▹ primarily based on the text context
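The thesaurus-based side of this comparison can be sketched as a simple dictionary matcher. This is a hedged illustration of the general technique, not the PoolParty Extractor itself: `build_matcher`, the labels and the `concept:` identifiers are hypothetical names chosen for the example.

```python
import re


def build_matcher(label_to_concept):
    """Return an annotate(text) function that finds known labels in text."""
    # Longest labels first, so "lung cancer" is preferred over "cancer".
    labels = sorted(label_to_concept, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    lookup = {label.lower(): concept for label, concept in label_to_concept.items()}

    def annotate(text):
        # Each hit is (start, end, concept); surrounding context is ignored.
        return [
            (m.start(), m.end(), lookup[m.group(0).lower()])
            for m in pattern.finditer(text)
        ]

    return annotate


# Hypothetical thesaurus fragment: two labels map to the same concept.
annotate = build_matcher({
    "influenza": "concept:influenza",
    "flu": "concept:influenza",
    "cancer": "concept:cancer",
    "lung cancer": "concept:lung-cancer",
})
text = "Flu season: lung cancer patients are at risk of influenza."
spans = annotate(text)
print([(text[s:e], c) for s, e, c in spans])
```

The matcher shows both limitations named above: a disease missing from the dictionary is never found, and a label is matched the same way regardless of the sentence around it. A statistical model trades those limits for a need for training data.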
Use Case: Disease Annotator
▸ Have a well-curated knowledge model: MeSH, diseases branch
▸ Have training data: PubMed Corpus
▸ Have working data: Find diseases mentioned in newspapers
▸ Have a benchmark: BioBERT
▸ Trending topic: Coronavirus disease (COVID-19) outbreak
Disease Annotator Sample Application
Semantic AI Approach
▸ Semantic AI Approach:
▹ Automatically annotate training data
▹ Use a structured domain knowledge model
▹ Apply the model to a corpus
▹ Use the result annotations as training data
▹ Train the entity recognizer
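The bootstrapping steps above can be sketched as follows: vocabulary labels are matched against raw sentences, and the matches are turned into token-level BIO tags that a statistical recognizer could be trained on. This is a minimal sketch under stated assumptions, not the authors' pipeline: `bio_tag`, the regex tokenizer, the `DISEASE` type name and the sample labels are all hypothetical, and the final training step is left to whichever NER toolkit is used.

```python
import re

# Simple tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def bio_tag(sentence, labels, entity_type="DISEASE"):
    """Tag tokens with B-/I-/O using vocabulary labels as the only supervision."""
    tokens = [(m.group(0), m.start(), m.end()) for m in TOKEN.finditer(sentence)]
    tags = ["O"] * len(tokens)
    # Longest labels first, so a longer match is never split by a shorter one.
    for label in sorted(labels, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            inside = [
                i for i, (_, start, end) in enumerate(tokens)
                if start >= m.start() and end <= m.end()
            ]
            if any(tags[i] != "O" for i in inside):
                continue  # already covered by a longer label
            for n, i in enumerate(inside):
                tags[i] = ("B-" if n == 0 else "I-") + entity_type
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical labels standing in for a knowledge model's disease branch.
labels = ["cancer", "lung cancer", "influenza"]
tagged = bio_tag("Patients with lung cancer often catch influenza.", labels)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Running this over every sentence of a corpus yields an annotated training set without manual labelling; its quality is bounded by the coverage and ambiguity of the vocabulary.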
Evaluating the approach - Data
▸ Knowledge Model: Medical Subject Headings (MeSH)
▹ Controlled vocabulary for life sciences
▹ Evaluation: Diseases branch
▸ Training Corpus: PubMed Corpus (100k sentences)
▸ Ground Truth: Manually annotated NCBI Disease corpus (BioBERT)
Evaluation: Corpus folding I
▸ Compare NER with Thesaurus annotations
▹ PubMed corpus @ 80% training data + 20% test data
▹ Full MeSH diseases branch as model
▸ Results:
▹ Precision: 95.21% Recall: 91.62% F1: 93.38%
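As a check on the figures, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall and shows one way an 80/20 corpus split might be drawn; the helper names `f1_score` and `split_80_20` and the fixed random seed are assumptions for illustration, not details from the evaluation.

```python
import random


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_80_20(items, seed=0):
    """Shuffle deterministically and split into 80% training, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


# Reported values for corpus folding I.
f1 = f1_score(0.9521, 0.9162)
print(f"F1 = {f1 * 100:.2f}%")

# Stand-in for the sentence corpus: 100 hypothetical sentence ids.
train, test = split_80_20(range(100))
print(len(train), len(test))
```

The recomputed value agrees with the reported F1 to within rounding.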
Evaluation: Corpus folding II
▸ Compare NER with ground truth:
▹ NCBI Disease corpus - manually annotated
▹ NCBI Disease corpus - automatically annotated
▸ Results:
▹ Precision: 82.15% Recall: 74.66% F1: 78.22%
■ Compared with BioBERT at 89% F1: about 11 percentage points lower
■ Compared with OpenNLP trained on the NCBI training set: 5% higher
Evaluation: Thesaurus folding
▸ Compare NER with Thesaurus annotations
▹ MeSH diseases branch @ 80% training data + 20% test data
▹ PubMed corpus
▸ Evaluations:
▹ Compare with full knowledge model Extractor annotations
▹ Compare with hidden part of the folds
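One possible reading of thesaurus folding is sketched below: the vocabulary itself is split, only the visible 80% is used to annotate training data, and the trained recognizer is then checked against the hidden 20%. The slide gives no further detail, so everything here is an assumption made for illustration, including the names `fold_thesaurus` and `hidden_recall` and the choice to measure the share of hidden labels found at least once.

```python
import random


def fold_thesaurus(concepts, train_share=0.8, seed=0):
    """Split vocabulary labels into a visible part and a hidden part."""
    shuffled = sorted(concepts)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return set(shuffled[:cut]), set(shuffled[cut:])


def hidden_recall(predicted_mentions, hidden_labels):
    """Share of hidden labels the recognizer found at least once."""
    found = {m.lower() for m in predicted_mentions}
    hidden = {h.lower() for h in hidden_labels}
    return len(found & hidden) / len(hidden) if hidden else 0.0


# Ten hypothetical concept labels standing in for the diseases branch.
visible, hidden = fold_thesaurus([f"disease-{i}" for i in range(10)])
print(len(visible), len(hidden))

# Hypothetical recognizer output scored against two hidden labels.
score = hidden_recall(["Measles", "flu"], ["measles", "mumps"])
print(score)
```

A high score on the hidden part would indicate that the recognizer generalizes beyond the vocabulary it was bootstrapped from, which is what makes its output useful as thesaurus candidates.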
Semantic AI Benefits
▸ Bootstrapping path for Named Entity Recognizer
▹ Automated building of annotated training corpus
▹ Useful for agile recognizer training (approx. 10 minutes / cycle)
▹ No manual annotation efforts required
▸ Improvement on (vocabulary-based) Concept Extraction:
▹ High-quality thesaurus candidates
▹ Improved disambiguation
Next steps...
▸ Establish feedback loop
▸ Provide generic connectors (Oracle ML)
▹ Leverage architecture benefits
▹ Provide integrated solution
▸ Run PoCs for similar use cases
▹ Fictional characters (Audiovisual)
▹ Emerging news topics (Broadcast)
▹ ...
Credits
Impulse: Sebastian Gabler
Product Owner: Robert David
Project Management: Alexi Lopez-Lorca
NER bootstrapping process: Alexis Dimitriadis
NER process development: Sotiris Karampatakis
Development: Guilherme Rodrigues & Konstantin Dzekov
Demo Application: Juliane Pinero-Winkler & Ali Marhubi
Nine persons, seven nations, one company
Thank you!
Q & A

Ands 2020 - Disease Recognizer
