Cross-Evaluation of Entity Linking and Disambiguation
Systems for Clinical Text Annotation
Camilo Thorne Stefano Faralli Heiner Stuckenschmidt
Data and Web Science (DWS) Group
Universität Mannheim, Germany
{camilo,stefano,heiner}@informatik.uni-mannheim.de
SEMANTiCS 2016
C. Thorne et al. (UMa) EL for Clinical Text Leipzig, 14.09.2016 1 / 13
Motivation
Low dose pramipexole is neuroprotective in
the MPTP mouse model of Parkinson’s disease
(*)
Problems:
1 identify entities (nouns, noun phrases) within a text;
2 identify or resolve the meaning of such entities within that text by linking
them to a sense repository;
3 resolve the meaning of both domain-specific and generic terms
Question
Are there annotation services capable of handling both domain-specific and generic terms?
Annotators
MetaMap (Aronson and Lang, 2010): clinical domain; sense repository: UMLS; REST service; multilingual; sense: CUI
BabelFly (Moro et al., 2014): general domain; sense repository: BabelNet; REST service; multilingual; sense: babelsynset
TagMe (Ferragina and Scaiella, 2010): general domain; “sense” repository: Wikipedia; REST service; English/Italian; “sense”: Wiki page
WordNet (Lesk) (custom baseline): general domain; sense repository: WordNet 3.0; English; sense: synset
Problem: the sense repositories are not aligned a priori
Solution: use linked data, in the form of DBpedia (Bizer et al., 2009), as a pivot
(partial mappings)
Note: UMLS can be mapped to DBpedia via Medline and the LinkedLifeData
initiative (Momtchev et al., 2009)
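The WordNet baseline is only described as a custom Lesk annotator. For illustration, here is a minimal sketch of the textbook simplified Lesk algorithm, which picks the sense whose gloss shares the most content words with the context. It is an assumption, not the authors' implementation. The stop-word list, the sense IDs and the glosses are hypothetical toy data, not WordNet entries.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical stop-word list; a real baseline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "to", "and", "or", "by", "with"}


def content_words(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens minus stop words (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def simplified_lesk(context: str, senses: Dict[str, str]) -> Optional[str]:
    """Return the sense whose gloss shares the most content words with the context.

    `senses` maps a sense ID to its gloss. On ties the first sense seen wins;
    None means no gloss overlaps the context at all.
    """
    context_words = content_words(context)
    best_sense: Optional[str] = None
    best_overlap = 0
    for sense_id, gloss in senses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Toy glosses for two senses of "mouse"; IDs and glosses are made up, not WordNet's.
MOUSE_SENSES = {
    "mouse#device": "a hand-operated device that moves a cursor on a computer screen",
    "mouse#rodent": "a small rodent often used as a laboratory model animal",
}

sentence = "Low dose pramipexole is neuroprotective in the MPTP mouse model of Parkinson's disease"
print(simplified_lesk(sentence, MOUSE_SENSES))  # mouse#rodent (shared word: "model")
```

Gloss overlap is cheap but brittle: short glosses and the lack of domain vocabulary in WordNet give it low coverage on clinical text, which is the usual motivation for using it only as a baseline.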
Annotations (Overview)
Use DBpedia as pivot:
Source | sense | sense ID | DBpedia URI
Clinical (Gold) | pramipexol | C0074710 | http://dbpedia.org/resource/Pramipexole
Clinical (Gold) | Parkinson disease | C0030567 | http://dbpedia.org/resource/Parkinson_disease
MetaMap | pramipexol | C0074710 | http://dbpedia.org/resource/Pramipexole
MetaMap | Parkinson disease | C0030567 | http://dbpedia.org/resource/Parkinson_disease
BabelFly | ATC code N04BC05 | bn:03124207n | http://dbpedia.org/resource/Pramipexole
TagMe | pramipexole | https://goo.gl/twrSVu | http://dbpedia.org/resource/Pramipexole
TagMe | Parkinson's disease | https://goo.gl/Xke6W3 | http://dbpedia.org/resource/Parkinson's_disease
Annotations for example (*)
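To make the pivoting step concrete, here is a minimal sketch under stated assumptions. Each annotator's native sense IDs are mapped to DBpedia URIs through a partial lookup table (seeded with the example rows above), and an optional equivalence table plays the role of URI resolution. The table names, `to_pivot`, and the `SAME_AS` entry are hypothetical; the sketch also relies on the DBpedia convention that resource names are Wikipedia article titles with spaces replaced by underscores.

```python
from typing import Dict, Iterable, Optional, Set

DBR = "http://dbpedia.org/resource/"


def wikipedia_title_to_dbpedia(title: str) -> str:
    """DBpedia resource names follow Wikipedia titles, with spaces as underscores."""
    return DBR + title.replace(" ", "_")


# Partial sense-ID -> DBpedia URI tables, seeded with the example rows.
# Real tables would come from LinkedLifeData (UMLS) and BabelNet links.
UMLS_TO_DBPEDIA = {
    "C0074710": DBR + "Pramipexole",
    "C0030567": DBR + "Parkinson_disease",
}
BABELNET_TO_DBPEDIA = {"bn:03124207n": DBR + "Pramipexole"}

# Hypothetical equivalence link used to resolve URIs to one canonical form.
SAME_AS = {DBR + "Parkinson_disease": DBR + "Parkinson's_disease"}


def to_pivot(sense_ids: Iterable[str], mapping: Dict[str, str],
             same_as: Optional[Dict[str, str]] = None) -> Set[str]:
    """Map native sense IDs to DBpedia URIs; senses without a mapping are dropped."""
    uris = set()
    for sense_id in sense_ids:
        uri = mapping.get(sense_id)
        if uri is None:
            continue  # the mappings are partial
        if same_as is not None:
            uri = same_as.get(uri, uri)  # canonicalise if an equivalence is known
        uris.add(uri)
    return uris


gold = to_pivot(["C0074710", "C0030567"], UMLS_TO_DBPEDIA)
tagme = {wikipedia_title_to_dbpedia(t) for t in ["Pramipexole", "Parkinson's disease"]}
print(len(gold & tagme))  # 1: only Pramipexole matches before resolution
gold_resolved = to_pivot(["C0074710", "C0030567"], UMLS_TO_DBPEDIA, SAME_AS)
print(len(gold_resolved & tagme))  # 2 once Parkinson_disease is resolved
```

The example shows why resolution matters: the gold and TagMe annotations denote the same disease but use different DBpedia names, so they only match after canonicalisation.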
SemRep Corpus (Kilicoglu et al., 2011)
Experiments were run on the SemRep corpus, a small annotated clinical corpus:
428 clinical excerpts (MedLine/PubMed)
13,948 word tokens
856 UMLS-annotated clinical terms
In each sentence, two noun phrases were annotated by clinicians with their
corresponding UMLS CUIs
606 of these terms can be associated with a corresponding DBpedia URI
Example (*) taken from SemRep
Annotation Statistics
# of CUIs in corpus (total) = 856

Source | # DBpedia URIs | # resolved URIs
Corpus (gold) | 606 | 404
MetaMap | 343 | 242
BabelFly | 432 | 269
TagMe | 469 | 320
WordNet | 182 | 97
Cross-Evaluation
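$$\mathrm{Pre} = \frac{\#\,\text{correct senses}}{\#\,\text{returned senses}}, \qquad \mathrm{Rec} = \frac{\#\,\text{correct senses}}{\#\,\text{corpus senses}}, \qquad F_1 = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$$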
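The metrics above can be sketched as set operations, assuming a returned sense counts as correct when its DBpedia URI appears in the gold set. The slides do not say whether matching is per mention or over the whole corpus; pairing each URI with a sentence ID, as in the usage lines, gives per-mention matching. `precision_recall_f1` and the sample data (including the `MPTP` URI) are illustrative, not taken from the paper.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(returned: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Pre = correct / returned, Rec = correct / gold, F1 = their harmonic mean."""
    correct = len(returned & gold)  # a returned sense is correct if it is in the gold set
    pre = correct / len(returned) if returned else 0.0
    rec = correct / len(gold) if gold else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec > 0 else 0.0
    return pre, rec, f1


# Elements are (sentence ID, DBpedia URI) pairs; the data are made up.
dbr = "http://dbpedia.org/resource/"
gold = {(1, dbr + "Pramipexole"), (1, dbr + "Parkinson's_disease")}
returned = {(1, dbr + "Pramipexole"), (1, dbr + "MPTP")}
print(precision_recall_f1(returned, gold))  # (0.5, 0.5, 0.5)
```

Because recall divides by the number of corpus senses, an annotator that returns fewer but more accurate senses can still lose on F1, which is the coverage effect discussed in the summary.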
[Figure: bar charts of Pre, Rec and F-1 (scale 0.0 to 1.0) for MetaMap, BabelFly, TagMe and WordNet; left panel: unresolved URIs, right panel: resolved URIs]
Conclusion
When URIs are resolved via sameAs links, generic EL systems such as TagMe and
BabelFly match domain-specific annotators like MetaMap
Semantic Relatedness Measures
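$$\mathrm{syn}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0.2}(w, w')\}|}{|g(s)| + |g(s')|}$$

$$\mathrm{syn}^{+}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0}(w, w')\}|}{|g(s)| + |g(s')|}$$

$$\mathrm{dsyn}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0.2}(w, w')\}|}{|g(s)| + |g(s')|}$$

$$\mathrm{dsyn}^{+}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0}(w, w')\}|}{|g(s)| + |g(s')|}$$

where g(s) is the gloss of sense s (as a list of words), wn_{>t}(w, w') holds when the WordNet similarity of w and w' exceeds t, and dn_{>t}(w, w') holds when their word-embedding relatedness exceeds t.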
We measured:
1 WordNet similarity (low coverage, but better accuracy) under two
“synonymy” thresholds (“strict” > 0.2, “loose” > 0)
2 word-embedding relatedness (standard Wikipedia-trained word2vec
word-space models) under the same two “synonymy” thresholds (“strict” > 0.2
and “loose” > 0)
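A minimal sketch of the four measures, assuming a gloss is a list of words and that the similarity function is pluggable: passing a WordNet similarity would give syn/syn+, passing an embedding cosine gives dsyn/dsyn+. The 2-d vectors are toy stand-ins for word2vec vectors, and scoring out-of-vocabulary words as unrelated is an assumption of the sketch.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def gloss_relatedness(gloss_a: Sequence[str], gloss_b: Sequence[str],
                      sim: Callable[[str, str], float], threshold: float) -> float:
    """Number of cross-gloss word pairs with sim > threshold, over |g(s)| + |g(s')|."""
    denominator = len(gloss_a) + len(gloss_b)
    if denominator == 0:
        return 0.0  # both glosses empty: nothing to compare
    hits = sum(1 for w in gloss_a for w2 in gloss_b if sim(w, w2) > threshold)
    return hits / denominator


# Toy 2-d "embeddings" standing in for word2vec vectors.
TOY_VECTORS: Dict[str, Tuple[float, float]] = {
    "drug": (1.0, 0.0),
    "medication": (0.9, 0.1),
    "disease": (0.0, 1.0),
    "disorder": (0.1, 0.9),
}


def cosine(w: str, w2: str) -> float:
    """Cosine similarity of two toy vectors; out-of-vocabulary words score 0."""
    u, v = TOY_VECTORS.get(w), TOY_VECTORS.get(w2)
    if u is None or v is None:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


gloss_a = ["drug", "disease"]
gloss_b = ["medication", "disorder", "unknownword"]
print(gloss_relatedness(gloss_a, gloss_b, cosine, 0.2))  # strict (dsyn): 2 pairs / 5 = 0.4
print(gloss_relatedness(gloss_a, gloss_b, cosine, 0.0))  # loose (dsyn+): 4 pairs / 5 = 0.8
```

The loose threshold also counts weakly related pairs (here drug/disorder and disease/medication), which is why the "+" variants score higher.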
Annotation Relatedness
[Figure: average coverage (scale 0.0 to 0.5) of the sy, sy+, dsy and dsy+ measures for MetaMap, BabelFly, TagMe and WordNet]
Annotations | Avg. len. (sent.)
Corpus sense glosses | 66.41 words
BabelFly sense glosses | 199.43 words
TagMe sense glosses | 325.51 words
MetaMap sense glosses | 191.76 words
WordNet sense glosses | 50.50 words
Test | Null hypothesis | p-value
Kruskal-Wallis | identical distributions | 0.897
Conclusion
No significant differences between annotators w.r.t. semantic relatedness (Kruskal-Wallis, p = 0.897)
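The slide reports a Kruskal-Wallis test (null hypothesis: identical distributions, p = 0.897); exactly which scores were grouped is not spelled out, presumably per-annotator relatedness values. Below is a pure-Python sketch of the H statistic with tie correction and its chi-square approximation for the p-value; in practice `scipy.stats.kruskal` provides the same test. The helper names and the scores in the usage line are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Chi-square survival function, 1 - P(df/2, x/2), using the power series
    for the regularized lower incomplete gamma function P."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, float]:
    """Kruskal-Wallis H (with tie correction) and its chi-square p-value, df = k - 1."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Made-up per-annotation scores for three hypothetical annotators.
h, p = kruskal_wallis([0.12, 0.30, 0.25], [0.18, 0.22, 0.27], [0.15, 0.31, 0.20])
print(f"H = {h:.3f}, p = {p:.3f}")
```

A large p-value, as on the slide, means the rank distributions of the groups cannot be told apart at conventional significance levels; it does not show that they are equal.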
Summing up...
We have cross-evaluated generic WSD and linking systems (BabelFly,
TagMe) against a domain-specific annotator (MetaMap)
Generic WSD and linking systems show competitive results on the SemRep
gold standard
In particular, their greater coverage yields improvements in F1-score (TagMe
outperforms MetaMap in F1-score, albeit by a small margin)
In future work, we plan to investigate whether domain adaptation yields better
results, and to improve linking
Thank You!
References I
Aronson, A. R. and Lang, F.-M. (2010). An overview of MetaMap: Historical
perspective and recent advances. Journal of the American Medical Informatics
Association, 17(3):229–236.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and
Hellmann, S. (2009). DBpedia - A crystallization point for the web of data.
Journal of Web Semantics, 7(3):154–165.
Ferragina, P. and Scaiella, U. (2010). TAGME: on-the-fly annotation of short text
fragments (by wikipedia entities). In Proceedings of the 19th ACM International
Conference on Information and Knowledge Management (CIKM 2010).
Kilicoglu, H., Rosenblat, G., Fiszman, M., and Rindfleisch, T. C. (2011).
Constructing a semantic predication gold standard from the biomedical
literature. BMC Bioinformatics, 12:486.
References II
Momtchev, V., Peychev, D., Primov, T., and Georgiev, G. (2009). Expanding the
pathway and interaction knowledge in linked life data. In Proceedings of the 2009
International Semantic Web Challenge.
Moro, A., Raganato, A., and Navigli, R. (2014). Entity linking meets word sense
disambiguation: a unified approach. Transactions of the Association for
Computational Linguistics, 2:231–244.
C. Thorne et al. (UMa) EL for Clinical Text Leipzig, 14.09.2016 13 / 13

More Related Content

Viewers also liked

Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
semanticsconference
 

Viewers also liked (20)

Michael Fuchs | How to compute semantic relationships between entities and fa...
Michael Fuchs | How to compute semantic relationships between entities and fa...Michael Fuchs | How to compute semantic relationships between entities and fa...
Michael Fuchs | How to compute semantic relationships between entities and fa...
 
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
 
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
 
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + NagelThomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
 
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
 
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
 
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
 
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
 
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
 
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
 
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
 
Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
 
Victor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of ThingsVictor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of Things
 
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
 
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
 
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for EnterpriseChalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
 
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
 
Thomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old DataThomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old Data
 

Similar to Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for Clinical Text Annotation and Disambiguation

Chapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfChapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdf
JemalNesre1
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
pathsproject
 
Welch Wordifier Bosc2009
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009
bosc
 
Towards comprehensive syntactic and semantic annotations of the clinical narr...
Towards comprehensive syntactic and semantic annotations of the clinical narr...Towards comprehensive syntactic and semantic annotations of the clinical narr...
Towards comprehensive syntactic and semantic annotations of the clinical narr...
Jinho Choi
 
Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...
butest
 

Similar to Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for Clinical Text Annotation and Disambiguation (20)

Feb20 mayo-webinar-21feb2012
Feb20 mayo-webinar-21feb2012Feb20 mayo-webinar-21feb2012
Feb20 mayo-webinar-21feb2012
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reports
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
 
Chapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfChapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdf
 
Keyword searching idc
Keyword searching idcKeyword searching idc
Keyword searching idc
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 
Welch Wordifier Bosc2009
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009
 
PPT
PPTPPT
PPT
 
Ceis 3
Ceis 3Ceis 3
Ceis 3
 
Towards comprehensive syntactic and semantic annotations of the clinical narr...
Towards comprehensive syntactic and semantic annotations of the clinical narr...Towards comprehensive syntactic and semantic annotations of the clinical narr...
Towards comprehensive syntactic and semantic annotations of the clinical narr...
 
NAISTビッグデータシンポジウム - 情報 松本先生
NAISTビッグデータシンポジウム - 情報 松本先生NAISTビッグデータシンポジウム - 情報 松本先生
NAISTビッグデータシンポジウム - 情報 松本先生
 
Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document Parsing
 
Aletras, Nikolaos and Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
Aletras, Nikolaos  and  Stevenson, Mark (2013) "Evaluating Topic Coherence Us...Aletras, Nikolaos  and  Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
Aletras, Nikolaos and Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
 
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
 

More from semanticsconference

More from semanticsconference (20)

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventure
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideation
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domains
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi res
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlands
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police story
 
Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 2.5   semantic similarity based clustering of license excerpts for im...Session 2.5   semantic similarity based clustering of license excerpts for im...
Session 2.5 semantic similarity based clustering of license excerpts for im...
 
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus

Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for Clinical Text Annotation and Disambiguation

  • 1. Cross-Evaluation of Entity Linking and Disambiguation Systems for Clinical Text Annotation Camilo Thorne Stefano Faralli Heiner Stuckenschmidt Data and Web Science (DWS) Group Universität Mannheim, Germany {camilo,stefano,heiner}@informatik.uni-mannheim.de SEMANTiCS 2016 C. Thorne et al. (UMa) EL for Clinical Text Leipzig, 14.09.2016 1 / 13
  • 2. Motivation. Example (*): “Low dose pramipexole is neuroprotective in the MPTP mouse model of Parkinson’s disease.” Problems: (1) identify entities (nouns, noun phrases) within a text; (2) identify or resolve the meaning of such entities within that text by linking them to a sense repository; (3) resolve the meaning of both domain-specific and generic terms.
  • 3. Motivation (cont.) Question: Are there annotation services capable of both?
  • 4. Annotators. MetaMap (Aronson and Lang, 2010): clinical domain; sense repository: UMLS; REST service; multilingual; sense: CUI. BabelFly (Moro et al., 2014): general domain; sense repository: BabelNet; REST service; multilingual; sense: babelsynset. TagMe (Ferragina and Scaiella, 2010): general domain; “sense” repository: Wikipedia; REST service; English/Italian; “sense”: Wiki page. WordNet (Lesk) (custom baseline): general domain; sense repository: WordNet 3.0; English; sense: synset. Problem: the sense repositories are not aligned a priori. Solution: use linked data, in the form of DBpedia (Bizer et al., 2009), as a pivot (partial mappings).
  • 5. Annotators (cont.) Note: UMLS can be mapped to DBpedia via Medline and the LinkedLifeData initiative (Momtchev et al., 2009).
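The WordNet baseline is described only as a custom Lesk implementation over WordNet 3.0. As a rough illustration of the Lesk idea (choose the sense whose gloss overlaps most with the surrounding context), here is a minimal simplified-Lesk sketch. It assumes a tiny hypothetical sense inventory in place of WordNet; the sense IDs, glosses and stopword list are made up, and this is not the authors' implementation.

```python
from __future__ import annotations

import re

# Hypothetical sense inventory (sense IDs and glosses are made up); a real
# baseline would read synsets and their glosses from WordNet 3.0.
SENSES = {
    "model": {
        "model#1": "a simplified representation of a system or phenomenon",
        "model#2": "a person who poses for an artist or photographer",
        "model#3": "an animal used to study a human disease in the laboratory",
    },
}

# Small illustrative stopword list.
STOPWORDS = {"a", "an", "the", "of", "in", "or", "for", "to", "is", "who", "used"}


def content_words(text: str) -> set[str]:
    """Lower-cased word types of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str | None:
    """Pick the sense of `word` whose gloss shares the most words with `context`.

    Returns None when the word is unknown or no gloss overlaps the context."""
    ctx = content_words(context)
    best, best_overlap = None, 0
    for sense_id, gloss in SENSES.get(word, {}).items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best, best_overlap = sense_id, overlap
    return best


sentence = ("Low dose pramipexole is neuroprotective in the MPTP mouse model "
            "of Parkinson's disease")
print(simplified_lesk("model", sentence))  # model#3 (shared word: "disease")
```

Gloss overlap is a weak signal when glosses are short, which is consistent with a WordNet-based baseline having low coverage.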
  • 6. Annotations (Overview). DBpedia is used as pivot. Annotations for example (*), given as sense / sense ID / DBpedia URI:
Clinical (Gold): pramipexol / C0074710 / http://dbpedia.org/resource/Pramipexole; Parkinson disease / C0030567 / http://dbpedia.org/resource/Parkinson_disease
MetaMap: pramipexol / C0074710 / http://dbpedia.org/resource/Pramipexole; Parkinson disease / C0030567 / http://dbpedia.org/resource/Parkinson_disease
BabelFly: ATC code N04BC05 / bn:03124207n / http://dbpedia.org/resource/Pramipexole
TagMe: pramipexole / https://goo.gl/twrSVu / http://dbpedia.org/resource/Pramipexole; Parkinson’s disease / https://goo.gl/Xke6W3 / http://dbpedia.org/resource/Parkinson's_disease
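Using DBpedia as a pivot means each annotator's native sense IDs are first mapped to DBpedia URIs, and outputs are then compared as URI sets. A minimal sketch, assuming the partial mappings are available as a plain lookup table. Only the sense IDs from the example annotations are included; the `TO_DBPEDIA` table and `to_pivot` helper are hypothetical stand-ins for the real linked-data mappings.

```python
from __future__ import annotations

# Hypothetical partial mappings from each repository's sense IDs to DBpedia URIs.
# Only the IDs below appear in the example annotations; real mappings would come
# from linked data resources (e.g. LinkedLifeData for UMLS CUIs).
DBR = "http://dbpedia.org/resource/"
TO_DBPEDIA = {
    "UMLS": {
        "C0074710": DBR + "Pramipexole",
        "C0030567": DBR + "Parkinson_disease",
    },
    "BabelNet": {
        "bn:03124207n": DBR + "Pramipexole",
    },
}


def to_pivot(repository: str, sense_ids: list[str]) -> set[str]:
    """Map sense IDs to DBpedia URIs, dropping IDs that have no mapping."""
    mapping = TO_DBPEDIA.get(repository, {})
    return {mapping[s] for s in sense_ids if s in mapping}


gold = to_pivot("UMLS", ["C0074710", "C0030567"])
babelfly = to_pivot("BabelNet", ["bn:03124207n"])
print(sorted(gold & babelfly))  # ['http://dbpedia.org/resource/Pramipexole']
```

Because the mappings are partial, unmapped sense IDs simply disappear from the comparison, which is why only part of the gold CUIs can take part in the evaluation.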
  • 7. SemRep Corpus (Kilicoglu et al., 2011). The experiments were run over the SemRep corpus, a small annotated clinical corpus of 428 clinical excerpts (MedLine/PubMed), 13,948 word tokens and 856 UMLS-annotated clinical terms. In each sentence, two noun phrases were annotated by clinicians with their corresponding UMLS CUI; 606 of these terms can be associated with a corresponding DBpedia URI. Example (*) is taken from SemRep.
  • 8. Annotation Statistics (number of DBpedia URIs / number of resolved URIs):
Corpus: 606 / 404 (856 CUIs in total)
MetaMap: 343 / 242
BabelFly: 432 / 269
TagMe: 469 / 320
WordNet: 182 / 97
  • 9. Cross-Evaluation. Precision, recall and F1 are defined as $\mathrm{Pre} = \frac{\#\text{correct senses}}{\#\text{returned senses}}$, $\mathrm{Rec} = \frac{\#\text{correct senses}}{\#\text{corpus senses}}$ and $F_1 = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$. [Figure: Pre, Rec and F-1 (performance, 0.0 to 1.0) of MetaMap, BabelFly, TagMe and WordNet; left panel: unresolved URIs; right panel: resolved URIs]
  • 10. Cross-Evaluation (cont.) Conclusion: When URIs are resolved via owl:sameAs links, generic EL systems such as TagMe and BabelFly match domain-specific annotators such as MetaMap.
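The metric definitions translate directly into set operations over DBpedia URIs. The exact resolution procedure is not spelled out, so the sketch below assumes that "resolving" a URI means following sameAs-style links to a canonical URI before comparing. The `same_as` link and the returned URIs are hypothetical toy data, not results from the evaluation.

```python
from __future__ import annotations


def resolve(uris: set[str], same_as: dict[str, str]) -> set[str]:
    """Replace each URI by its canonical form, following sameAs links transitively."""
    canonical = set()
    for uri in uris:
        seen = set()
        while uri in same_as and uri not in seen:  # guard against link cycles
            seen.add(uri)
            uri = same_as[uri]
        canonical.add(uri)
    return canonical


def prf(returned: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of returned senses against corpus (gold) senses."""
    correct = len(returned & gold)
    pre = correct / len(returned) if returned else 0.0
    rec = correct / len(gold) if gold else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1


# Hypothetical toy data for illustration only.
DBR = "http://dbpedia.org/resource/"
gold = {DBR + "Pramipexole", DBR + "Parkinson_disease"}
returned = {DBR + "Pramipexole", DBR + "Parkinson's_disease"}
same_as = {DBR + "Parkinson_disease": DBR + "Parkinson's_disease"}

print(prf(returned, gold))                                      # (0.5, 0.5, 0.5)
print(prf(resolve(returned, same_as), resolve(gold, same_as)))  # (1.0, 1.0, 1.0)
```

Both sides are resolved with the same link table, so two URIs that name the same resource count as a match; this is the effect that separates the unresolved and resolved panels.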
  • 11. Semantic Relatedness Measures
$$\mathrm{syn}(s,s') = \frac{\left|\{(w,w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0.2}(w,w')\}\right|}{|g(s)| + |g(s')|}$$
$$\mathrm{syn}^{+}(s,s') = \frac{\left|\{(w,w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0}(w,w')\}\right|}{|g(s)| + |g(s')|}$$
$$\mathrm{dsyn}(s,s') = \frac{\left|\{(w,w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0.2}(w,w')\}\right|}{|g(s)| + |g(s')|}$$
$$\mathrm{dsyn}^{+}(s,s') = \frac{\left|\{(w,w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0}(w,w')\}\right|}{|g(s)| + |g(s')|}$$
where g(s) is the gloss of sense s, taken as a set of words, and wn and dn denote WordNet similarity and word-embedding relatedness, respectively. We measured: (1) WordNet similarity (low coverage, but better accuracy) under two “synonymy” thresholds (“strict” > 0.2, “loose” > 0); (2) word-embedding relatedness (standard Wikipedia-trained word2vec word space models) under the same two thresholds (“strict” > 0.2, “loose” > 0).
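All four measures share one shape: count gloss word pairs whose similarity exceeds a threshold, then divide by the summed gloss sizes. A minimal sketch, assuming glosses are given as word sets and the similarity function is pluggable. `toy_sim` and its scores are hypothetical; passing a WordNet similarity would give syn/syn+, and passing cosine similarity of word2vec vectors would give dsyn/dsyn+.

```python
from __future__ import annotations

from typing import Callable

# A word-similarity function: WordNet similarity (wn) or word-embedding
# relatedness (dn) in the measures above. Both are stand-ins here.
Similarity = Callable[[str, str], float]


def relatedness(g_s: set[str], g_t: set[str], sim: Similarity, threshold: float) -> float:
    """Number of gloss word pairs (w, w') with sim(w, w') > threshold,
    divided by |g(s)| + |g(s')|."""
    denominator = len(g_s) + len(g_t)
    if denominator == 0:
        return 0.0
    hits = sum(1 for w in g_s for w2 in g_t if sim(w, w2) > threshold)
    return hits / denominator


# Hypothetical similarity scores, for illustration only.
TOY_SCORES = {
    frozenset({"drug", "medication"}): 0.8,
    frozenset({"disease", "disorder"}): 0.15,
}


def toy_sim(w: str, w2: str) -> float:
    """1.0 for identical words, otherwise a score from TOY_SCORES (default 0)."""
    return 1.0 if w == w2 else TOY_SCORES.get(frozenset({w, w2}), 0.0)


g_s = {"agonist", "drug", "disease"}
g_t = {"medication", "disorder", "brain"}
syn = relatedness(g_s, g_t, toy_sim, 0.2)       # strict: only drug/medication, 1/6
syn_plus = relatedness(g_s, g_t, toy_sim, 0.0)  # loose: adds disease/disorder, 2/6
print(syn, syn_plus)
```

The loose threshold can only add pairs, so for the same glosses syn+ is never below syn (and likewise dsyn+ versus dsyn).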
  • 12. Annotation Relatedness [Figure: average coverage (0.0 to 0.5) under sy, sy+, dsy and dsy+ for MetaMap, BabelFly, TagMe and WordNet annotations]
Annotations / Avg. len. (sent.):
Corpus sense glosses: 66.41 words
BabelFly sense glosses: 199.43 words
TagMe sense glosses: 325.51 words
MetaMap sense glosses: 191.76 words
WordNet sense glosses: 50.50 words
Test: Kruskal-Wallis; null hypothesis: identical distributions; p-value: 0.897
  • 13. Annotation Relatedness (cont.) Conclusion: No significant differences w.r.t. semantic relatedness.
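The Kruskal-Wallis test ranks all scores jointly and checks whether the per-group rank sums differ more than chance would allow; a p-value of 0.897 means there is no evidence of a difference. The exact samples compared are not listed, so the sketch below assumes one list of relatedness scores per annotator and uses hypothetical toy numbers. It computes the tie-corrected H statistic with a chi-square approximation (k − 1 degrees of freedom); in practice `scipy.stats.kruskal` does the same job.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), from the series expansion of the
    regularised lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups: list[float]) -> tuple[float, float]:
    """Tie-corrected H statistic and its chi-square p-value (k - 1 df)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    if ties == n ** 3 - n:
        raise ValueError("all values are identical; H is undefined")
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical per-annotator scores, for illustration only.
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, p = {p:.4f}")  # H = 7.20, p = 0.0273
```

With four annotators the test would use three degrees of freedom; a large p-value such as the reported 0.897 means the null hypothesis of identical distributions cannot be rejected.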
  • 14. Summing up... We have cross-evaluated generic WSD and linking systems (BabelFly, TagMe) against a domain-specific annotator (MetaMap). Generic WSD and linking systems show competitive results over the SemRep gold standard. In particular, their greater coverage yields improvements in F1-score (TagMe outperforms MetaMap in F1-score, but by a small margin). In the future, we plan to investigate whether domain adaptation yields better results and improves linking.
  • 15. Thank You!
  • 16. References I
Aronson, A. R. and Lang, F.-M. (2010). An overview of MetaMap: Historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3):229–236.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and Hellmann, S. (2009). DBpedia - A crystallization point for the web of data. Journal of Web Semantics, 7(3):154–165.
Ferragina, P. and Scaiella, U. (2010). TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010).
Kilicoglu, H., Rosemblat, G., Fiszman, M., and Rindflesch, T. C. (2011). Constructing a semantic predication gold standard from the biomedical literature. BMC Bioinformatics, 12:486.
  • 17. References II
Momtchev, V., Peychev, D., Primov, T., and Georgiev, G. (2009). Expanding the pathway and interaction knowledge in linked life data. In Proceedings of the 2009 International Semantic Web Challenge.
Moro, A., Raganato, A., and Navigli, R. (2014). Entity linking meets word sense disambiguation: A unified approach. Transactions of the Association for Computational Linguistics, 2:231–244.