SlideShare a Scribd company logo
1 of 44
UMLS-Interface and UMLS-Similarity: Open Source Software for Measuring Paths and Semantic Similarity Bridget  McInnes Ted Pedersen  Serguei Pakhomov 1 11/17/2009
Objective Develop tools to automatically compute the semantic similarity between two concepts in the biomedical domain using measures originally developed for general English using the Unified Medical Language System (UMLS) 2
Motivation Clustering symptoms and disorders found in the text of clinical reports for post marking medication safety and surveillance Identification of patients for clinical studies Improving the sensitivity of document retrieval of scientific journals and clinical reports Development of terminologies and ontologies Clustering of biomedical documents Word sense disambiguation 3
Unified Medical Language System Knowledge representation framework  Contains 3 Main components: Metathesaurus Semantic Network SPECIALIST Lexicon 4
Metathesaurus Semi-automatically integrates biomedical concepts from over a 100 controlled medical terminologies  Source vocabularies are organized based on their Atomic Unique Identifiers Metathesaurus is organized based on their  Concept Unique Identifier (CUI) 5
Concept Unique Identifiers (CUIs) 6 CUI C0009264 Cold Temperature AUI A15588749 Cold Temperature AUI A3292554 Low Temperature MSH SNOMED-CT
CUI Information The concepts (AUIs) from the source vocabularies may contain information about the concept such as its Definition Relation information between the concepts The information from the AUIs can be obtained through their respective CUIs 7
Relations between CUIs in MSH 8 AUI A0123939  Temperature is-a AUI A15588749 Cold Temperature CUI C0039476  Temperature PAR/CHD      (MSH) CUI C0009264 Cold Temperature MSH
Relations between CUIs in SNOMED-CT 9 AUI A2887140  Temperature is-a AUI A3292554 Low Temperature CUI C0039476  Temperature PAR/CHD      (SNOMED-CT) CUI C0009264 Cold Temperature SNOMED-CT
Multiple Relations 10 PAR/CHD      (SNOMED-CT) CUI C0039476  Temperature CUI C0009264 Cold Temperature CUI C0039476  Temperature PAR/CHD      (MSH) CUI C0009264 Cold Temperature
Relation Information 11 MRHIER MRREL AUI A0123939  Temperature CUI C0039476  Temperature Relation Relation AUI A15588749 Cold Temperature CUI C0009264 Cold Temperature
MRREL and MRHIER MRHIER  Contains the full path to root relations between AUIs from each of the sources is-a part-of MRREL Contains the pairwise relations between CUIs Relations: PAR/CHD RB/RN It is possible to generate MRHEIR from MRREL except for the following sources: AIR MSH SNM2 USPMG OMS 12
CUI versus AUI Hierarchy The benefit of using CUIs Ability to obtain the relation information between concepts across sources Ability to obtain the relation information between concepts using more than one type of relation: PAR/CHD – parent/child (relation in MRHIER) RB/RN – narrower/broader SIB – sibling RL – concepts are similar or ‘alike’ The benefit of using AUIs Ability to obtain relation information (PAR/CHD) between concepts in the same source very quickly  incorporates tree positional information for sources such as MSH                        UMLS-Query by Shah and Musen, 2008	 13
UMLS-Interface Perl interface to the UMLS present locally in a MySQL database. Its main purpose is to returns path information about CUIs using the relation information in MRREL  All possible paths to the root Shortest path between two concepts 14
UMLS-Similarity A suite of perl modules that implement a number of path-based semantic similarity measures to determine the similarity between two CUIs in the UMLS Measures are path-based because they rely on the location of the concepts in a hierarchy The path information is obtained using UMLS-Interface Semantic Similarity Measures: Path measure Conceptual Distance (Rada, et. al, 1989) Leacock and Chodorow, 1998 Wu and Palmer, 1994 Nguyen and Al-Mubaid, 2006 15
Semantic Similarity Example Path measure 16 1 Sim(c1,c2) =  N where N  =  # links in the shortest  path   	between the two concepts c1 and c2
Similarity Given Specified Sources 17 C0005898 Body  Regions PAR      (SNOMED-CT) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs PAR      (SNOMED-CT)
Similarity Given Specified Sources 18 C0005898 Body  Regions PAR      (SNOMED-CT) C0229962 Anatomic Part Similarity = 1/1 PAR (MSH) C0015385 Limbs PAR      (SNOMED-CT)
Similarity Given Specified Sources 19 C0005898 Body  Regions PAR      (SNOMED-CT) Similarity = 1/2 C0229962 Anatomic Part PAR (MSH) C0015385 Limbs PAR      (SNOMED-CT)
Similarity Given Specified Sources 20 C0005898 Body  Regions PAR      (SNOMED-CT) Similarity = 1/1 C0229962 Anatomic Part PAR (MSH) C0015385 Limbs PAR      (SNOMED-CT)
Similarity Given Specified Relations 21 C0005898 Body  Regions RB (MSH) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs RB (MSH)
Similarity Given Specified Relations 22 C0005898 Body  Regions RB (MSH) C0229962 Anatomic Part Similarity = 1/1 PAR (MSH) C0015385 Limbs RB (MSH)
Similarity Given Specified Relations 23 C0005898 Body  Regions Similarity = 1/2 RB (MSH) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs RB (MSH)
Similarity Given Specified Relations 24 C0005898 Body  Regions Similarity =  1/1 RB (MSH) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs RB (MSH)
Functional Validation Comparison with Previous Work: Pedersen, et al. 2007 Nguyen and Al-Mubaid, 2006 Caviedes and Cimino, 2004 25
Pedersen, et al.  Semantic Similarity Measures Path Leacock and Chodorow, 1998 Source SNOMEDCT Data 29 medical terms pairs  Similarity determined by: 9 Medical Coders 3 Physicians 4 Point Scale 4 – practically synonymous 3 – related 2 – marginally related 1 - unrelated  Spearman’s Rank Correlation Coefficient 26
Comparison with Pedersen, et al. Semantic Similarity Measures Path Leacock and Chodorow, 1998 Source: SNOMED-CT from UMLS 2008AB Relations: PAR/CHD Comparison with human annotations Spearman Rank Correlation Coefficient 27
Comparison with Pedersen, et al. 28
Comparison with Pedersen, et al. 29
Nguyen and Al-Mubaid Semantic Similarity Measures Nguyen and Al-Mubaid, 2006 Leacock and Chodorow, 1998 Wu and Palmer, 1994 Path Source: MSH Same Dataset created by Pedersen, et al.  Data 29 medical terms pairs 	 Similarity determined by: 9 Medical Coders 3 Physicians Spearman’s Rank Correlation Coefficient 30
Comparison with Nguyen and Al-Mubaid Semantic Similarity Measures Nguyen and Al-Mubaid, 2006 Leacock and Chodorow, 1998 Wu and Palmer, 1994 Path Source: MSH from UMLS 2008AB Relations: PAR/CHD Comparison with human annotations Spearman Rank Correlation Coefficient 31
Comparison with Nguyen and Al-Mubaid 32
Caviedes and Cimino Semantic Similarity Measure Conceptual Distance – Rada, et al. Source: MSH Relations: PAR/CHD Data 10 medical terms pairs using following CUIs Digestive system disease: C0012242 Peptic esophagitis: C0014869 Psychotherapy: C0033968 Thirst: C0039971 Thoracic duct: C0039979 33
Comparison with Caviedes and Cimino Semantic Similarity Measures Conceptual Distance  Originally proposed by Rada, et. al., 1989 Source: MSH from UMLS 2008AB Relations: PAR/CHD Comparison between the Conceptual Distance Scores 34
Comparison with Caviedes and Cimino 35
Comparison with Caviedes and Cimino 36
Results The results show that UMLS-Similarity can be used to reproduce the results reported by: Pedersen, et al. Caviedes and Cimino 37
Results The correlation results obtained by UMLS-Similarity and reported by Nguyen and Al-Mubaid vary Different versions of MSH were used to conduct the experiment Possibly different mappings of the terms to CUIs in MSH were used Information used by Nguyen and Al-Mubaid comes directly from MSH which is located in MRHEIR  and as PAR/CHD relations in MRREL  It is not possible to generate MRHIER from MRREL because the full  path-to-root is a transitive closure of the pairwise PAR/CHD relations which does not hold true for MSH because a MSH concept may have different children depending on its tree position 38
Conclusions UMLS-Similarity Used to determine the similarity between two concepts given a specified set of sources and relations Contains the following similarity measures Path measure Conceptual Distance proposed Rada, et. al. 1989	 Leacock and Chodorow, 1998 Wu and Palmer, 1994 Nguyen and Al-Mubaid, 2006 UMLS-Interface Used to obtain path information about a CUI given a specified set of sources and relations 39
Future Work UMLS-Interface Improve the efficiency in which the path information is stored  UMLS-Similarity Information Content Similarity Measures Resnik, 1995 Jiang and Conrath, 1997 Lin, 1997 Relatedness Measures Patwardhan, 2003 40
Take Home Message #1 41 UMLS-Interface can used to extract path information about a concept given a specified  set of sources and relations.
Take Home Message #2 42 UMLS-Similarity can be used to compute the semantic similarity between two concepts given a specified set of sources and relations.
UMLS-Interface http://search.cpan.org/dist/UMLS-Interface UMLS-Similarity http://search.cpan.org/dist/UMLS-Similarity 43 Availability
Thank You We would like to thank  Kin Wah Fung,  Olivier Bodenreider,  Jan Willis  Lan Aronson  The research was supported in parts by: Fellowships: NLM Research Participation Program  GAANN fellowship from US Dept. of Ed. Grants	 IR01LM009623-01A2 from NIH, NLM 44

More Related Content

Similar to Amia2009

Zhe_2014JointSummits_v6
Zhe_2014JointSummits_v6Zhe_2014JointSummits_v6
Zhe_2014JointSummits_v6Zhe (Henry) He
 
Text Mining Radiology Reports for Deep Learning Radiology Images
Text Mining Radiology Reports for Deep Learning Radiology Images Text Mining Radiology Reports for Deep Learning Radiology Images
Text Mining Radiology Reports for Deep Learning Radiology Images Yifan Peng
 
Semantic Similarity Measures between Terms in the Biomedical Domain within f...
 Semantic Similarity Measures between Terms in the Biomedical Domain within f... Semantic Similarity Measures between Terms in the Biomedical Domain within f...
Semantic Similarity Measures between Terms in the Biomedical Domain within f...Editor IJCATR
 
YOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model Proposal
YOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model ProposalYOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model Proposal
YOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model ProposalIRJET Journal
 
A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...
A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...
A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...Lisa Cain
 
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Fundación Ramón Areces
 
A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...
A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...
A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...Khalid Md Saifuddin
 
Meta-Analysis in Ayurveda
Meta-Analysis in AyurvedaMeta-Analysis in Ayurveda
Meta-Analysis in AyurvedaAyurdata
 
Drug Target Interaction (DTI) prediction (MSc. thesis)
Drug Target Interaction (DTI) prediction (MSc. thesis) Drug Target Interaction (DTI) prediction (MSc. thesis)
Drug Target Interaction (DTI) prediction (MSc. thesis) Dimitris Papadopoulos
 
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...
 Evaluating Semantic Similarity between Biomedical Concepts/Classes through S... Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...Editor IJCATR
 
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019RSC Environmental Cheminformatics to Identify Unknowns Feb 2019
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019Emma Schymanski
 
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...Servio Fernando Lima Reina
 
Presentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali ShahPresentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali Shahguest5de83e
 
String Identifier in Multiple Medical Databases
String Identifier in Multiple Medical DatabasesString Identifier in Multiple Medical Databases
String Identifier in Multiple Medical DatabasesEswar Publications
 
Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...
Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...
Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...IRJET Journal
 
Automated clinicalontologyextraction
Automated clinicalontologyextractionAutomated clinicalontologyextraction
Automated clinicalontologyextractionChimezie Ogbuji
 
QST Reference Data 2010 magerl pain
QST Reference Data 2010 magerl painQST Reference Data 2010 magerl pain
QST Reference Data 2010 magerl painPaul Coelho, MD
 
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...ijcisjournal
 

Similar to Amia2009 (20)

Zhe_2014JointSummits_v6
Zhe_2014JointSummits_v6Zhe_2014JointSummits_v6
Zhe_2014JointSummits_v6
 
Text Mining Radiology Reports for Deep Learning Radiology Images
Text Mining Radiology Reports for Deep Learning Radiology Images Text Mining Radiology Reports for Deep Learning Radiology Images
Text Mining Radiology Reports for Deep Learning Radiology Images
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Semantic Similarity Measures between Terms in the Biomedical Domain within f...
 Semantic Similarity Measures between Terms in the Biomedical Domain within f... Semantic Similarity Measures between Terms in the Biomedical Domain within f...
Semantic Similarity Measures between Terms in the Biomedical Domain within f...
 
YOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model Proposal
YOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model ProposalYOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model Proposal
YOLOv8-Based Lung Nodule Detection: A Novel Hybrid Deep Learning Model Proposal
 
A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...
A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...
A Statistical Analysis Of Summarization Evaluation Metrics Using Resampling M...
 
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
 
A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...
A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...
A Belief Rule Based (BRB) Decision Support System to Assess Clinical Asthma S...
 
Meta-Analysis in Ayurveda
Meta-Analysis in AyurvedaMeta-Analysis in Ayurveda
Meta-Analysis in Ayurveda
 
Drug Target Interaction (DTI) prediction (MSc. thesis)
Drug Target Interaction (DTI) prediction (MSc. thesis) Drug Target Interaction (DTI) prediction (MSc. thesis)
Drug Target Interaction (DTI) prediction (MSc. thesis)
 
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...
 Evaluating Semantic Similarity between Biomedical Concepts/Classes through S... Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...
 
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019RSC Environmental Cheminformatics to Identify Unknowns Feb 2019
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019
 
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
 
Presentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali ShahPresentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali Shah
 
String Identifier in Multiple Medical Databases
String Identifier in Multiple Medical DatabasesString Identifier in Multiple Medical Databases
String Identifier in Multiple Medical Databases
 
Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...
Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...
Three Feature Based Ensemble Deep Learning Model for Pulmonary Disease Classi...
 
Automated clinicalontologyextraction
Automated clinicalontologyextractionAutomated clinicalontologyextraction
Automated clinicalontologyextraction
 
The Performance Validation of Neural Network Based 13C NMR Prediction Using a...
The Performance Validation of Neural Network Based 13C NMR Prediction Using a...The Performance Validation of Neural Network Based 13C NMR Prediction Using a...
The Performance Validation of Neural Network Based 13C NMR Prediction Using a...
 
QST Reference Data 2010 magerl pain
QST Reference Data 2010 magerl painQST Reference Data 2010 magerl pain
QST Reference Data 2010 magerl pain
 
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION ...
 

More from University of Minnesota, Duluth

Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...University of Minnesota, Duluth
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? University of Minnesota, Duluth
 
Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?University of Minnesota, Duluth
 
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection University of Minnesota, Duluth
 
Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...University of Minnesota, Duluth
 
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...University of Minnesota, Duluth
 
Puns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and wearyPuns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and wearyUniversity of Minnesota, Duluth
 
Duluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of LexicographyDuluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of LexicographyUniversity of Minnesota, Duluth
 
MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...
MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...
MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...University of Minnesota, Duluth
 

More from University of Minnesota, Duluth (20)

Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
 
Automatically Identifying Islamophobia in Social Media
Automatically Identifying Islamophobia in Social MediaAutomatically Identifying Islamophobia in Social Media
Automatically Identifying Islamophobia in Social Media
 
What Makes Hate Speech : an interactive workshop
What Makes Hate Speech : an interactive workshopWhat Makes Hate Speech : an interactive workshop
What Makes Hate Speech : an interactive workshop
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it?
 
Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?
 
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
 
Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...
 
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
 
Puns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and wearyPuns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and weary
 
Screening Twitter Users for Depression and PTSD
Screening Twitter Users for Depression and PTSDScreening Twitter Users for Depression and PTSD
Screening Twitter Users for Depression and PTSD
 
Duluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of LexicographyDuluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of Lexicography
 
Pedersen masters-thesis-oct-10-2014
Pedersen masters-thesis-oct-10-2014Pedersen masters-thesis-oct-10-2014
Pedersen masters-thesis-oct-10-2014
 
MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...
MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...
MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Conc...
 
Pedersen naacl-2013-demo-poster-may25
Pedersen naacl-2013-demo-poster-may25Pedersen naacl-2013-demo-poster-may25
Pedersen naacl-2013-demo-poster-may25
 
Pedersen semeval-2013-poster-may24
Pedersen semeval-2013-poster-may24Pedersen semeval-2013-poster-may24
Pedersen semeval-2013-poster-may24
 
Ihi2012 semantic-similarity-tutorial-part1
Ihi2012 semantic-similarity-tutorial-part1Ihi2012 semantic-similarity-tutorial-part1
Ihi2012 semantic-similarity-tutorial-part1
 
Pedersen acl2011-business-meeting
Pedersen acl2011-business-meetingPedersen acl2011-business-meeting
Pedersen acl2011-business-meeting
 
Acm ihi-2010-pedersen-final
Acm ihi-2010-pedersen-finalAcm ihi-2010-pedersen-final
Acm ihi-2010-pedersen-final
 
Pedersen naacl-2010-poster
Pedersen naacl-2010-posterPedersen naacl-2010-poster
Pedersen naacl-2010-poster
 
Aaai 2006 Pedersen
Aaai 2006 PedersenAaai 2006 Pedersen
Aaai 2006 Pedersen
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Amia2009

  • 1. UMLS-Interface and UMLS-Similarity: Open Source Software for Measuring Paths and Semantic Similarity Bridget McInnes Ted Pedersen Serguei Pakhomov 1 11/17/2009
  • 2. Objective Develop tools to automatically compute the semantic similarity between two concepts in the biomedical domain using measures originally developed for general English using the Unified Medical Language System (UMLS) 2
  • 3. Motivation Clustering symptoms and disorders found in the text of clinical reports for post marking medication safety and surveillance Identification of patients for clinical studies Improving the sensitivity of document retrieval of scientific journals and clinical reports Development of terminologies and ontologies Clustering of biomedical documents Word sense disambiguation 3
  • 4. Unified Medical Language System Knowledge representation framework Contains 3 Main components: Metathesaurus Semantic Network SPECIALIST Lexicon 4
  • 5. Metathesaurus Semi-automatically integrates biomedical concepts from over a 100 controlled medical terminologies Source vocabularies are organized based on their Atomic Unique Identifiers Metathesaurus is organized based on their Concept Unique Identifier (CUI) 5
  • 6. Concept Unique Identifiers (CUIs) 6 CUI C0009264 Cold Temperature AUI A15588749 Cold Temperature AUI A3292554 Low Temperature MSH SNOMED-CT
  • 7. CUI Information The concepts (AUIs) from the source vocabularies may contain information about the concept such as its Definition Relation information between the concepts The information from the AUIs can be obtained through their respective CUIs 7
  • 8. Relations between CUIs in MSH 8 AUI A0123939 Temperature is-a AUI A15588749 Cold Temperature CUI C0039476 Temperature PAR/CHD (MSH) CUI C0009264 Cold Temperature MSH
  • 9. Relations between CUIs in SNOMED-CT 9 AUI A2887140 Temperature is-a AUI A3292554 Low Temperature CUI C0039476 Temperature PAR/CHD (SNOMED-CT) CUI C0009264 Cold Temperature SNOMED-CT
  • 10. Multiple Relations 10 PAR/CHD (SNOMED-CT) CUI C0039476 Temperature CUI C0009264 Cold Temperature CUI C0039476 Temperature PAR/CHD (MSH) CUI C0009264 Cold Temperature
  • 11. Relation Information 11 MRHIER MRREL AUI A0123939 Temperature CUI C0039476 Temperature Relation Relation AUI A15588749 Cold Temperature CUI C0009264 Cold Temperature
  • 12. MRREL and MRHIER MRHIER Contains the full path to root relations between AUIs from each of the sources is-a part-of MRREL Contains the pairwise relations between CUIs Relations: PAR/CHD RB/RN It is possible to generate MRHEIR from MRREL except for the following sources: AIR MSH SNM2 USPMG OMS 12
  • 13. CUI versus AUI Hierarchy The benefit of using CUIs Ability to obtain the relation information between concepts across sources Ability to obtain the relation information between concepts using more than one type of relation: PAR/CHD – parent/child (relation in MRHIER) RB/RN – narrower/broader SIB – sibling RL – concepts are similar or ‘alike’ The benefit of using AUIs Ability to obtain relation information (PAR/CHD) between concepts in the same source very quickly incorporates tree positional information for sources such as MSH UMLS-Query by Shah and Musen, 2008 13
  • 14. UMLS-Interface Perl interface to the UMLS present locally in a MySQL database. Its main purpose is to returns path information about CUIs using the relation information in MRREL All possible paths to the root Shortest path between two concepts 14
  • 15. UMLS-Similarity A suite of perl modules that implement a number of path-based semantic similarity measures to determine the similarity between two CUIs in the UMLS Measures are path-based because they rely on the location of the concepts in a hierarchy The path information is obtained using UMLS-Interface Semantic Similarity Measures: Path measure Conceptual Distance (Rada, et. al, 1989) Leacock and Chodorow, 1998 Wu and Palmer, 1994 Nguyen and Al-Mubaid, 2006 15
  • 16. Semantic Similarity Example Path measure 16 1 Sim(c1,c2) = N where N = # links in the shortest path between the two concepts c1 and c2
  • 17. Similarity Given Specified Sources 17 C0005898 Body Regions PAR (SNOMED-CT) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs PAR (SNOMED-CT)
  • 18. Similarity Given Specified Sources 18 C0005898 Body Regions PAR (SNOMED-CT) C0229962 Anatomic Part Similarity = 1/1 PAR (MSH) C0015385 Limbs PAR (SNOMED-CT)
  • 19. Similarity Given Specified Sources 19 C0005898 Body Regions PAR (SNOMED-CT) Similarity = 1/2 C0229962 Anatomic Part PAR (MSH) C0015385 Limbs PAR (SNOMED-CT)
  • 20. Similarity Given Specified Sources 20 C0005898 Body Regions PAR (SNOMED-CT) Similarity = 1/1 C0229962 Anatomic Part PAR (MSH) C0015385 Limbs PAR (SNOMED-CT)
  • 21. Similarity Given Specified Relations 21 C0005898 Body Regions RB (MSH) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs RB (MSH)
  • 22. Similarity Given Specified Relations 22 C0005898 Body Regions RB (MSH) C0229962 Anatomic Part Similarity = 1/1 PAR (MSH) C0015385 Limbs RB (MSH)
  • 23. Similarity Given Specified Relations 23 C0005898 Body Regions Similarity = 1/2 RB (MSH) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs RB (MSH)
  • 24. Similarity Given Specified Relations 24 C0005898 Body Regions Similarity = 1/1 RB (MSH) C0229962 Anatomic Part PAR (MSH) C0015385 Limbs RB (MSH)
  • 25. Functional Validation Comparison with Previous Work: Pedersen, et al. 2007 Nguyen and Al-Mubaid, 2006 Caviedes and Cimino, 2004 25
  • 26. Pedersen, et al. Semantic Similarity Measures Path Leacock and Chodorow, 1998 Source SNOMEDCT Data 29 medical terms pairs Similarity determined by: 9 Medical Coders 3 Physicians 4 Point Scale 4 – practically synonymous 3 – related 2 – marginally related 1 - unrelated Spearman’s Rank Correlation Coefficient 26
  • 27. Comparison with Pedersen, et al. Semantic Similarity Measures Path Leacock and Chodorow, 1998 Source: SNOMED-CT from UMLS 2008AB Relations: PAR/CHD Comparison with human annotations Spearman Rank Correlation Coefficient 27
  • 30. Nguyen and Al-Mubaid Semantic Similarity Measures Nguyen and Al-Mubaid, 2006 Leacock and Chodorow, 1998 Wu and Palmer, 1994 Path Source: MSH Same Dataset created by Pedersen, et al. Data 29 medical terms pairs Similarity determined by: 9 Medical Coders 3 Physicians Spearman’s Rank Correlation Coefficient 30
  • 31. Comparison with Nguyen and Al-Mubaid Semantic Similarity Measures Nguyen and Al-Mubaid, 2006 Leacock and Chodorow, 1998 Wu and Palmer, 1994 Path Source: MSH from UMLS 2008AB Relations: PAR/CHD Comparison with human annotations Spearman Rank Correlation Coefficient 31
  • 32. Comparison with Nguyen and Al-Mubaid 32
  • 33. Caviedes and Cimino Semantic Similarity Measure Conceptual Distance – Rada, et al. Source: MSH Relations: PAR/CHD Data 10 medical terms pairs using following CUIs Digestive system disease: C0012242 Peptic esophagitis: C0014869 Psychotherapy: C0033968 Thirst: C0039971 Thoracic duct: C0039979 33
  • 34. Comparison with Caviedes and Cimino Semantic Similarity Measures Conceptual Distance Originally proposed by Rada, et. al., 1989 Source: MSH from UMLS 2008AB Relations: PAR/CHD Comparison between the Conceptual Distance Scores 34
  • 35. Comparison with Caviedes and Cimino 35
  • 36. Comparison with Caviedes and Cimino 36
  • 37. Results The results show that UMLS-Similarity can be used to reproduce the results reported by: Pedersen, et al. Caviedes and Cimino 37
  • 38. Results The correlation results obtained by UMLS-Similarity and reported by Nguyen and Al-Mubaid vary Different versions of MSH were used to conduct the experiment Possibly different mappings of the terms to CUIs in MSH were used Information used by Nguyen and Al-Mubaid comes directly from MSH which is located in MRHEIR and as PAR/CHD relations in MRREL It is not possible to generate MRHIER from MRREL because the full path-to-root is a transitive closure of the pairwise PAR/CHD relations which does not hold true for MSH because a MSH concept may have different children depending on its tree position 38
  • 39. Conclusions UMLS-Similarity Used to determine the similarity between two concepts given a specified set of sources and relations Contains the following similarity measures Path measure Conceptual Distance proposed Rada, et. al. 1989 Leacock and Chodorow, 1998 Wu and Palmer, 1994 Nguyen and Al-Mubaid, 2006 UMLS-Interface Used to obtain path information about a CUI given a specified set of sources and relations 39
  • 40. Future Work UMLS-Interface Improve the efficiency in which the path information is stored UMLS-Similarity Information Content Similarity Measures Resnik, 1995 Jiang and Conrath, 1997 Lin, 1997 Relatedness Measures Patwardhan, 2003 40
  • 41. Take Home Message #1 41 UMLS-Interface can used to extract path information about a concept given a specified set of sources and relations.
  • 42. Take Home Message #2 42 UMLS-Similarity can be used to compute the semantic similarity between two concepts given a specified set of sources and relations.
  • 43. UMLS-Interface http://search.cpan.org/dist/UMLS-Interface UMLS-Similarity http://search.cpan.org/dist/UMLS-Similarity 43 Availability
  • 44. Thank You We would like to thank Kin Wah Fung, Olivier Bodenreider, Jan Willis Lan Aronson The research was supported in parts by: Fellowships: NLM Research Participation Program GAANN fellowship from US Dept. of Ed. Grants IR01LM009623-01A2 from NIH, NLM 44

Editor's Notes

  1. The objective of this work was to develop tools to automatically computer the semantic similarity between two concepts in the biomedical domain using measures originally developed for general EnglishWe obtain these biomedical concepts from the Unified Medical Language System UMLS
  2. The Unified Medical Language System (UMLS) which is a knowledge representation framework consisting of three main components: 1. The Metathesaurus2. The Semantic Network3. The Specialist LexiconOur focus with respect to these packages is the Metathesaurus
  3. The Metathesaurus integrates biomedical concepts from over 100 controlled medical terminologies into a single sourceThe concepts from the terminologies or often called source vocabularies are referred as Atomic Unique Identifiers (AUIs). These AUIs are semi-automatically clustered into Concept Unique Identifiers CUIs
  4. For example, two source vocabularies in the Metathesaurus are MSH and SNOMEDCT In MSH and SNOMEDCT there exists a concept for “Cold Temperature” these AUIs are combined to form a CUI for the concept Cold Temperature
  5. These concepts from the source vocabularies may be encoded with information about the concept such as its definition or relationship with other conceptsThis information encoded at the AUI level may be obtained through their respective CUIs
  6. For example, in MSH there exists an is-a relationship between the concepts Cold Temperature and TemperatureThis is carried over as a PAR/CHD between the CUIs Cold Temperature and TemperatureThe Metathesaurus keeps track of where the relationships originated
  7. So for example,If a relationship existed between Low Temperature and Temperature in SNOMED CT – this also would be carried to the CUI level
  8. Creating two entries in the Metathesaurus for the same pair of CUIs – one for each source.
  9. This relation information is stored in two tables in the Metathesaurus. MRHIER which stores the relations between AUIs in their respective sourcesAnd MRREL which stores the relations between CUIs.
  10. MRHIER contains the full path to root relations between the AUI for each of the different sourcesMRREL contains the pairwise relations between the CUIs derived from these sources. These relations are labeled PAR/CHD relations in MRRELMRREL contains other types of relations - the two considered hierarchical in the literature are the PAR/CHD relations and the RB/RN (broader or narrower)It is possible to generate MRHIER from MRREL through the PAR/CHD relations for most sources although this is not the case for all of them. For example, MSH because a MSH concept may have different children depending on their location within the hierarchy.NOTES: don’t readthe full path-to-root is a transitive closure of the pairwise PAR/CHD relations which does not hold true for MSH because a MSH concept may have different children depending on their location within the hierarchy
  11. There are benefits to using either the AUI hierarchy in MRHIER and the CUI hierarchy in MRREL The benefit of using CUIs is that ability to obtain the path information between in concepts in different sources and over relations outside of just the PAR/CHD relation But the benefit of using AUIS is that the PAR/CHD path information between concepts incorporates the tree positional information for the sources that use this information such as MSH.Currently, there is one package that we know of that obtains path information using MRHIER which is UMLS-Query by Shah and Musen
  12. The UMLS-Interface package is a perl interface to the UMLS which is present locally in a MySQL databaseIts main purpose is to return path information about CUIs using information from MRREL such as all of the possible paths from the CUI to the root given a specified set of sources and relations
  13. UMLS-Similarity package is a Perl package that allows you to take the semantic similarity between two CUIs given a specified set of sources and relationsThe measure currently implemented in UMLS-Similarity are the path based measures proposed by Leacock and Chodorow Wu and Palmer Nguyen and Al-MubaidAnd the conceptual distance measure proposed by Rada, at. Al. and the path measureThese measure are called path-based measures because they rely on the location of the concepts in a hierarchy – the path information is obtained from the UMLS-Interface package which returns the path information given a set of specified sources and relations
  14. In the next couple slides I am going to show and example using the path measure which is one over the number of nodes in the shortest path between two concepts
  15. So in this example would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing MSH, the similarity is going to be one over one since there is only one link between the two concepts although when using SNOMEDCT the similarity is ½ since there are two links between the two concepts. If we combine the MSH and SNOMEDCT sources the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  16. So in this example would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing MSH, the similarity is going to be one over one since there is only one link between the two concepts although when using SNOMEDCT the similarity is ½ since there are two links between the two concepts. If we combine the MSH and SNOMEDCT sources the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  17. So in this example would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing MSH, the similarity is going to be one over one since there is only one link between the two concepts although when using SNOMEDCT the similarity is ½ since there are two links between the two concepts. If we combine the MSH and SNOMEDCT sources the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  18. So in this example would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing MSH, the similarity is going to be one over one since there is only one link between the two concepts although when using SNOMEDCT the similarity is ½ since there are two links between the two concepts. If we combine the MSH and SNOMEDCT sources the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  19. And in this example we would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing PAR/CHD relations, the similarity is going to be one over one since there is only one link between the two concepts although when using the RB/RN the similarity is ½ since there are two links between the two concepts. If we combine the PAR/CHD and RB/RN relations the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  20. And in this example we would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing PAR/CHD relations, the similarity is going to be one over one since there is only one link between the two concepts although when using the RB/RN the similarity is ½ since there are two links between the two concepts. If we combine the PAR/CHD and RB/RN relations the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  21. And in this example we would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing PAR/CHD relations, the similarity is going to be one over one since there is only one link between the two concepts although when using the RB/RN the similarity is ½ since there are two links between the two concepts. If we combine the PAR/CHD and RB/RN relations the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  22. And in this example we would like to calculate the similarity between Limbs and Body Region using the path measure given a set of specified sourcesUsing PAR/CHD relations, the similarity is going to be one over one since there is only one link between the two concepts although when using the RB/RN the similarity is ½ since there are two links between the two concepts. If we combine the PAR/CHD and RB/RN relations the similarity is going to be one over the number of links in the shortest path between the two concept which would be one.
  23. We compare the results obtained by UMLS-Similarity to those reported by Pedersen, et. Al. Nguyen and Al-Mubaid and Caviedes and Cimino
  24. Pedersen evaluate the semantic similarity measure proposed by Leacock and Chodorow and the path measure using SNOMEDCT prior to its inclusion in the UMLS Their dataset consists of 29 medical term pairs whose similarity was determined by 9 Medical Coders and 3 Physicians using a 4 point scale where four indicates that the terms are practically synonymous and 1 indicates that they are unrelatedThe authors evaluate the measures by determining the correlation between the human annotations and the measures using Spearman’s rank correlation coefficient
  25. We compare the correlation results obtained by UMLS-Similarity on the same dataset using SNOMEDCT and the PAR/CHD relations in the 2008AB version of the UMLS
  26. The results show that the correlationobtained by UMLS-Similarity and the human annotators when using the path measure is only one percentage point lower than those obtained by Pedersen, et. Al. and the results obtained using the measure proposed by Leacock and Chodorow are identical to those obtained by Pedersen, et. al.
  27. The results show that the correlationobtained by UMLS-Similarity and the human annotators when using the path measure is only one percentage point lower than those obtained by Pedersen, et. Al. and the results obtained using the measure proposed by Leacock and Chodorow are identical to those obtained by Pedersen, et. al.
  28. Nguyen (Wing) and Al-Mubaid evaluate the semantic similarity methods proposed by Leacock and Chodorow, and Wu and Palmer, their own proposed measure and the path measure using the Medical Subject Headings MSH – the actual MSH source not the source integrated in the UMLS. They use the same dataset as Pedersen and evaluate the measure by determining the correlation between the human annotations using Spearman’s Rank Correlation Coefficient
  29. We compare the correlation results obtained by UMLS Similarity using MSH and the PAR/CHD relations in the 2008AB version of the UMLS
  30. The results show that UMLS-Similarity obtains a lower correlation than those reported by Nguyen and Al-Mubaid for each of the semantic similarity measures.
  31. Lastly we compare the results of the UMLS-Similarity package with those obtained by Caviedes and CiminoCaviedes and Cimino evaluate the Conceptual distance measure proposed by Rada, et. Al. on 10 medical terms pairs
  32. Caviedes and Cimino use the PAR/CHD relations in MSH to determine the Conceptual Distance for each pair of terms. We compare the actual conceptual distance score obtained by Caviedes and Cimino with those obtained by UMLS-Similarity using MSH and the PAR/CHD relations from the UMLS version 2008AB.
  33. The results show that the UMLS-Similarity obtains the same conceptual distance as Caviedes and Cimino for six out of the ten pairs. The remaining four pairs are only off by one point.
  34. The results show that the UMLS-Similarity obtains the same conceptual distance as Caviedes and Cimino for six out of the ten pairs. The remaining four pairs are only off by one point.
  35. The results show that UMLS-Similarity can be used to reproduce the results reported by Pedersen et al and Caviedes and CiminoWe believe that the minor differences in results are due to different versions of the source vocabularies that were used
  36. We believe the difference in the results is due to :Different versions of MSH being used to conduct the experimentPossibly different mappings of the terms to CUIs in MSHThe path information used by Nguyen (Wing) and Al-Mubaid to determine the semantic similarity comes directly from MSH which in the UMLS would be located in the MRHEIR and the PAR/CHD relations in MRRELBut with MSH, the exact path-to-root information for each of the concepts can not be generated in MRREL because a MSH concept may have different children depending on its position in the hierarchy
  37. So In conclusion the UMLS-Interface can be used to obain the path information about a CUI given a specified set of sources and relationsUMLS-Similarity can be used to determine the semantic similarity between two CUIs in the UMLSCurrently, there are five path-based semantic similarity measures included in the package: Conceptual Distance proposed by Rada, et. Al.The measures proposed by Leacock and Chodorow, Wu and Palmer and Nguyen and Al-Mubaid and the path measure
  38. In the future, For UMLS-Interface, we plan to improve the efficiency in which the path information of a CUI is stored And For UMLS-Similarity, we plan to incorporate other semantic similarity and relatedness measures such as the information-content semantic similarity measures proposed by Resnik, Jiang and Conrath and LinAnd the semantic relatedness measure proposed by Patwardhan.
  39. So, this wraps up my presentation about UMLS-Interface and UMLS-Similarity. I have two take home messages that I hope that you remember from this talk. The first is that UMLS-Interface can reliably be used to extract path information about a CUI given a specified set of sources and relations
  40. The second is that UMLS-Similarity can reliably be used to determine the semantic similarity between two CUIs
  41. UMLS-Interface and UMLS-Similarity can both be downloaded from CPAN. I do have a demo set up on my laptop so if you are interested please let me know and I can show you during the break.Thank you - are there any questions