SlideShare a Scribd company logo
1 of 93
Biomedical text mining

Lars Juhl Jensen
exponential growth
~45 seconds per paper
information retrieval
named entity recognition
augmented browsing
text corpora
information extraction
information retrieval
find the relevant papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
PubMed
indexing
fast lookup
stemming
word endings
dynamic query expansion
MeSH terms
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1
and this modification served as a priming
step to promote subsequent Cdc5dependent Swe1 hyperphosphorylation
and degradation
no tool will find that
named entity recognition
computer
as smart as a dog
teach it specific tricks
identify the concepts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1
and this modification served as a priming
step to promote subsequent Cdc5dependent Swe1 hyperphosphorylation
and degradation
comprehensive lexicon
proteins
chemicals
compartments
tissues
diseases
organisms
CDC2
cyclin dependent kinase 1
orthographic variation
upper- and lower-case
CDC2
Cdc2
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
flexible matching
similar to alignment
prefixes and postfixes
CDC2
hCDC2
plurals
adjectives
“black list”
SDS
efficient tagger
Pafilis et al., PLOS ONE, 2013
text corpora
>10 km
<10 hours
most use Medline
~22 million abstracts
few use full-text articles
no access
PDF files
layout-aware extraction
millions of full-text articles
information extraction
formalize the facts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1
and this modification served as a priming
step to promote subsequent Cdc5dependent Swe1 hyperphosphorylation
and degradation
two approaches
co-mentioning
counting
within documents
within paragraphs
within sentences
co-mentioning score
NLP
Natural Language Processing
grammatical analysis
part-of-speech tagging
multiword detection
semantic tagging
sentence parsing
Gene and protein names
Cue words for entity
recognition
Verbs for relation extraction
[nxexpr The expression of
[nxgene the cytochrome
genes
[nxpg CYC1 and CYC7]]]
is controlled by
[nxpg HAP1]
extract stated facts
high precision
poor recall
questions?
Exercise 3

Go to http://diseases.jensenlab.org

Find TYMS disease associations

nspect the text-mining evidence

Look for examples of synonym usage

Find genes linked to colorectal cance
thank you!

More Related Content

What's hot

Exploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCH
biocs
 

What's hot (13)

Open access - making the most of biomedical literature mining
Open access - making the most of biomedical literature miningOpen access - making the most of biomedical literature mining
Open access - making the most of biomedical literature mining
 
Computational approaches to cell cycle analysis: Current research topics (tho...
Computational approaches to cell cycle analysis: Current research topics (tho...Computational approaches to cell cycle analysis: Current research topics (tho...
Computational approaches to cell cycle analysis: Current research topics (tho...
 
Applied text mining
Applied text miningApplied text mining
Applied text mining
 
Applied text mining
Applied text miningApplied text mining
Applied text mining
 
Exploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCH
 
Biological literature mining - from information retrieval to biological disco...
Biological literature mining - from information retrieval to biological disco...Biological literature mining - from information retrieval to biological disco...
Biological literature mining - from information retrieval to biological disco...
 
Open access - making the most of biomedical literature mining
Open access - making the most of biomedical literature miningOpen access - making the most of biomedical literature mining
Open access - making the most of biomedical literature mining
 
Cell cycle
Cell cycleCell cycle
Cell cycle
 
Integration of biomedical literature and databases
Integration of biomedical literature and databasesIntegration of biomedical literature and databases
Integration of biomedical literature and databases
 
Integration of biomedical literature and databases
Integration of biomedical literature and databasesIntegration of biomedical literature and databases
Integration of biomedical literature and databases
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
 
Mining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity DataMining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity Data
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
 

Similar to Biomedical text mining

Text mining and data integration
Text mining and data integrationText mining and data integration
Text mining and data integration
Lars Juhl Jensen
 
Systems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelSystems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems level
Lars Juhl Jensen
 
Systems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systemsSystems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systems
Lars Juhl Jensen
 

Similar to Biomedical text mining (14)

Text mining and data integration
Text mining and data integrationText mining and data integration
Text mining and data integration
 
Text mining
Text miningText mining
Text mining
 
Mining the biomedical literature: Dictionary-based identification of proteins...
Mining the biomedical literature: Dictionary-based identification of proteins...Mining the biomedical literature: Dictionary-based identification of proteins...
Mining the biomedical literature: Dictionary-based identification of proteins...
 
Text mining for protein and small molecule relations
Text mining for protein and small molecule relationsText mining for protein and small molecule relations
Text mining for protein and small molecule relations
 
Systems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelSystems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems level
 
Systems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systemsSystems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systems
 
STRING: Large-scale data and text mining
STRING: Large-scale data and text miningSTRING: Large-scale data and text mining
STRING: Large-scale data and text mining
 
Text mining exercise
Text mining exerciseText mining exercise
Text mining exercise
 
Network biology: Large-scale data and text mining
Network biology: Large-scale data and text miningNetwork biology: Large-scale data and text mining
Network biology: Large-scale data and text mining
 
Dictionary-based named entity recognition
Dictionary-based named entity recognitionDictionary-based named entity recognition
Dictionary-based named entity recognition
 
Text-mining practical
Text-mining practicalText-mining practical
Text-mining practical
 
Biomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured textBiomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured text
 
Dr. ed cooper kcnq2 Cure summit parent track learn more at kcnq2cure.org
Dr. ed cooper kcnq2 Cure summit parent track learn more at kcnq2cure.orgDr. ed cooper kcnq2 Cure summit parent track learn more at kcnq2cure.org
Dr. ed cooper kcnq2 Cure summit parent track learn more at kcnq2cure.org
 
IPK abstract
IPK abstractIPK abstract
IPK abstract
 

More from Lars Juhl Jensen

More from Lars Juhl Jensen (20)

One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...
 
One tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicineOne tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicine
 
Extract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotationExtract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotation
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using Cytoscape
 
STRING & STITCH : Network integration of heterogeneous data
STRING & STITCH: Network integration of heterogeneous dataSTRING & STITCH: Network integration of heterogeneous data
STRING & STITCH : Network integration of heterogeneous data
 
Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...
 
Network Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and CytoscapeNetwork Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and Cytoscape
 
Cellular networks
Cellular networksCellular networks
Cellular networks
 
Cellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and textCellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and text
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
 
STRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous dataSTRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous data
 
Tagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognitionTagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognition
 
Network Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and textNetwork Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and text
 
Medical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactionsMedical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactions
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Medical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsMedical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactions
 
Cellular Network Biology
Cellular Network BiologyCellular Network Biology
Cellular Network Biology
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Biomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritizationBiomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritization
 
The Art of Counting: Scoring and ranking co-occurrences in literature
The Art of Counting: Scoring and ranking co-occurrences in literatureThe Art of Counting: Scoring and ranking co-occurrences in literature
The Art of Counting: Scoring and ranking co-occurrences in literature
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Biomedical text mining