SlideShare a Scribd company logo
1 of 100
The STRING database Quality scores for heterogeneous interaction data Lars Juhl Jensen EMBL Heidelberg
data integration
Jensen et al., Drug Discovery Today: Targets, 2004
functional interactions
Genomic neighborhood Species co-occurrence Gene fusions Database imports Experimental interaction data Microarray expression data Literature mining
373 proteomes
model organism databases
Ensembl
Genome Reviews
RefSeq
genomic context methods
gene fusion
 
gene neighborhood
 
phylogenetic profiles
 
scoring schemes
benchmarking
cross-species transfer
primary experimental data
many sources
different formats
different gene identifiers
redundancy
physical protein interactions
IntAct
BIND Biomolecular Interaction Network Database
MINT Molecular Interactions Database
DIP Database of Interacting Proteins
GRID General Repository for Interaction Datasets
HPRD Human Protein Reference Database
PSI-MI
reference proteomes
merge data by publication
thousands of interactions
correct interactions
wrong interactions
scoring scheme
complex pull-down
von Mering et al., Nucleic Acids Research, 2005
log[(N 12 · N)/((N 1 +1) · (N 2 +1))]
yeast two-hybrid
non-shared interactors
-log((N 1 +1) · (N 2 +1))
not directly comparable
calibrate vs. gold standard
 
other types of evidence
co-expression
GEO Gene Expression Omnibus
species-specific datasets
correlation coefficient
calibrate vs. gold standard
directly comparable
curated knowledge
many sources
different formats
different gene identifiers
redundancy
protein complexes
MIPS Munich Information center for Protein Sequences
Gene Ontology
pathway databases
KEGG Kyoto Encyclopedia of Genes and Genomes
Reactome
PID NCI-Nature Pathway Interaction Database
STKE Signal Transduction Knowledge Environment
BioPAX
reference proteomes
literature mining
M EDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
different gene identifiers
synonyms lists
black list
flexible matching
co-occurrence
log[(N 12 · N)/((N 1 +1) · (N 2 +1))]
NLP Natural Language Processing
[object Object],[object Object],[object Object],[object Object],[object Object]
calibrate vs. gold standard
directly comparable
combine all evidence
spread over many species
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
 
orthologous groups
von Mering et al., Nucleic Acids Research, 2005
 
fuzzy orthology
von Mering et al., Nucleic Acids Research, 2005
add probabilistic scores
P = 1-(1-P 1 ) . (1-P 2 ) . (1-P 3 ) …
Genomic neighborhood Species co-occurrence Gene fusions Database imports Experimental interaction data Microarray expression data Literature mining
Acknowledgments ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Systems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemSystems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological system
Lars Juhl Jensen
 
Systems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelSystems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems level
Lars Juhl Jensen
 
Making gene networks through data integration
Making gene networks through data integrationMaking gene networks through data integration
Making gene networks through data integration
Lars Juhl Jensen
 

What's hot (20)

Introduction to STRING
Introduction to STRINGIntroduction to STRING
Introduction to STRING
 
Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and text
 
Gene association networks - Large-scale integration of data and text
Gene association networks - Large-scale integration of data and textGene association networks - Large-scale integration of data and text
Gene association networks - Large-scale integration of data and text
 
Network biology - Large-scale integration of data and text
Network biology - Large-scale integration of data and textNetwork biology - Large-scale integration of data and text
Network biology - Large-scale integration of data and text
 
STRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous dataSTRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous data
 
STRING - Large-scale integration of data and text
STRING - Large-scale integration of data and textSTRING - Large-scale integration of data and text
STRING - Large-scale integration of data and text
 
Gene association networks - Large-scale integration of data and text
Gene association networks - Large-scale integration of data and textGene association networks - Large-scale integration of data and text
Gene association networks - Large-scale integration of data and text
 
Cross-species data integration
Cross-species data integrationCross-species data integration
Cross-species data integration
 
STRING: Protein association networks
STRING: Protein association networksSTRING: Protein association networks
STRING: Protein association networks
 
STRING: protein association networks
STRING: protein association networksSTRING: protein association networks
STRING: protein association networks
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Gene association networks - Large-scale integration of data and text
Gene association networks - Large-scale integration of data and textGene association networks - Large-scale integration of data and text
Gene association networks - Large-scale integration of data and text
 
STRING - Modeling of biological systems through cross-species data integ...
STRING - Modeling of biological systems through cross-species data integ...STRING - Modeling of biological systems through cross-species data integ...
STRING - Modeling of biological systems through cross-species data integ...
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Systems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemSystems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological system
 
Systems biology: Bioinformatics on complete biological systems
Systems biology: Bioinformatics on complete biological systemsSystems biology: Bioinformatics on complete biological systems
Systems biology: Bioinformatics on complete biological systems
 
Integration of heterogeneous data
Integration of heterogeneous dataIntegration of heterogeneous data
Integration of heterogeneous data
 
Systems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelSystems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems level
 
Advanced bioinformatics of proteomics datasets
Advanced bioinformaticsof proteomics datasetsAdvanced bioinformaticsof proteomics datasets
Advanced bioinformatics of proteomics datasets
 
Making gene networks through data integration
Making gene networks through data integrationMaking gene networks through data integration
Making gene networks through data integration
 

Similar to The STRING database - Quality scores for heterogeneous interaction data

Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and text
Lars Juhl Jensen
 
Large-scale data and text mining
Large-scale data and text miningLarge-scale data and text mining
Large-scale data and text mining
Lars Juhl Jensen
 
Exploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCH
biocs
 

Similar to The STRING database - Quality scores for heterogeneous interaction data (17)

Functional association networks - The STRING and STITCH web resources
Functional association networks - The STRING and STITCH web resourcesFunctional association networks - The STRING and STITCH web resources
Functional association networks - The STRING and STITCH web resources
 
Prediction of protein networks through data integration
Prediction of protein networks through data integrationPrediction of protein networks through data integration
Prediction of protein networks through data integration
 
Network integration of heterogeneous data
Network integration of heterogeneous dataNetwork integration of heterogeneous data
Network integration of heterogeneous data
 
Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and text
 
Using networks to derive function
Using networks to derive functionUsing networks to derive function
Using networks to derive function
 
Large-scale data and text mining
Large-scale data and text miningLarge-scale data and text mining
Large-scale data and text mining
 
Data integration - Integration of functional associations using STRING
Data integration - Integration of functional associations using STRINGData integration - Integration of functional associations using STRING
Data integration - Integration of functional associations using STRING
 
Data integration and functional association networks
Data integration and functional association networksData integration and functional association networks
Data integration and functional association networks
 
Integration of diverse large-scale datasets
Integration of diverse large-scale datasetsIntegration of diverse large-scale datasets
Integration of diverse large-scale datasets
 
Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and text
 
Proteomics - Analysis and integration of large-scale data sets
Proteomics - Analysis and integration of large-scale data setsProteomics - Analysis and integration of large-scale data sets
Proteomics - Analysis and integration of large-scale data sets
 
Integration of heterogeneous data
Integration of heterogeneous dataIntegration of heterogeneous data
Integration of heterogeneous data
 
Large-scale biomedical data and text integration
Large-scale biomedical data and text integrationLarge-scale biomedical data and text integration
Large-scale biomedical data and text integration
 
Information integration
Information integrationInformation integration
Information integration
 
STRING: Prediction of protein networks through integration of diverse large-s...
STRING: Prediction of protein networks through integration of diverse large-s...STRING: Prediction of protein networks through integration of diverse large-s...
STRING: Prediction of protein networks through integration of diverse large-s...
 
Exploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCH
 
Text and data integration
Text and data integrationText and data integration
Text and data integration
 

More from Lars Juhl Jensen

More from Lars Juhl Jensen (20)

One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...
 
One tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicineOne tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicine
 
Extract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotationExtract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotation
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using Cytoscape
 
STRING & STITCH : Network integration of heterogeneous data
STRING & STITCH: Network integration of heterogeneous dataSTRING & STITCH: Network integration of heterogeneous data
STRING & STITCH : Network integration of heterogeneous data
 
Biomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured textBiomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured text
 
Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...
 
Network Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and CytoscapeNetwork Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and Cytoscape
 
Cellular networks
Cellular networksCellular networks
Cellular networks
 
Cellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and textCellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and text
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
 
Tagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognitionTagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognition
 
Network Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and textNetwork Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and text
 
Medical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactionsMedical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactions
 
Medical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsMedical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactions
 
Cellular Network Biology
Cellular Network BiologyCellular Network Biology
Cellular Network Biology
 
Biomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritizationBiomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritization
 
The Art of Counting: Scoring and ranking co-occurrences in literature
The Art of Counting: Scoring and ranking co-occurrences in literatureThe Art of Counting: Scoring and ranking co-occurrences in literature
The Art of Counting: Scoring and ranking co-occurrences in literature
 
Text-mining-based retrieval of protein networks
Text-mining-based retrieval of protein networksText-mining-based retrieval of protein networks
Text-mining-based retrieval of protein networks
 
Medical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsMedical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactions
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

The STRING database - Quality scores for heterogeneous interaction data