The STRING database
Lars Juhl Jensen, EMBL Heidelberg
data integration
Jensen et al., Drug Discovery Today: Targets, 2004
functional interactions
Bork et al., Current Opinion in Structural Biology, 2005
373 proteomes
Genome Reviews
RefSeq
Ensembl
model organism databases
genomic context methods
gene fusion
 
gene neighborhood
 
phylogenetic profiles
 
 
 
 
[Figure labels: Cell, Cellulosomes, Cellulose]
automation
scoring scheme
correct interactions
wrong associations
gene fusion
sequence similarity
 
gene neighborhood
sum of intergenic distances
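A minimal sketch of the "sum of intergenic distances" idea, assuming a hypothetical layout in which genes on one strand are given as (name, start, end) tuples sorted by start. The names and coordinates are invented for illustration; the slides do not specify how the raw score is computed in detail.

```python
# Hypothetical genes on one strand, sorted by start coordinate: (name, start, end).
genes = [("a", 100, 400), ("b", 450, 900), ("c", 1000, 1500)]

def intergenic_sum(genes, i, j):
    """Sum the gaps between consecutive genes from index i to index j."""
    lo, hi = sorted((i, j))
    total = 0
    for k in range(lo, hi):
        gap = genes[k + 1][1] - genes[k][2]  # next start minus current end
        total += max(gap, 0)                 # overlapping genes count as a zero gap
    return total

print(intergenic_sum(genes, 0, 2))
```

A smaller sum means the two genes sit closer together in the run, which is the raw signal the neighborhood method scores.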
 
phylogenetic profiles
SVD (Singular Value Decomposition)
Euclidean distance
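A small sketch of comparing phylogenetic profiles by Euclidean distance. The profiles below are invented presence/absence vectors, and the SVD step named on the previous slide is not reproduced here; this only illustrates the distance calculation on the raw vectors.

```python
import math

# Hypothetical presence/absence profiles across five genomes (1 = gene present).
profiles = {
    "geneA": [1, 0, 1, 1, 0],
    "geneB": [1, 0, 1, 1, 0],
    "geneC": [0, 1, 0, 0, 1],
}

def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Identical profiles give distance 0; complementary profiles give a large distance.
print(euclidean(profiles["geneA"], profiles["geneB"]))
print(euclidean(profiles["geneA"], profiles["geneC"]))
```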
 
raw quality scores
not comparable
sequence similarity
sum of intergenic distances
Euclidean distance
benchmarking
calibrate vs. gold standard
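One way to picture calibration against a gold standard: rank the predictions by raw score, cut them into bins, and record the fraction of each bin that the gold standard confirms. The pairs, scores, and gold-standard set below are hypothetical, and simple binning stands in for whatever curve fitting the actual pipeline uses.

```python
def calibrate(scored_pairs, gold_positive, n_bins=2):
    """Return (mean raw score, fraction correct) for each bin of ranked pairs."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    size = max(1, len(ranked) // n_bins)
    curve = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        mean_score = sum(score for _, score in chunk) / len(chunk)
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        curve.append((mean_score, hits / len(chunk)))
    return curve

# Hypothetical raw scores and gold standard.
scored_pairs = [(("a", "b"), 0.9), (("a", "c"), 0.8),
                (("b", "c"), 0.3), (("c", "d"), 0.1)]
gold_positive = {("a", "b"), ("a", "c"), ("b", "c")}

curve = calibrate(scored_pairs, gold_positive)
for mean_score, fraction in curve:
    print(round(mean_score, 2), fraction)
```

The resulting (raw score, fraction correct) points can then be used to map any raw score onto a probability-like scale, which is what makes scores from different methods comparable.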
 
raw quality scores
probabilistic scores
curated knowledge
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
MIPS (Munich Information Center for Protein Sequences)
STKE (Signal Transduction Knowledge Environment)
primary experimental data
many sources
many parsers
physical protein interactions
BIND (Biomolecular Interaction Network Database)
GRID (General Repository for Interaction Datasets)
MINT (Molecular Interactions Database)
DIP (Database of Interacting Proteins)
HPRD (Human Protein Reference Database)
merge data by publication
topology-based scores
von Mering et al., Nucleic Acids Research, 2005
co-expression
GEO (Gene Expression Omnibus)
correlation coefficient
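A sketch of a co-expression raw score, assuming the correlation coefficient meant here is Pearson's (the slide does not say which one). The expression vectors are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression levels of three genes across four conditions.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 8.0]   # rises with gene1
gene3 = [8.0, 6.0, 4.0, 2.0]   # falls as gene1 rises

print(pearson(gene1, gene2), pearson(gene1, gene3))
```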
literature mining
different gene identifiers
synonyms lists
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
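Co-mentioning can be sketched as counting how many abstracts name both genes, after mapping each name to one identifier through a synonyms list. The abstracts and the synonyms table below are toy data, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import Counter
from itertools import combinations

# Hypothetical abstracts and synonyms list (lower-cased name -> identifier).
abstracts = [
    "Cdc28 binds Cln2 during G1.",
    "Cln2 levels peak in late G1; CDC28 activity follows.",
    "Swi4 regulates CLN2 transcription.",
]
synonyms = {"cdc28": "CDC28", "cln2": "CLN2", "swi4": "SWI4"}

def co_mentions(abstracts, synonyms):
    """Count, for each gene pair, the abstracts that mention both genes."""
    counts = Counter()
    for text in abstracts:
        tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
        found = sorted({synonyms[t] for t in tokens if t in synonyms})
        counts.update(combinations(found, 2))  # each pair counted once per abstract
    return counts

counts = co_mentions(abstracts, synonyms)
print(counts)
```

The raw counts would still need to be normalized against how often each gene is mentioned on its own before they can serve as a score.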
NLP (Natural Language Processing)
calibrate vs. gold standard
 
combine all evidence
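Once every evidence type is on a probabilistic scale, one simple way to combine them is to assume the sources are independent and compute the probability that at least one of them is right. The slides do not spell out the formula, so the function below is a sketch under that independence assumption and not necessarily the exact scheme used.

```python
def combine(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s)."""
    p_all_wrong = 1.0
    for s in scores:
        p_all_wrong *= (1.0 - s)  # probability that this source is also wrong
    return 1.0 - p_all_wrong

# Two hypothetical evidence channels, each 50% confident.
print(combine([0.5, 0.5]))
```

Under this rule, several weak but independent lines of evidence can add up to a confident association, while a single strong score is never lowered by adding weaker ones.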
spread over many species
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
 
orthologous groups
von Mering et al., Nucleic Acids Research, 2005
fuzzy orthology
von Mering et al., Nucleic Acids Research, 2005
Bayesian scoring scheme
Bork et al., Current Opinion in Structural Biology, 2005
Acknowledgments
