Ondex: Data integration and visualisation
Catherine Canevet
Rothamsted Research
London Biogeeks, May Tech Meet
Rothamsted Research North Wyke
Almost certainly the oldest agricultural research station in the world (started in 1843)
350 Scientific staff
Open weekend May 22nd-23rd, 11am-5pm
www.rothamsted.ac.uk/openweekend/
Outline
 Data integration in Ondex
 Data visualisation in Ondex and application cases
Genomics, transcriptomics, proteomics, metabolomics, …
The biological systems span multiple levels of biological organisation
Non-trivial to integrate the data: 2 main challenges
Syntactic integration challenge
 Over 1000 databases freely available to the public
 Over 60 million sequences in GenBank
 Over 870 complete genomes and many ongoing projects
 Over 17 million citations in PubMed
 PubMed grows by 600,000 publications each year
 Integration of Life Science data sources is essential for Systems Biology research
 http://www.ncbi.nlm.nih.gov/Database
Semantic integration challenge
 Same concept, different names: synonyms
 Same name, different concepts: homographs (slide example: "ear")
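As a rough illustration of the two semantic cases above, the sketch below groups toy records by concept and by name. The records, source names and identifiers are invented for illustration; they do not come from the talk or from Ondex.

```python
from collections import defaultdict

# Hypothetical (source, concept_id, name) records; all values are made up.
records = [
    ("db_a", "C1", "ear"),    # the hearing organ
    ("db_b", "C1", "auris"),  # same concept, different name
    ("db_c", "C2", "ear"),    # the grain-bearing part of a cereal plant
]

def find_synonyms(rows):
    """Concepts that carry more than one name (same concept, different names)."""
    names = defaultdict(set)
    for _source, concept, name in rows:
        names[concept].add(name.lower())
    return {c: sorted(n) for c, n in names.items() if len(n) > 1}

def find_homographs(rows):
    """Names that point at more than one concept (same name, different concepts)."""
    concepts = defaultdict(set)
    for _source, concept, name in rows:
        concepts[name.lower()].add(concept)
    return {n: sorted(c) for n, c in concepts.items() if len(c) > 1}

synonyms = find_synonyms(records)
homographs = find_homographs(records)
```

Synonyms can be merged once detected, whereas homographs need extra context (for example the concept class or the source) before they can be told apart.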
Outline
 Data integration in Ondex
 Data visualisation in Ondex and application cases
Concepts and relations (1/2)
 Network of Concepts and Relations, e.g.
  Protein-protein interaction network (PPI): Protein --interact--> Protein
  Cellular location of proteins: Protein --located in--> Cell
 Ontology of Concept Classes (Protein, CelComp), Relation Types (interact, located in) and additional Properties
 Properties: compound name, protein sequence, protein structure, cellular component, KM-value, pH optimum …
Concepts and relations (2/2)
 Transformation to binary graph
 Concepts: Reaction, Metabolite
 Relations: consumed by, produced by
 Properties: compound name, protein sequence, protein structure, cellular component, KM-value, pH optimum …
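The transformation to a binary graph can be sketched as follows: a reaction with several substrates and products becomes its own Reaction concept, linked to each Metabolite by one two-ended relation. The class and function names (`Concept`, `Relation`, `Graph`, `add_reaction`) are assumptions made for this example and are not the Ondex API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    name: str
    concept_class: str   # e.g. "Protein", "Reaction", "Metabolite"

@dataclass(frozen=True)
class Relation:
    source: Concept
    target: Concept
    relation_type: str   # e.g. "consumed by", "produced by"

@dataclass
class Graph:
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def relate(self, source, target, relation_type):
        """Add a binary relation and register both of its end concepts."""
        self.concepts.update((source, target))
        self.relations.add(Relation(source, target, relation_type))

def add_reaction(graph, name, substrates, products):
    """Turn one n-ary reaction into binary relations via a Reaction concept."""
    reaction = Concept(name, "Reaction")
    for s in substrates:
        graph.relate(Concept(s, "Metabolite"), reaction, "consumed by")
    for p in products:
        graph.relate(Concept(p, "Metabolite"), reaction, "produced by")
    return reaction

# Toy reaction A + B -> C; the names are placeholders.
g = Graph()
add_reaction(g, "R1", substrates=["A", "B"], products=["C"])
```

Properties such as a sequence or a pH optimum are left out here to keep the concepts hashable; a fuller model would attach them as key-value pairs.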
Data integration in Ondex
 Data input: biological databases, ontologies & free text, experimental data
 Import: graph of concepts and relations
 Data integration: data alignment, sequence analysis, text mining
Importing data into Ondex
 What databases to import
 What format these are in
 Ondex parsers already written
  Generic: OBO, PSI-MI, SBML, Tab-delimited, FASTA
  Database-specific: AraCyc, AtRegNet, BioCyc, BioGRID, Brenda, Drastic, EcoCyc, GO, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, Oglycbase, OMIM, PDB, Pfam, SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WGS, WordNet
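To make the idea of a generic parser concrete, here is a minimal sketch that reads a tab-delimited file into sets of concepts and relations. The three-column layout (source, relation type, target) and the function name `parse_tab_delimited` are assumptions for this example; the actual Ondex tab-delimited parser may expect a different layout.

```python
import csv
import io

def parse_tab_delimited(handle):
    """Read (source, relation_type, target) rows into concept and relation sets."""
    concepts, relations = set(), set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        source, relation_type, target = (cell.strip() for cell in row)
        concepts.update((source, target))
        relations.add((source, relation_type, target))
    return concepts, relations

# Toy input; the identifiers are placeholders, not real database entries.
sample = io.StringIO(
    "# source\trelation\ttarget\n"
    "P1\tinteracts with\tP2\n"
    "P1\tlocated in\tnucleus\n"
)
concepts, relations = parse_tab_delimited(sample)
```

Each database-specific parser would do the same job for its own format: map the source records onto concepts and typed relations.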
Example of resulting graph
 Concept classes shown: Gene, Protein, Transcription factor, Enzyme, Protein complex, Reaction, EC, Pathway, Target sequence
 Relation types shown: has similar sequence, binds to, repressed by, regulated by, activated by, encoded by, is_a, member is part of, catalyses, catalysing class
Ondex Data Integration Scheme
 Heterogeneous data sources: UniProt, AraCyc, KEGG, Transfac, Microarray
 Data input & transformation: one parser per data source
 Ondex graph warehouse: Generalized Object Data Model, database layer, Lucene
 Data integration (graph alignment), integration methods: accession, name based, transitive, Blast, protein family, Pfam2GO
 Data exchange: OXL/RDF, web service
 Visualisation clients/tools: Ondex Visualization Tool Kit, web client, Taverna
 Examples: treatments from DRASTIC, pathways from KEGG
Semantic Integration by Graph Alignment
 Create relations between equivalent entries from different data sources
 Identified by mapping methods
  Concept accessions (UniProt ID)
  Concept name (gene name), synonyms
  Sequence methods
  Graph neighbourhood
  Text mining
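The accession-based mapping method can be sketched as follows: concepts from different sources that share an accession are linked by a new relation. The dictionary layout, the relation label "equivalent" and all identifiers are assumptions for this example, not Ondex's own data model or naming.

```python
from collections import defaultdict

# Hypothetical concepts from two sources; ids and accessions are placeholders.
concepts = [
    {"id": "src1:1", "source": "source_1", "accessions": {"ACC0001"}},
    {"id": "src2:7", "source": "source_2", "accessions": {"ACC0001", "X9"}},
    {"id": "src2:8", "source": "source_2", "accessions": {"ACC0002"}},
]

def map_by_accession(items):
    """Return 'equivalent' relations between concepts sharing an accession."""
    by_accession = defaultdict(list)
    for concept in items:
        for acc in concept["accessions"]:
            by_accession[acc].append(concept)
    relations = set()
    for group in by_accession.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a["source"] != b["source"]:  # only link across sources
                    relations.add((a["id"], "equivalent", b["id"]))
    return relations

equivalences = map_by_accession(concepts)
```

Name-based mapping would follow the same pattern with names and synonyms as keys, at the cost of more false matches from homographs.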
Outline
 Data integration in Ondex
 Data visualisation in Ondex and application cases
Complexity of interactions
 PPI, co-expression, co-citation, …
Candidate gene prioritisation and pathway discovery
 Use Ondex tools (filters, annotators, layouts …)
Filters
 Integrating different datasets → large resulting graph
 Need to narrow down
 Select meaningful areas of the graph
 Example in Ondex: protein-protein interaction network
Filters in Ondex
 Protein-protein interactions measured using quantitative techniques
 Threshold filter
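A threshold filter of this kind can be sketched in a few lines: keep only the interactions whose measured value reaches a cut-off, then keep the proteins those interactions connect. The scores, the cut-off and the function name `threshold_filter` are illustrative assumptions, not values or code from Ondex.

```python
# Hypothetical weighted interactions: (protein_a, protein_b, confidence score).
interactions = [
    ("P1", "P2", 0.92),
    ("P2", "P3", 0.35),
    ("P3", "P4", 0.71),
]

def threshold_filter(edges, minimum):
    """Keep edges scoring at least `minimum`, plus the concepts they connect."""
    kept = [edge for edge in edges if edge[2] >= minimum]
    nodes = {n for a, b, _score in kept for n in (a, b)}
    return nodes, kept

nodes, kept = threshold_filter(interactions, minimum=0.7)
```

In this toy case the weak P2-P3 interaction is dropped, which splits the network into two smaller pieces.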
http://www.phi-base.org/
Loss of pathogenicity
Reduced virulence
Only genes validated by gene disruption experiments
Integrated phenotype and comparative genome information
Annotators (1/3)
Colour
Shape
Size
Annotators (2/3)
 Virtual Knock-out Annotator: to see how important a single concept is to all possible paths contained in a network
  Ondex resizes the concepts based on this score
 Scale Concept by Value
 Pie charts
  Up/down regulation is indicated in red/green
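One simple way to approximate the idea behind the Virtual Knock-out Annotator described above is to remove each concept in turn and count how many pairs of the remaining concepts lose their connection. This is a simplified stand-in chosen for illustration; the scoring Ondex actually uses is not given in the slides, and the function names here are hypothetical.

```python
from collections import deque
from itertools import combinations

def reachable(adjacency, start, removed=None):
    """Breadth-first search that ignores the `removed` node."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def knockout_scores(edges):
    """Per node: how many connected pairs of other nodes its removal separates."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = {}
    for node in adjacency:
        others = [n for n in adjacency if n != node]
        lost = 0
        for a, b in combinations(others, 2):
            before = b in reachable(adjacency, a)
            after = b in reachable(adjacency, a, removed=node)
            lost += before and not after
        scores[node] = lost
    return scores

# Toy undirected network: a chain A-B-C-D with a side branch C-E.
scores = knockout_scores([("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")])
```

A viewer could then scale each concept's size by its score, so that hub-like concepts such as C in the toy network stand out. The repeated searches make this sketch suitable only for small graphs.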

Ondex: Data integration and visualisation

  • 1. Ondex – Data integration and visualisation Catherine Canevet Rothamsted Research London Biogeeks – May Tech Meet
  • 2.
  • 3. Almost certainly the oldest in the world (started in 1843)
  • 5. Open weekend May 22nd-23rd 11am-5pmwww.rothamsted.ac.uk/openweekend/
  • 6.
  • 8.
  • 10.
  • 12. The biological systems span multiple levels of biological organisation
  • 13. Non-trivial to integrate the data 2 main challenges
  • 14. Syntactic integration challenge Over 1000 databases freely available to public Over 60 million sequences in GenBank Over 870 complete genomes and many ongoing projects Over 17 million citations in PubMed PubMed growth by 600,000 publications each year Integration of Life Science data sources is essential for Systems Biology research http://www.ncbi.nlm.nih.gov/Database
  • 15. Ear Semantic Integration challenge Same concept different names Synonyms Same name different concepts Homographs
  • 16.
  • 18.
  • 19. Concepts and relations (1/2). Network of concepts and relations, e.g. protein–protein interaction network (PPI): Protein interact Protein; cellular location of proteins: Protein located in CelComp (cell). ConceptClass: Protein, CelComp. RelationType: interact, located in. Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum … Ontology of Concept Classes, Relation Types and additional Properties
  • 20. Concepts and relations (2/2). Transformation to binary graph. Concepts: Reaction, Metabolite. Relations: produced by, consumed by. Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum …
  • 24. Importing data into Ondex: what databases to import; what format these are in. Ondex parsers already written. Generic: OBO, PSI-MI, SBML, tab-delimited, FASTA. Database-specific: AraCyc, AtRegNet, BioCyc, BioGRID, Brenda, Drastic, EcoCyc, GO, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, Oglycbase, OMIM, PDB, Pfam, SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WGS, WordNet
  • 25. Example of resulting graph (diagram). Concept classes: Gene, Protein, Target sequence, Transcription factor, Enzyme, Protein complex, Reaction, EC, Pathway. Relation types: has similar sequence; binds to; repressed by; regulated by; activated by; encoded by; is_a; member is part of; catalyses; catalysing class
  • 26. Ondex Data Integration Scheme. Stages: data input & transformation; data integration; visualisation. Heterogeneous data sources (UniProt, AraCyc, KEGG, Transfac, Microarray), each read by a Parser. Ondex graph warehouse: Generalized Object Data Model, Database Layer, Lucene. Integration methods: Accession, Name based, Transitive, Blast, ProteinFamily, Pfam2GO. Data exchange: OXL/RDF, Web Service. Clients/Tools: Ondex Visualization Tool Kit, Web Client, Taverna. Graph alignment example: treatments from DRASTIC, pathways from KEGG
  • 27. Semantic Integration by Graph Alignment: create relations between equivalent entries from different data sources, identified by mapping methods: concept accessions (UniProt ID); concept name (gene name), synonyms; sequence methods; graph neighbourhood; text mining
  • 33. Candidate gene prioritisation and pathway discovery Use Ondex tools (filters, annotators, layouts …)
  • 34. Filters: integrating different datasets → large resulting graph. Need to narrow down: select meaningful areas of the graph. Example in Ondex: protein-protein interaction network
  • 41. Integrated phenotype and comparative genome information
  • 44. Shape
  • 46. Annotators (2/3). Virtual Knock-out Annotator: to see how important a single concept is to all possible paths contained in a network; Ondex resizes the concepts based on this score. Scale Concept by Value: pie charts; up/down regulation is indicated in red/green
  • 47. Application case 2: Mapping microarray expression data to integrated pathways (Jan Taubert). Inputs: AraCyc (Parser, OXL) and Arabidopsis C/N uptake data (tab file, tab file Parser). Accession based mapping using TAIR IDs in Ondex. Outputs: interactive exploration; enriched spreadsheet, e.g. AraCyc pathways
  • 53. Network diameter: add annotation to the graph
  • 54. Application case 3: Arabidopsis PPI network (Artem Lysenko). IntAct, TAIR, BioGRID → mapping the 3 databases based on TAIR accessions
  • 55. Adding 3 sources of evidence: co-expression; sequence similarity; co-occurrence in scientific literature → facilitate the identification of functionally related groups of proteins
  • 56. Added attributes to nodes/edges. Network stats: betweenness centrality (BWC) → how influential (bridge); degree centrality (DC) → hub likeness. Markov Clustering: identifies strongly connected groups of proteins in the network
  • 58. Degree centrality represented by node size
  • 59. Betweenness centrality represented by node colour (Artem Lysenko)
  • 60. Filters, annotators and layouts: combination of these three types of tools in Ondex → a more complex application case …
  • 61. Application case 4: Bioenergy Project (Keywan Hassani-Pak). Use bioinformatics to support phenotype-genotype research in bioenergy crops. Given a phenotypic variant, is it possible to pin down the relevant genes? Develop tools to support systematic analysis of QTL regions to pin down relevant genes; identify genes implicated in biomass production in willow; prioritise genes for experimental validation. Biofuel Conversion Process: http://www.jgi.doe.gov/education/bioenergy/bioenergy_1.html
  • 62. QTL and Genomic Data. Willow genome is not sequenced yet. A QTL may encompass many potential candidates, perhaps hundreds. Poplar is the first tree with a fully sequenced genome: 19 chromosomes, 45778 predicted genes, 4x larger than the Arabidopsis genome. Not much is known about the function of the genes
  • 63. Linking genes to data sources Linked References model e.g. Poplar, Arabidopsis Willow Pathways Plant Hormones QTL Map Orthologous Markers Physical map Expression Patterns Genes Gene Function List of candidate genes linked to biological processes
  • 64. Relevant Data Sources: Release 15.10; Poplar Gene Prediction v2.0 (Jan 2010); All plants: 739,396 proteins, reviewed: 28,404 proteins (3.84%); PoplarCyc 1.0: 285 pathways, 3434 enzymes, 1363 compounds (Oct 2009); Pfam 24.0: 11,912 protein families (Oct 2009); Poplar Transcription Factors: DPTF: 2,576 putative TF (March 2007), PlnTFDB: 2,901 putative TF (July 2009); 29,365 GO terms (Jan 2010); Poplar/Willow QTL: work in progress, preliminary dataset available; only loading referenced publications: ~15,000 articles
  • 65. Unique Knowledge Base for Poplar. Proteins annotated with functional information and publications, based on comparative genomics and protein family analysis. Genes, QTLs enriched with positional information. Data integration was done in Ondex
  • 66. Ondex Genomics Layout Genomic Layout displays chromosomes, genes and QTLs Chromosomal regions and QTLs can be selected
  • 67. Ondex Genomics Filter Genes of interest Enriched protein annotation network
  • 68. Phenotypic Information in Literature. Poplar protein 217086. HMMer: 650581 – HLH, E-Value: 3.4E-7, Score: 30.0. BLAST: 217086 – LAX, E-Value: 8.3E-17, Score: 80.88. BLAST: 217086 – BHLH63, E-Value: 8.3E-9, Score: 54.3. PMID:13130077 “LAX and SPA: major regulators of shoot branching in rice.” We identified two remote homologs in Rice (LAX) and in Arabidopsis (BHLH63), as well as one protein domain HLH. The LAX homolog contains evidence to be a major regulator of shoot branching → Hypothesis generation
  • 71. Text mining; Experimental Data; Hypothesis; New experiments
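
Slides 19, 20 and 27 describe the Ondex graph model (concepts with a concept class, typed binary relations, reactions turned into a binary graph) and accession-based mapping between data sources. The following is a minimal Python sketch of those ideas, not the Ondex implementation: the class names, field names, relation type strings and the accession values used to try it out are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    id: int
    concept_class: str                              # e.g. "Protein", "Reaction"
    name: str
    accessions: set = field(default_factory=set)    # hypothetical accession strings
    source: str = ""                                # originating data source


@dataclass(frozen=True)
class Relation:
    from_id: int
    to_id: int
    relation_type: str                              # e.g. "consumed_by"


def reaction_to_binary(reaction, substrates, products):
    """Express one n-ary reaction as binary relations to a Reaction concept."""
    rels = [Relation(m.id, reaction.id, "consumed_by") for m in substrates]
    rels += [Relation(m.id, reaction.id, "produced_by") for m in products]
    return rels


def map_by_accession(concepts):
    """Relate concepts of the same class, from different sources,
    that share at least one accession."""
    index = defaultdict(list)
    for c in concepts:
        for acc in c.accessions:
            index[(c.concept_class, acc)].append(c)

    pairs = set()
    for group in index.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a.source != b.source:
                    pairs.add((min(a.id, b.id), max(a.id, b.id)))
    # "equivalent" is a placeholder relation type name for this sketch
    return [Relation(x, y, "equivalent") for x, y in sorted(pairs)]
```

Calling `map_by_accession` on concepts parsed from two sources returns one relation per pair of entries that share an accession. Name-based, sequence-based and text-mining mappings (slide 27) would need their own matching logic and are not covered here.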

Editor's notes

  1. Light pink – Increased virulence; Light blue – Reduced virulence; Light green – Loss of pathogenicity; Yellow – Unaffected pathogenicity. Star – animal; Circle – plant
  2. Virtual KO score is based on 3 other scores: "extension" gives the number of paths that would be extended if a concept was added; "deletion" gives the number of paths that would be deleted if this concept was deleted; "nochange" gives the number of paths that would not be shortened/extended if this concept was deleted
  3. IntAct: 4625 protein interactions (data derived from literature curation or direct user submissions). TAIR (The Arabidopsis Information Resource): 1143 interactions; genome sequence, gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications. BioGrid (General Repository for Interaction Datasets): collections of protein and genetic interactions from major model organism species; 1223 interactions for Arabidopsis derived from high-throughput studies and conventional focused studies
  4. ATTED II (Arabidopsis thaliana trans-factor and cis-element prediction database): provides co-regulated gene relationships in Arabidopsis to estimate gene functions; gives the Pearson correlation coefficients of co-expressed genes in Arabidopsis calculated from available microarray data. NCBI PSI-BLAST: identify similarities between our reference set of proteins; matching against the Arabidopsis subset of UniProt. Co-occurrence of protein names: 25,900 Medline abstracts related to Arabidopsis thaliana; integrated Lucene-based mapping method
  5. Solid biomass (in the form of plants and trees) can be converted into liquid fuels (such as ethanol, methanol, and biodiesel). The challenge lies in efficient conversion, creating more energy than the input required to produce it; increase biomass yield. Develop means to support systematic analysis of QTL regions and prioritise genes for experimental analyses; identify genes controlling biomass production in willow
  6. QTL are genomic regions that assign variations observed in a phenotype to a region on the genetic map. Biomass traits: branching, height, leaf number etc. Going from Willow to Poplar to Arabidopsis and other species
  7. Reduced hypothesis space from 100 potential candidates to 3 hot candidates. Next steps: cloning and transformation for experimental validation.
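
Slide 56 annotates the Arabidopsis PPI network with degree centrality as a measure of hub likeness. As a rough illustration only, the Python sketch below computes a normalised degree centrality from a list of undirected interactions. The normalisation by n - 1 is one common convention and is an assumption here, since the slides do not say which variant Ondex uses; the function name and the protein labels in the usage line are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected network.

    `edges` is an iterable of (protein_a, protein_b) pairs. Duplicate
    interactions are counted once and self-interactions are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a]            # make sure both nodes are registered
        neighbours[b]
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical toy network: protein "A" interacts with every other protein.
scores = degree_centrality([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
```

In this toy network `scores["A"]` is the largest value, which corresponds to the largest node when degree centrality is represented by node size as on slide 58.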