Acknowledgements
Excellent starting point for complementary material (a partial list)
Some current BioInformatics Systems
Outline of this talk…
What is difficult, tedious and time-consuming now…
What genes do we all have in common?* Research to answer this question took scientists two years.**
* G. Strobel and J. Arnold. Essential Eukaryotic Core, Evolution (to appear, 2003).
** But we now believe that with semantic techniques and technology we can answer similar questions much faster.
Why? Bioinformatics, ca. 2002. Bioinformatics in the XXI Century. From http://prometheus.frii.com/~gnat/tmp/stein_keynote.ppt
Science then, then and now: In the beginning, there was thought and observation.
Science then, then and now: For a long time this didn’t change.
The achievements are still admirable … as we can see. Reasoning and mostly passive observation were the main techniques in scientific research until recently.
Science then, then and now: A vast amount of information.
Science then, then and now: No single person and no group has an overview of what is known. Known, but not known …
We don’t always know what we are looking for.
Science then, then and now
Science then, then and now: Ontologies embody agreement among multiple parties and capture shared knowledge. Ontology is a powerful tool to help with communication, sharing and discovery. We are able to find relevant information (semantic search/browsing), connect knowledge and information (semantic normalization/integration), and find relationships between pieces of knowledge from different fields (gain insight, discover knowledge).
Intervention by ontologies, some near future. Bioinformatics in the XXI Century.
Outline of the talk…
Challenges in biology
… and their implications
Outline
What can BioInformatics do?
Outline
Paradigm shift over time: Syntax -> Semantics
Broad Scope of Semantic (Web) Technology
Scope of agreement: task/app; domain; industry; common sense; general purpose, broad based
Degree of agreement: informal; semi-formal; formal
Agreement about: data/info.; function; execution; QoS
Regions marked: current Semantic Web focus; Semantic Web processes; lots of useful semantic technology (interoperability, integration)
Other dimensions: how agreements are reached, …
Cf. Guarino, Gruber
Knowledge Representation and Ontologies
Ontology dimensions (after McGuinness and Finin), from simple taxonomies to expressive ontologies: catalog/ID; terms/glossary; thesauri (“narrower term” relation); informal is-a; formal is-a; formal instance; frames (properties); value restriction; disjointness, inverse, part of…; general logical constraints
Examples placed along the spectrum: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, GlycO, TAMBIS, EcoCyc, BioPAX
GlycO: Glycan Structure Ontology. From UGA’s “Bioinformatics for Glycan Expression” project. Not just a schema/description (partial view shown), but also a description base/ontology population. In progress; uses OWL.
What can current semantic technology do? (sample)
Legend: * commercial: Semagix; % near-commercial: IBM/SemTAP; # commercial: Network Inference; ^ LSDIS-UGA research
Industry Efforts (examples with bioinformatics applications only)
Existing Systems using Semantics for Bioinformatics. FOCUS: SEMANTIC SEARCH AND BROWSING (with nascent work in discovery)
Recent Articles, Experts, Organizations, Metabolic Pathways, Protein Families, Proteins, Genes, Related Diseases
Semagix Freedom Architecture (a platform for building ontology-driven information systems). © Semagix, Inc.
Content sources (semi-structured, structured, unstructured): documents, reports, XML/feeds, websites, email, databases; Content Agents (CA)
Knowledge Sources (KS); Knowledge Agents (KA)
Ontology; Metabase
Semantic Enhancement Server: entity extraction, enhanced metadata, automatic classification
Semantic Query Server: ontology and metabase main-memory index
Metadata adapters; existing applications: ECM, EIP, CRM
Practical Ontology Development: Observation by Semagix
Ontology Creation and Maintenance Steps (© Semagix, Inc.)
1. Ontology model creation (description)
2. Knowledge agent creation
3. Automatic aggregation of knowledge
4. Querying the ontology
Components shown: Ontology; Semantic Query Server
 
Cerebra’s myGrid Framework
Outline
Applying Semantics to BioInformatics: Example 1. Semantic Browsing, Querying and Integration
Present: User queries multiple sources. Heterogeneous data sources on the web.
Future: the Web-Service queries multiple sources
Semantic querying, browsing and integration to find potential antifungal drug targets. Databases for different organisms. Is this or a similar gene in another organism? (Most antifungals are associated with sterol metabolism.) Services: BLAST, co-expression analysis, phylogeny. If annotated, directly access the DB; else use BLAST to normalize. Database shown: FGDB.
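The routing rule on this slide (use the database directly when the gene is annotated, otherwise fall back to BLAST) can be sketched as follows. This is a minimal illustration in Python, not the system described in the talk: the annotation table, the gene identifiers and the `toy_blast` stand-in are all hypothetical, and a real service would call an actual BLAST installation instead.

```python
from typing import Callable, Dict, List

# Hypothetical annotation table: gene id in the source organism -> gene id
# already annotated in the target organism's database.
ANNOTATIONS: Dict[str, str] = {"geneA": "target_geneA"}


def toy_blast(sequence: str) -> List[str]:
    """Stand-in for a BLAST call; returns ids of similar target sequences."""
    toy_target_db = {"target_geneB": "ATGGCC", "target_geneC": "TTTTTT"}
    # Crude similarity: a shared 4-letter prefix (illustration only, not BLAST).
    return sorted(g for g, s in toy_target_db.items() if s[:4] == sequence[:4])


def find_in_other_organism(
    gene_id: str,
    sequence: str,
    annotations: Dict[str, str],
    similarity_search: Callable[[str], List[str]],
) -> List[str]:
    """If the gene is annotated, use the database entry; else search by similarity."""
    if gene_id in annotations:
        return [annotations[gene_id]]
    return similarity_search(sequence)


annotated_hit = find_in_other_organism("geneA", "ATGGCA", ANNOTATIONS, toy_blast)
blast_hit = find_in_other_organism("geneB", "ATGGTT", ANNOTATIONS, toy_blast)
```

Passing the similarity search in as a parameter keeps the routing logic independent of whichever sequence-comparison service is actually used.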
Applying Semantics to BioInformatics: Example 2. Analytics in Drug Discovery
Analytics, Using Explicit and Implicit Relationships in Drug Discovery
Step 1: Capture domains using ontologies
MOLECULE ONTOLOGY: Molecule A, Compound A, Compound B
DISEASE ONTOLOGY: Disease D
PATHOGEN ONTOLOGY: Pathogen X, Protein P, Protein Q
Step 2: Traverse explicit relationships
STEP 1: 1. Look up the disease ontology. 2. Identify the disease-causing pathogen. (DISEASE ONTOLOGY: Disease D)
STEP 2: 1. Look up the pathogen ontology. 2. Identify the molecular composition of the pathogen. (PATHOGEN ONTOLOGY: Pathogen X, Protein Q)
STEP 3: 1. Look up the molecule ontology. 2. Identify the composition of the possible drug. (MOLECULE ONTOLOGY: Molecule A, Compound B, Compound C)
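The three lookups on this slide can be sketched as a walk over subject-relation-object triples. The entity names come from the slides; the relation names (`caused_by`, `composed_of`, `candidate_drug`) are hypothetical, and the link from the pathogen to a candidate molecule is an assumption added so that the third lookup has a starting point.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Toy facts using the slide's entity names. Relation names are hypothetical,
# and the "candidate_drug" link between ontologies is an assumption.
DISEASE_ONTOLOGY: List[Triple] = [("Disease D", "caused_by", "Pathogen X")]
PATHOGEN_ONTOLOGY: List[Triple] = [
    ("Pathogen X", "composed_of", "Protein P"),
    ("Pathogen X", "composed_of", "Protein Q"),
    ("Pathogen X", "candidate_drug", "Molecule A"),
]
MOLECULE_ONTOLOGY: List[Triple] = [
    ("Molecule A", "composed_of", "Compound B"),
    ("Molecule A", "composed_of", "Compound C"),
]


def lookup(ontology: List[Triple], subject: str, relation: str) -> List[str]:
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in ontology if s == subject and r == relation]


def traverse(disease: str) -> Dict[str, Dict[str, List[str]]]:
    """Follow explicit links: disease -> pathogen -> proteins and drug composition."""
    result: Dict[str, Dict[str, List[str]]] = {}
    # Step 1: disease ontology gives the disease-causing pathogen.
    for pathogen in lookup(DISEASE_ONTOLOGY, disease, "caused_by"):
        # Step 2: pathogen ontology gives the pathogen's molecular composition.
        proteins = lookup(PATHOGEN_ONTOLOGY, pathogen, "composed_of")
        drugs = lookup(PATHOGEN_ONTOLOGY, pathogen, "candidate_drug")
        # Step 3: molecule ontology gives the composition of each possible drug.
        compounds = {d: lookup(MOLECULE_ONTOLOGY, d, "composed_of") for d in drugs}
        result[pathogen] = {"proteins": proteins, **compounds}
    return result


paths = traverse("Disease D")
```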
Step 3: Discovering implicit relationships…
MOLECULE ONTOLOGY: Molecule A, Compound A, Compound B
PATHOGEN ONTOLOGY: Pathogen X, Protein Q
Compound A inhibits the effect of the pathogen by killing protein P.
Compound B produces a toxin on reacting with Protein Q.
Host: check if the host has protein P.
Extract the relationships amongst the compounds of the potential drug and the pathogen.
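One way to read the checks on this slide is as a small rule set over compound-protein effects. The sketch below is only an illustration under that reading: the `Effect` record, the outcome labels and the wording of the findings are hypothetical, and the rules are a simplification rather than the inference procedure used in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Effect:
    compound: str
    protein: str
    outcome: str  # hypothetical labels: "kills_protein" or "produces_toxin"


# The two relationships stated on the slide.
EFFECTS: List[Effect] = [
    Effect("Compound A", "Protein P", "kills_protein"),
    Effect("Compound B", "Protein Q", "produces_toxin"),
]


def assess(
    effects: List[Effect], pathogen_proteins: Set[str], host_proteins: Set[str]
) -> Dict[str, str]:
    """Derive a finding per compound from its effect and the host's proteins."""
    findings: Dict[str, str] = {}
    for e in effects:
        if e.protein not in pathogen_proteins:
            continue  # the effect does not concern this pathogen
        if e.outcome == "produces_toxin":
            findings[e.compound] = f"risk: toxin produced on reacting with {e.protein}"
        elif e.protein in host_proteins:
            findings[e.compound] = f"risk: host also has {e.protein}"
        else:
            findings[e.compound] = f"candidate: acts on {e.protein}"
    return findings


pathogen = {"Protein P", "Protein Q"}
host_without_p = assess(EFFECTS, pathogen, host_proteins=set())
host_with_p = assess(EFFECTS, pathogen, host_proteins={"Protein P"})
```

The host check matters because a compound that kills protein P is only attractive if the host does not depend on the same protein.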
Inferences Based on Relationships
Applying Semantics to BioInformatics: Example 3. Using Ontologies in cancer research
Disparate Data from Different Experiments. Metastatic cancer cells. Increased GNT-V activity (Experiment 1). Cancer marker glycan sequence elevated in glycoprotein beta 1 integrin (Experiment 2).
Knowledge Stored in Ontologies
Finding New Information
Applying Semantics to BioInformatics: Example 4. Applying Semantics to BioInformatics Processes
Creating BioSemantic Processes
Creating BioSemantic Processes
BioSemantic Process Definition
GO id -> SIMILAR SEQUENCES MATCHER -> MULTIPLE SEQUENCE ALIGNMENT -> PHYLOGEN TREE CREATOR
Semantic Bioinformatics Processes: FUNCTIONAL SEMANTICS
Use functional ontologies for finding relevant services.
Process: GO id -> SIMILAR SEQUENCES MATCHER -> MULTIPLE SEQUENCE ALIGNMENT -> PHYLOGEN TREE CREATOR
Sequence Matcher: ENTREZ, FETCH, LOOKUP
Sequence Alignment: CLUSTAL, MEME
Phylogen Tree Creator: PAUP, PHYLIP, TREEVIEW
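Service discovery by function can be sketched as a lookup from each step of the process template to the services annotated with that function. The service names below are the ones listed on the slide; the category keys and the dictionary-based registry are illustrative assumptions, standing in for annotations against a functional ontology.

```python
from typing import Dict, List, Tuple

# Service names are taken from the slide; the category keys and the
# registry structure are illustrative assumptions.
SERVICE_REGISTRY: Dict[str, List[str]] = {
    "sequence_matcher": ["ENTREZ", "FETCH", "LOOKUP"],
    "sequence_alignment": ["CLUSTAL", "MEME"],
    "phylogen_tree_creator": ["PAUP", "PHYLIP", "TREEVIEW"],
}

# The abstract process from the slide, as an ordered list of functions.
PROCESS_TEMPLATE: List[str] = [
    "sequence_matcher",
    "sequence_alignment",
    "phylogen_tree_creator",
]


def discover(
    template: List[str], registry: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """For each step of the template, list the services offering that function."""
    return [(step, registry.get(step, [])) for step in template]


candidates = discover(PROCESS_TEMPLATE, SERVICE_REGISTRY)
```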
Semantic Bioinformatics Processes: QoS SEMANTICS
Use QoS ontologies to make a choice.
Process: GO id -> SIMILAR SEQUENCES MATCHER -> MULTIPLE SEQUENCE ALIGNMENT -> PHYLOGEN TREE CREATOR
Sequence Matcher: ENTREZ (QoS: time X secs, reliability X %), FETCH (QoS: time X secs, reliability X %)
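Choosing between functionally equivalent services on QoS can be sketched as filtering by constraints and then ranking. The slide gives only "X secs" and "X %", so the numbers below are made up, and the selection policy (fastest acceptable service, ties broken by reliability) is an assumption rather than the policy used in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QoS:
    service: str
    time_secs: float
    reliability_pct: float


# Made-up figures; the slide states only "X secs" and "X %".
OFFERS: List[QoS] = [QoS("ENTREZ", 4.0, 99.0), QoS("FETCH", 2.5, 90.0)]


def choose(
    offers: List[QoS], max_time_secs: float, min_reliability_pct: float
) -> Optional[QoS]:
    """Pick the fastest service that satisfies both constraints, if any."""
    acceptable = [
        o
        for o in offers
        if o.time_secs <= max_time_secs and o.reliability_pct >= min_reliability_pct
    ]
    if not acceptable:
        return None
    # Assumed policy: lowest time first, higher reliability breaks ties.
    return min(acceptable, key=lambda o: (o.time_secs, -o.reliability_pct))


strict = choose(OFFERS, max_time_secs=5.0, min_reliability_pct=95.0)
relaxed = choose(OFFERS, max_time_secs=5.0, min_reliability_pct=80.0)
```

With the strict reliability requirement only ENTREZ qualifies; once the requirement is relaxed, the faster FETCH is preferred.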
Semantic Bioinformatics Processes: DATA SEMANTICS
Use concept ontologies for interoperability.
Process: GO id -> SIMILAR SEQUENCES MATCHER -> MULTIPLE SEQUENCE ALIGNMENT -> PHYLOGEN TREE CREATOR (the hand-offs between steps MAY REQUIRE DATA CONVERSION)
Semantic Bioinformatics Processes: EXECUTION SEMANTICS
Use execution semantics for execution monitoring of different instances.
Instances shown: GO id, CLUSTAL, PAUP; GO id, MEME, PHYLIP; GO id, CLUSTAL, PAUP. Sequence matchers shown: FETCH, ENTREZ, ENTREZ.
Semantic Web Process Design Template Construction
Common genes
Conclusion
Conclusion
Sources

Semantics for Bioinformatics: What, Why and How of Search, Integration and Analysis


Editor's notes

  1. Semantics (of information, communication) is a very old area, and extensive work on semantic technology has been going on for well over a decade (many projects on semantic interoperability and semantic information brokering). Semantic Web and related visions are being achieved in varying depth and scope, mostly starting with targeted applications where requirements are much better understood and scope is manageable.
  2. GO (Gene Ontology). KEGG (Kyoto Encyclopedia of Genes and Genomes) is a bioinformatics resource for understanding higher-order functional meanings and utilities of the cell or the organism from its genome information. TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) aims to aid researchers in biological science by providing a single access point for biological information sources around the world. EcoCyc, a part of the BioCyc library, is a scientific database for the bacterium Escherichia coli. The EcoCyc project performs literature-based curation of the entire E. coli genome, and of E. coli transcriptional regulation, transporters, and metabolic pathways. BioPAX (Biological Pathways Exchange).
  3. GO ontology (schema): the corresponding KB for gene interaction can come from protein-protein interaction, GenBank phylogenetic relatedness, and microarray data co-expression.
  4. Is the gene the biologist is researching present in another organism? Is a similar gene in another organism? [If annotated, directly access the DB; if not, use BLAST to normalize.] Finding potential antifungal drug targets (most known antifungals are associated with sterol metabolism), using BLAST to normalize the results (genetic sequences) from different databases associated with different organisms. SGD: baker’s yeast; GUS: human, plasmodium, trypanosomes, ..; FGDB: pneumocystis. Services: BLAST, co-expression analysis, phylogeny.
  5. Change all these things.. Make them more concept killing
  6. Big Question: What essential genes do we all have in common?
  7. Big Question: What essential genes do we all have in common?
  8. Essential eukaryotic core. All 202 genes of P. carinii listed are shared between P. carinii, S. pombe, and S. cerevisiae. All genes included are lethal as knockouts in either S. pombe or S. cerevisiae. Most genes on the diagram use the S. cerevisiae name, but a few follow the naming convention of S. pombe (e.g., ypt2), when there was more annotation in S. pombe. Genes in DNA- (light blue), RNA- (dark blue), protein- (red), signaling- (purple), metabolism- (orange), transport- (green), or other- (black) related processes are color-coded.