Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
Semantics for Bioinformatics: What, Why and How of Search, Integration and Analysis
6. What is difficult, tedious and time-consuming now … What genes do we all have in common?* Research to answer this question took scientists two years.**
* G. Strobel and J. Arnold. Essential Eukaryotic Core. Evolution (to appear, 2003).
** But we now believe that with semantic techniques and technology, we can answer similar questions much faster.
7. Why? Bioinformatics, ca. 2002. From "Bioinformatics in the XXI Century" (http://prometheus.frii.com/~gnat/tmp/stein_keynote.ppt).
8. Science then, then and now. In the beginning, there was thought and observation.
10. The achievements are still admirable … Reasoning and mostly passive observation were the main techniques in scientific research until recently. … as we can see
15. Science then, then and now. Ontologies embody agreement among multiple parties and capture shared knowledge. An ontology is a powerful tool to help with communication, sharing and discovery. With it we are able to find relevant information (semantic search/browsing), connect knowledge and information (semantic normalization/integration), and find relationships between pieces of knowledge from different fields (gain insight, discover knowledge).
24. Broad Scope of Semantic (Web) Technology. [Figure: dimensions of agreement. Scope of agreement: Task/App, Domain, Industry, Gen. Purpose/Broad Based, Common Sense. Degree of agreement: Informal, Semi-Formal, Formal. Agreement about: Data/Info., Function, Execution, QoS. Labeled regions: "Current Semantic Web Focus", "Semantic Web Processes", "Lots of Useful Semantic Technology (interoperability, integration)". Cf. Guarino, Gruber.] Other dimensions: how agreements are reached, …
25. Knowledge Representation and Ontologies. [Figure: Ontology Dimensions, after McGuinness and Finin. A spectrum from simple taxonomies to expressive ontologies: Catalog/ID, Terms/glossary, Thesauri ("narrower term" relation), Informal is-a, Formal is-a, Formal instance, Frames (properties), Value restriction, General logical constraints, Disjointness, Inverse, part of… Representations and ontologies placed along the spectrum: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, GlycO, TAMBIS, EcoCyc, BioPAX.]
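To make the difference between the two ends of this spectrum concrete, here is a minimal Python sketch, assuming a tiny hypothetical ontology (the classes `Kinase`, `Enzyme`, `Protein`, `Gene`, `DNASegment` and the disjointness axiom are illustrative, not taken from GO or any other ontology listed above). A simple taxonomy only records is-a links; adding a disjointness constraint lets a program detect that an entity asserted to be both a kinase and a gene is inconsistent.

```python
# Hypothetical toy ontology, for illustration only (not from GO or an OWL file).
# Formal is-a links: child class -> parent class.
SUBCLASS_OF = {
    "Kinase": "Enzyme",
    "Enzyme": "Protein",
    "Gene": "DNASegment",
}

# Disjointness axiom: nothing can be both a Protein and a DNASegment.
DISJOINT = [frozenset({"Protein", "DNASegment"})]


def ancestors(cls):
    """Return cls plus every class reachable by following is-a links."""
    result = set()
    while cls is not None:
        result.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


def consistent(asserted_types):
    """An entity is consistent unless its inferred types cover a disjoint pair."""
    inferred = set()
    for cls in asserted_types:
        inferred |= ancestors(cls)
    return not any(pair <= inferred for pair in DISJOINT)


print(consistent({"Kinase"}))          # True
print(consistent({"Kinase", "Gene"}))  # False
```

The taxonomy alone would happily accept both types; only the extra axiom, typical of the expressive end of the spectrum, makes the contradiction detectable.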
26. GlycO: Glycan Structure Ontology. Part of UGA's "Bioinformatics for Glycan Expression" project. Not just a schema/description (partial view shown) but also a populated description base (ontology population). In progress; uses OWL.
29. Existing Systems Using Semantics for Bioinformatics. Focus: semantic search and browsing (with nascent work in discovery).
30. [Figure: linked categories: Genes, Proteins, Protein Families, Metabolic Pathways, Related Diseases, Recent Articles, Experts, Organizations]
40. Semantic querying, browsing and integration to find potential antifungal drug targets. Databases for different organisms (e.g., FGDB): is this or a similar gene present in another organism? (Most antifungals are associated with sterol metabolism.) Services: BLAST, co-expression analysis, phylogeny. If the gene is annotated, access the database directly; otherwise, use BLAST to normalize.
47. Applying Semantics to Bioinformatics: Example 3. Using Ontologies in Cancer Research
48. Disparate Data from Different Experiments. Experiment 1: metastatic cancer cells show increased GNT-V activity. Experiment 2: a cancer marker glycan sequence is elevated in the glycoprotein beta 1 integrin.
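One way an ontology helps here is by supplying the missing link between otherwise unrelated results. The Python sketch below treats both experimental findings and one ontology fact as subject-predicate-object triples and runs a breadth-first search for a chain connecting them. The bridging triple (`GNT-V activity` produces `marker glycan`) and all predicate names are hypothetical assumptions for illustration, not claims from the slide or curated GlycO assertions.

```python
from collections import deque

# Experiment facts paraphrase the slide; the third triple is an ASSUMED
# ontology fact used only to show how a bridge between experiments is found.
TRIPLES = [
    ("metastatic cancer cell", "shows_increased", "GNT-V activity"),  # Experiment 1
    ("marker glycan", "elevated_in", "beta 1 integrin"),              # Experiment 2
    ("GNT-V activity", "produces", "marker glycan"),                  # ontology (assumed)
]


def neighbours(node):
    """Yield (predicate, other node), walking triples in either direction."""
    for s, p, o in TRIPLES:
        if s == node:
            yield p, o
        elif o == node:
            yield f"inverse({p})", s


def find_path(start, goal):
    """Breadth-first search; returns the list of triples linking start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, pred, nxt)]))
    return None


for step in find_path("metastatic cancer cell", "beta 1 integrin"):
    print(step)
```

Without the ontology triple the search returns `None`: the two experiments share no terms, which is exactly the integration gap described above.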
51. Applying Semantics to Bioinformatics: Example 4. Applying Semantics to Bioinformatics Processes
55. Semantic Bioinformatics Processes: Functional Semantics. Process (input: GO id): Similar Sequences Matcher → Multiple Sequence Alignment → Phylogen Tree Creator. Use functional ontologies for finding relevant services. Sequence Matcher: ENTREZ, FETCH, LOOKUP. Sequence Alignment: CLUSTAL, MEME. Phylogen Tree Creator: PAUP, PHYLIP, TREEVIEW.
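A minimal sketch of ontology-based service discovery, assuming each service is annotated with the function it performs and that functions are organized by is-a links. The concept names, and the choice to make `MultipleSequenceAlignment` a subconcept of `SequenceAlignment`, are hypothetical; only the service names come from the slide.

```python
# Hypothetical functional ontology: child concept -> parent concept (is-a).
FUNCTION_IS_A = {
    "MultipleSequenceAlignment": "SequenceAlignment",
    "SequenceAlignment": "BioinformaticsTask",
    "SequenceMatcher": "BioinformaticsTask",
    "PhylogenTreeCreator": "BioinformaticsTask",
}

# Service names from the slide; their functional annotations are assumed.
SERVICE_FUNCTION = {
    "ENTREZ": "SequenceMatcher",
    "FETCH": "SequenceMatcher",
    "LOOKUP": "SequenceMatcher",
    "CLUSTAL": "MultipleSequenceAlignment",
    "MEME": "SequenceAlignment",
    "PAUP": "PhylogenTreeCreator",
    "PHYLIP": "PhylogenTreeCreator",
    "TREEVIEW": "PhylogenTreeCreator",
}


def subsumed_by(concept, target):
    """True if concept equals target or is a descendant of it via is-a."""
    while concept is not None:
        if concept == target:
            return True
        concept = FUNCTION_IS_A.get(concept)
    return False


def discover(requested_function):
    """Names of services whose annotated function satisfies the request."""
    return sorted(name for name, function in SERVICE_FUNCTION.items()
                  if subsumed_by(function, requested_function))


print(discover("SequenceAlignment"))          # ['CLUSTAL', 'MEME']
print(discover("MultipleSequenceAlignment"))  # ['CLUSTAL']
```

Matching on concepts rather than service names means a request for a general function also finds services registered under a more specific one.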
56. Semantic Bioinformatics Processes: QoS Semantics. Process (input: GO id): Similar Sequences Matcher → Multiple Sequence Alignment → Phylogen Tree Creator. Use QoS ontologies to make a choice between candidate Sequence Matcher services (ENTREZ, FETCH), each described by its QoS: Time X secs, Reliability X %.
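The slide leaves the QoS values as placeholders (X secs, X %). The sketch below assumes made-up numbers and one possible selection policy: discard candidates that violate hard time and reliability limits, then rank the rest by a weighted score. The weights, limits and scoring formula are assumptions for illustration, not part of any particular QoS ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoS:
    time_secs: float    # expected execution time in seconds
    reliability: float  # fraction of successful invocations, 0..1


def choose_service(candidates, max_time, min_reliability, time_weight=0.5):
    """Pick the best feasible service name, or None if none meets the limits."""
    feasible = {
        name: q for name, q in candidates.items()
        if q.time_secs <= max_time and q.reliability >= min_reliability
    }
    if not feasible:
        return None
    slowest = max(q.time_secs for q in feasible.values())

    def score(q):
        # Faster services get a time score closer to 1; the slowest gets 0.
        time_score = 1 - q.time_secs / slowest if slowest > 0 else 1.0
        return time_weight * time_score + (1 - time_weight) * q.reliability

    return max(feasible, key=lambda name: score(feasible[name]))


# Hypothetical QoS figures for the two Sequence Matcher candidates.
matchers = {"ENTREZ": QoS(time_secs=4.0, reliability=0.99),
            "FETCH": QoS(time_secs=2.0, reliability=0.90)}
print(choose_service(matchers, max_time=10, min_reliability=0.80))  # FETCH
print(choose_service(matchers, max_time=10, min_reliability=0.95))  # ENTREZ
```

Tightening the reliability requirement flips the choice, which is the point of describing services by QoS rather than hard-wiring one.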
57. Semantic Bioinformatics Processes: Data Semantics. Process (input: GO id): Similar Sequences Matcher → Multiple Sequence Alignment → Phylogen Tree Creator. Use concept ontologies for interoperability; passing data between steps may require data conversion.
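A minimal sketch of how data semantics can drive conversion, assuming each service input and output is annotated with a concept (what the data means) and a concrete format (how it is encoded). The concept name `ProteinSequenceSet`, the formats chosen, and the `connect` helper are hypothetical. If the concepts differ, the steps cannot be linked; if only the formats differ, a registered converter is applied.

```python
# Hypothetical annotations: (concept from a concept ontology, data format).
MATCHER_OUTPUT = ("ProteinSequenceSet", "fasta")
ALIGNER_INPUT = ("ProteinSequenceSet", "tsv")


def fasta_to_tsv(text):
    """Convert FASTA records into 'id<TAB>sequence' lines."""
    records, name, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append(f"{name}\t{''.join(seq)}")
            # Record id is the first word of the header line.
            name, seq = (line[1:].split() or [""])[0], []
        else:
            seq.append(line)
    if name is not None:
        records.append(f"{name}\t{''.join(seq)}")
    return "\n".join(records)


CONVERTERS = {("fasta", "tsv"): fasta_to_tsv}


def connect(output, expected_input, payload):
    """Pass payload between two steps, converting the format if needed."""
    out_concept, out_fmt = output
    in_concept, in_fmt = expected_input
    if out_concept != in_concept:
        raise TypeError(f"semantic mismatch: {out_concept} vs {in_concept}")
    if out_fmt == in_fmt:
        return payload
    try:
        return CONVERTERS[(out_fmt, in_fmt)](payload)
    except KeyError:
        raise LookupError(f"no converter from {out_fmt} to {in_fmt}") from None


print(connect(MATCHER_OUTPUT, ALIGNER_INPUT, ">seq1 demo\nMKV\nLLA\n>seq2\nGGT\n"))
```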
58. Semantic Bioinformatics Processes: Execution Semantics. Use execution semantics for execution monitoring of different process instances. Each instance starts from a GO id and binds different services (matchers: ENTREZ, FETCH; aligners: CLUSTAL, MEME; tree creators: PAUP, PHYLIP).
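A minimal sketch of execution monitoring, assuming execution semantics are modeled as a small state machine per step (ready, running, done, failed) shared by all process instances. The states, the allowed transitions, the retry rule, and the instance-to-service bindings in the usage lines are hypothetical. The monitor rejects illegal state reports and can list failed steps across instances.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Hypothetical execution semantics: which state changes are legal for a step.
ALLOWED = {
    State.READY: {State.RUNNING},
    State.RUNNING: {State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: {State.READY},  # assumption: a failed step may be retried
}


class ProcessMonitor:
    """Track the state of every step in every running process instance."""

    def __init__(self):
        self.states = {}  # (instance id, service name) -> State

    def start(self, instance, services):
        for service in services:
            self.states[(instance, service)] = State.READY

    def report(self, instance, service, new_state):
        current = self.states[(instance, service)]
        if new_state not in ALLOWED[current]:
            raise ValueError(
                f"{instance}/{service}: illegal transition "
                f"{current.value} -> {new_state.value}")
        self.states[(instance, service)] = new_state

    def failed(self):
        return sorted(key for key, state in self.states.items()
                      if state is State.FAILED)


# Two hypothetical instances binding different services.
monitor = ProcessMonitor()
monitor.start("run1", ["ENTREZ", "CLUSTAL", "PAUP"])
monitor.start("run2", ["ENTREZ", "MEME", "PHYLIP"])
monitor.report("run1", "ENTREZ", State.RUNNING)
monitor.report("run1", "ENTREZ", State.DONE)
monitor.report("run2", "ENTREZ", State.RUNNING)
monitor.report("run2", "ENTREZ", State.FAILED)
print(monitor.failed())  # [('run2', 'ENTREZ')]
```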
Semantics (of information and communication) is a very old area, and extensive work on semantic technology has been going on for well over a decade (e.g., many projects on semantic interoperability and semantic information brokering). The Semantic Web and related visions are being realized at varying depth and scope, mostly starting with targeted applications where requirements are much better understood and the scope is manageable.
GO (Gene Ontology). KEGG (Kyoto Encyclopedia of Genes and Genomes) is a bioinformatics resource for understanding higher-order functional meanings and utilities of the cell or the organism from its genome information. TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) aims to aid researchers in biological science by providing a single access point for biological information sources around the world. EcoCyc, a part of the BioCyc library, is a scientific database for the bacterium Escherichia coli. The EcoCyc project performs literature-based curation of the entire E. coli genome and of E. coli transcriptional regulation, transporters, and metabolic pathways. BioPAX (Biological Pathways Exchange).
GO (schema): the corresponding knowledge base (KB) for gene interaction can be populated from protein-protein interaction data, GenBank phylogenetic relatedness, and microarray co-expression data.
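A minimal sketch of populating such a KB, assuming each evidence source has already been reduced to (gene, gene) pairs. Links are treated as undirected and tagged with the kind of evidence behind them. The gene ids are made up, and the rule of keeping links backed by at least two kinds of evidence is a hypothetical filtering choice, not something the notes prescribe.

```python
from collections import defaultdict


def build_interaction_kb(ppi, phylogeny, coexpression):
    """Merge gene-gene links from several evidence sources.

    Each argument is an iterable of (gene, gene) pairs; links are undirected,
    so (a, b) and (b, a) are stored under the same key.
    """
    kb = defaultdict(set)
    sources = (("protein-protein interaction", ppi),
               ("GenBank phylogenetic relatedness", phylogeny),
               ("microarray co-expression", coexpression))
    for source, pairs in sources:
        for a, b in pairs:
            if a != b:
                kb[frozenset((a, b))].add(source)
    return kb


def well_supported(kb, min_sources=2):
    """Links backed by at least min_sources independent kinds of evidence."""
    return sorted(tuple(sorted(pair)) for pair, evidence in kb.items()
                  if len(evidence) >= min_sources)


# Made-up gene ids, for illustration only.
kb = build_interaction_kb(
    ppi=[("g1", "g2")],
    phylogeny=[("g2", "g1"), ("g3", "g4")],
    coexpression=[("g1", "g2"), ("g3", "g5")],
)
print(well_supported(kb))  # [('g1', 'g2')]
```

Keeping the evidence type on every link preserves provenance, so a biologist can see why two genes are connected rather than just that they are.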
Is the gene the biologist is researching present in another organism? Is a similar gene present in another organism? (If the gene is annotated, access the database directly; if not, use BLAST to normalize.) Finding potential antifungal drug targets (most known antifungals are associated with sterol metabolism) uses BLAST to normalize the results (genetic sequences) from different databases associated with different organisms: SGD: baker's yeast; GUS: human, plasmodium, trypanosomes, …; FGDB: pneumocystis. Services: BLAST, co-expression analysis, phylogeny.
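The "annotated: use the database; otherwise: compare sequences" rule can be sketched in Python as below. This is a hedged illustration, not the system's actual workflow: the records and sequences are made up, and `difflib.SequenceMatcher.ratio()` is a crude stand-in for BLAST (it is not an alignment score and has no statistical meaning). Only the database names SGD and FGDB come from the notes.

```python
from difflib import SequenceMatcher

# Made-up records: organism DB -> {gene id: (annotation or None, sequence)}.
DATABASES = {
    "SGD": {"geneA": ("sterol biosynthesis", "MKTAYIAKQR")},
    "FGDB": {"contig7": (None, "MKTAYIAKQL")},  # unannotated entry
}


def similarity(a, b):
    """Stand-in for BLAST: difflib ratio in 0..1, NOT a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def find_homologs(query_annotation, query_seq, threshold=0.8):
    """Return (database, gene, how-it-matched) for each candidate hit."""
    hits = []
    for db, records in DATABASES.items():
        for gene, (annotation, seq) in records.items():
            if annotation is not None:
                # Annotated: trust the database's own annotation directly.
                if annotation == query_annotation:
                    hits.append((db, gene, "annotation"))
            elif similarity(query_seq, seq) >= threshold:
                # Unannotated: fall back to sequence similarity to normalize.
                hits.append((db, gene, "similarity"))
    return hits


print(find_homologs("sterol biosynthesis", "MKTAYIAKQR"))
```

In a real pipeline the `similarity` call would be replaced by a BLAST search and an E-value cutoff; the branching structure is what this sketch is meant to show.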
Big Question: What essential genes do we all have in common?
Essential eukaryotic core. All 202 P. carinii genes listed are shared among P. carinii, S. pombe, and S. cerevisiae. All genes included are lethal as knockouts in either S. pombe or S. cerevisiae. Most genes on the diagram use the S. cerevisiae name, but a few follow the S. pombe naming convention (e.g., ypt2) when there was more annotation in S. pombe. Genes are color-coded by the process they relate to: DNA (light blue), RNA (dark blue), protein (red), signaling (purple), metabolism (orange), transport (green), or other (black).
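The selection criteria described above reduce to set operations. The Python sketch assumes the hard part, mapping each organism's genes to a common name (orthology), has already been done. The gene ids and lethality sets are made up for illustration and are not the data behind the 202-gene core.

```python
def essential_core(p_carinii, s_pombe, s_cerevisiae, lethal_pombe, lethal_cerevisiae):
    """Genes present in all three organisms that are lethal as a knockout
    in S. pombe or in S. cerevisiae (inputs are sets of common gene names)."""
    shared = p_carinii & s_pombe & s_cerevisiae
    return shared & (lethal_pombe | lethal_cerevisiae)


# Made-up gene ids, for illustration only.
core = essential_core(
    p_carinii={"g1", "g2", "g3", "g4"},
    s_pombe={"g1", "g2", "g3", "g5"},
    s_cerevisiae={"g1", "g2", "g3", "g6"},
    lethal_pombe={"g1"},
    lethal_cerevisiae={"g2", "g6"},
)
print(sorted(core))  # ['g1', 'g2']
```

Here `g3` is shared but not known to be lethal in either yeast, and `g6` is lethal but not shared, so neither enters the core.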