Linking Linked Data
 Linked Data to Integrated Data




Expert Bioinformatics from Bioinformatics Experts
Put your data on the web;
make a pretty web site later.




Now we can ask questions like this...
What members of a target pathway are already targeted in other diseases?

[Diagram: the question spans Target, Pathway, and Disease data from ChEMBL, UniProt, Reactome, and OMIM, linking the entities Compound, Target, Protein, Pathway, and Disease.]
Because we have lots of data exposed as RDF
[Diagram: RDF datasets with nodes typed Uniprot:Protein, BioPAX:Protein, and Mim:Phenotype.]
What do you do when you have to add data...
Or connect SPARQL endpoints?




    RDF != Linked Data



Is your data 5-star?


 Linked data is essential to
 actually connect the semantic
 web. It is quite easy to do with
 a little thought, and becomes
 second nature. Various
 common sense considerations
 determine when to make a link
 and when not to.




Example: openflydata to BioCyc
 Which genes are differentially expressed in the hindgut, and are any
 pathways associated with those genes?
 ● Use FlyAtlas at openflydata.org for tissue-specific expression profiles.

 ● Use FlyCyc from BioCyc for pathways.

 ● Then query both with SPARQL.




Problem: Node URIs
FlyAtlas identifies the gene by URI:

<http://openflydata.org/id/flyatlas/affyid/1616608_a_at>
    <http://purl.org/NET/flyatlas/schema#gene>
    <http://openflydata.org/id/flybase/feature/FBgn0001128> .

BioCyc carries the same FlyBase identifier only as a string literal on a
cross-reference node, and in a different case:

<http://biocyc.org/biopax/biopax-level3#UnificationXref202209>
    <http://www.biopax.org/release/biopax-level3.owl#xref>
    <http://biocyc.org/biopax/biopax-level3#Protein202210> .
<http://biocyc.org/biopax/biopax-level3#UnificationXref202209>
    <http://www.biopax.org/release/biopax-level3.owl#db>
    "FlyCyc" .
<http://biocyc.org/biopax/biopax-level3#UnificationXref202209>
    <http://www.biopax.org/release/biopax-level3.owl#id>
    "FBGN0001128" .
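The mismatch can be made concrete with a small Python sketch. It only uses the two identifiers shown on the slide; the helper name `local_id` is hypothetical, and comparing case-insensitively is an assumption about how the two spellings should be reconciled, not something the slides prescribe.

```python
# Identifiers copied from the slide; the helper below is hypothetical.
FLYATLAS_GENE_URI = "http://openflydata.org/id/flybase/feature/FBgn0001128"
BIOCYC_XREF_ID = "FBGN0001128"


def local_id(uri: str) -> str:
    """Return the last path segment of a URI (here, the FlyBase gene ID)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


flyatlas_id = local_id(FLYATLAS_GENE_URI)

# A naive join fails: one side is a URI node, the other a string literal.
naive_match = FLYATLAS_GENE_URI == BIOCYC_XREF_ID
# Even the bare IDs differ, because the case differs (FBgn vs FBGN).
exact_id_match = flyatlas_id == BIOCYC_XREF_ID
# Assumption: the IDs denote the same gene when compared case-insensitively.
normalised_match = flyatlas_id.lower() == BIOCYC_XREF_ID.lower()

print(naive_match, exact_id_match, normalised_match)
```

Neither dataset is wrong on its own; the point is that nothing in the raw triples lets a query engine see that both refer to one gene.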


Integration Level 1
Use Identifiers.org
 CONSTRUCT {
     ?x
     RDFS:seeAlso
     `bif:sprintf_iri ("http://identifiers.org/flybase/%s", ?id)`
 }
 WHERE {
    ?x BP:unificationxref ?xref .
    ?xref BP:id ?id .
    ?xref BP:db "FlyCyc"^^xsd:string
 }
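The backquoted `bif:sprintf_iri` expression appears to be a Virtuoso-specific extension rather than standard SPARQL, so the same step is sketched below in plain Python. This is an illustration under stated assumptions: triples are modelled as tuples, the subject names are shortened, and the second protein (`Protein999`, `P12345`, `UniProt`) is invented to show that non-FlyCyc cross-references are skipped.

```python
from urllib.parse import quote

# Toy triples mirroring the slide's BioCyc data. Predicate names follow the
# slide's query; "Protein999" and its xref are hypothetical.
TRIPLES = [
    ("Protein202210", "unificationxref", "UnificationXref202209"),
    ("UnificationXref202209", "db", "FlyCyc"),
    ("UnificationXref202209", "id", "FBGN0001128"),
    ("Protein999", "unificationxref", "UnificationXref998"),
    ("UnificationXref998", "db", "UniProt"),
    ("UnificationXref998", "id", "P12345"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def see_also_links(db_name="FlyCyc", base="http://identifiers.org/flybase/"):
    """Mint one rdfs:seeAlso link per xref ID that belongs to db_name."""
    links = []
    for s, p, xref in TRIPLES:
        if p != "unificationxref":
            continue
        # The db test is applied to the same xref node that supplies the ID.
        if db_name not in objects(xref, "db"):
            continue
        for ident in objects(xref, "id"):
            links.append((s, "rdfs:seeAlso", base + quote(ident, safe="")))
    return links


print(see_also_links())
```

Note that the minted IRI keeps the upper-case spelling found in BioCyc; whether that resolves to the same resource as the FlyAtlas spelling is not addressed here.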




Integration Level 2
adding property characteristics

 BP = <http://www.biopax.org/release/biopax-level3.owl#>

Given:
BP:Protein BP:controls BP:Catalysis
BP:Catalysis BP:controls BP:BiochemicalReaction

we want the shortcut:
BP:Protein BP:controls BP:BiochemicalReaction

CONSTRUCT { ?y GB:controlledBy ?x }
WHERE

 { ?x BP:controls ?catalysis .
   ?catalysis BP:controls ?y }
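The two-step shortcut can be sketched in Python as a composition of the `controls` relation with itself. All node names are hypothetical, and the direction of the output pair (reaction first, controlling protein second) follows the way `GB:controlledBy` is used in the later SELECT query, where the reaction is the subject.

```python
# Toy "controls" edges; every node name here is hypothetical.
CONTROLS = {
    ("ProteinA", "Catalysis1"),
    ("Catalysis1", "Reaction1"),
    ("ProteinB", "Catalysis2"),
    ("Catalysis2", "Reaction2"),
    ("ProteinA", "Catalysis2"),
}


def controlled_by(controls):
    """x controls c and c controls y  =>  (y, x), read as y controlledBy x."""
    return {
        (y, x)
        for (x, c1) in controls
        for (c2, y) in controls
        if c1 == c2
    }


shortcuts = controlled_by(CONTROLS)
print(sorted(shortcuts))
```

This composes exactly two hops, as the CONSTRUCT does; it is not a full transitive closure over chains of arbitrary length.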



Integration Level 3
class subsumption

 flyatlas = <http://purl.org/NET/flyatlas/schema#>

flywebflyatlas:1616608_a_at a flyatlas:ProbeData



 BP = <http://www.biopax.org/release/biopax-level3.owl#>

 flyatlas:ProbeData rdfs:subClassOf BP:DNARegion




CONSTRUCT {?x a BP:DNARegion }
WHERE

 { ?x a flyatlas:ProbeData }
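Materialising the subclass statement without a reasoner amounts to adding one extra type triple per instance. A minimal Python sketch, assuming triples are plain tuples and using a hypothetical `affyid:` prefix for the probe URI:

```python
RDF_TYPE = "rdf:type"

# The single subclass statement from the slide.
SUBCLASS_OF = {"flyatlas:ProbeData": "BP:DNARegion"}


def materialise_types(triples, subclass_of):
    """Add (s, rdf:type, superclass) for every instance of a mapped class.

    Only one level is applied, matching the single CONSTRUCT on the slide;
    chains of subclasses are not followed.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE and o in subclass_of:
            inferred.add((s, RDF_TYPE, subclass_of[o]))
    return inferred


# "affyid:" is a hypothetical prefix for the probe's URI.
data = {("affyid:1616608_a_at", RDF_TYPE, "flyatlas:ProbeData")}
result = materialise_types(data, SUBCLASS_OF)
print(sorted(result))
```

The original type triple is kept, so queries written against either vocabulary still match.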



Connect BiochemicalReactions to Expression Values
SELECT ?name ?id ?mean
WHERE
{
   ?reaction a BP:BiochemicalReaction .
   ?reaction BP:standardName ?name .
   ?reaction GB:controlledBy ?protein .
   ?protein a BP:Protein .
   ?protein BP:xref ?id .
   ?probe a BP:DNARegion .
   ?probe BP:xref ?id .
   ?probe flyatlas:l_fatbody ?blank .
   ?blank flyatlas:mean ?mean
}
LIMIT 5



          No Reasoner – just a few SPARQL CONSTRUCTs
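What the SELECT does is an ordinary join on the shared `?id`. The Python sketch below spells that join out over toy lookup tables; every reaction name, probe name, and expression value is made up for illustration, and only the gene ID `FBgn0001128` comes from the slides.

```python
# All values below are invented, except the gene ID FBgn0001128.
REACTIONS = [
    {"name": "reaction-1", "controlled_by": "protein-1"},
    {"name": "reaction-2", "controlled_by": "protein-2"},
]
PROTEIN_XREF = {"protein-1": "FBgn0001128", "protein-2": "FBgn0000000"}
PROBE_XREF = {"probe-1": "FBgn0001128"}
FATBODY_MEAN = {"probe-1": 12.5}


def reaction_expression(limit=5):
    """Join reactions to probe expression means through the shared gene ID."""
    rows = []
    for reaction in REACTIONS:
        gene_id = PROTEIN_XREF.get(reaction["controlled_by"])
        for probe, probe_gene in PROBE_XREF.items():
            # The join condition: protein and probe carry the same ID.
            if probe_gene == gene_id and probe in FATBODY_MEAN:
                rows.append((reaction["name"], gene_id, FATBODY_MEAN[probe]))
    return rows[:limit]


print(reaction_expression())
```

The join only works because the earlier steps put proteins and probes under a common identifier and a common class.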

Client Architecture




Vocabularies in Linked Data
What does the linked data cloud know about Drugs....

SELECT distinct ?class
WHERE
{
   ?s a ?class .
   ?s ?p ?o
}

More than 100 classes, including:
chembl:Activity
chembl:Assay
chembl:AssayCategory
chembl:AssayTargetLink
chembl:ChemicalCompound
chembl:DrugTarget
chembl:LiteratureCitation
dailymed:drugs
drugbank:Drug
drugbank:DrugInteraction
drugbank:EnzymeLink
drugbank:ExternalIdentifier
drugbank:ExternalLink
drugbank:LiteratureCitation
drugbank:Molecule
drugbank:OrganismSpecies
drugbank:Patent
drugbank:ProteinSequence
drugbank:TargetLink
entrez:EnsemblReference
entrez:Gene
pdb:Molecule
pdb:Structure
pubmed:Chemical
pubmed:Citation
pubmed:DatabankReference
Create a tighter, more unified “view” under one schema
Unified Vocabulary
What does the linked data cloud know about Drugs....




Map Classes and Properties into a single instantiated view
Before Query
SELECT *
WHERE
{
?s drugb:calculatedInChIKey ?inchiD .
?s a drugb:Drug .
?c a chembl:ChemicalCompound .
?c chembl:standardInChIKey ?inchiC .
FILTER regex(?inchiD, ?inchiC)
}




After Query

SELECT *
where
{
?s a GB:Drug .
?s GB:inchiKey ?inchi .
}
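One way to get from the "before" query to the "after" query is to rewrite source classes and properties into the unified vocabulary once, up front. The slides do not show how the view was built, so the Python sketch below is an assumption throughout: the two mapping tables are hypothetical (in particular, treating `chembl:ChemicalCompound` as a `GB:Drug`), and `KEY-ONE` is a placeholder, not a real InChIKey.

```python
# Hypothetical mappings from source vocabularies to the unified GB schema.
CLASS_MAP = {
    "drugb:Drug": "GB:Drug",
    "chembl:ChemicalCompound": "GB:Drug",
}
PROPERTY_MAP = {
    "drugb:calculatedInChIKey": "GB:inchiKey",
    "chembl:standardInChIKey": "GB:inchiKey",
}


def to_view(triples):
    """Rewrite mapped types and properties; drop everything unmapped."""
    view = set()
    for s, p, o in triples:
        if p == "rdf:type" and o in CLASS_MAP:
            view.add((s, "rdf:type", CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            view.add((s, PROPERTY_MAP[p], o))
    return view


def drugs_with_inchikey(view):
    """Equivalent of the 'After Query': ?s a GB:Drug ; GB:inchiKey ?inchi."""
    typed = {s for s, p, o in view if p == "rdf:type" and o == "GB:Drug"}
    return sorted((s, o) for s, p, o in view if p == "GB:inchiKey" and s in typed)


# Toy source data; "KEY-ONE" is a placeholder, not a real InChIKey.
SOURCE = [
    ("drugbank:D1", "rdf:type", "drugb:Drug"),
    ("drugbank:D1", "drugb:calculatedInChIKey", "KEY-ONE"),
    ("chembl:C1", "rdf:type", "chembl:ChemicalCompound"),
    ("chembl:C1", "chembl:standardInChIKey", "KEY-ONE"),
]

print(drugs_with_inchikey(to_view(SOURCE)))
```

Once both sources share `GB:inchiKey`, matching records becomes a plain equality on that value instead of a cross-vocabulary `FILTER regex`.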




Linked Data Architecture




Creating fixed “views” of Linked Data
When the use of integrated data is fixed, e.g. in an API or
application, Linked Data can be expensive:
  – Changes to the data require significant recoding
  – Multiple schemas make queries long and inefficient
• A view is a middle layer of data used by the API: changes to the
  data are managed by the view, and the API is minimally
  disturbed
     – Views are easier to query
     – Views are faster to query
• The client gets the best of both worlds: a tight view of data for
  API queries, while still having all the advantages of a linked
  data strategy.
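The insulation a view provides can be shown with a very small Python sketch. Everything in it is hypothetical: the record layouts, the renamed source field, and the function names. It only illustrates that an upstream schema change is absorbed by the mapping, while the API code stays the same.

```python
# Hypothetical source records before and after an upstream schema change.
OLD_RECORD = {"calculatedInChIKey": "KEY-ONE"}
NEW_RECORD = {"inchi_key": "KEY-ONE"}


def build_view(record, mapping):
    """Translate a source record into the fixed view schema."""
    return {
        view_field: record[source_field]
        for view_field, source_field in mapping.items()
    }


def api_get_inchikey(view_record):
    """API code: depends only on the view's field name, never the source's."""
    return view_record["inchiKey"]


# Only the mapping changes when the source schema changes.
old_view = build_view(OLD_RECORD, {"inchiKey": "calculatedInChIKey"})
new_view = build_view(NEW_RECORD, {"inchiKey": "inchi_key"})

print(api_get_inchikey(old_view), api_get_inchikey(new_view))
```

The cost moves from every query in the application to one mapping in the middle layer.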

Summary
●   Exposing data as RDF does not equal Linked Data
●   Making data linked is not hard
      –   Node IRIs
      –   Unifying classes
      –   Transitive closure of properties
●   A little semantics goes a long way (no reasoner required)
●   Creating “views” from one schema to another is not hard
      –   But should be easier




www.generalbioinformatics.com/science.html





Linking Linked Data CSHALS2013
