4. Now we can ask questions like this...
What members of a target pathway are already targeted in other diseases?
[Figure: Target, Pathway and Disease linked across ChEMBL, UniProt, Reactome and OMIM via Compound, Target, Protein, Pathway and Disease nodes]
Expert Bioinformatics from Bioinformatics Experts
5. Because we have lots of data exposed as RDF
[Figure: classes Uniprot:Protein, BioPAX:Protein and Mim:Phenotype]
6. What do you do when you have to add data...
7. Or connect SPARQL endpoints?
RDF != Linked Data
8. Is your data 5*?
Linked Data is essential to actually connect the Semantic Web. It is quite easy to do with a little thought, and it soon becomes second nature. Common-sense considerations determine when to make a link and when not to.
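One way to apply such a rule, sketched in Python under stated assumptions: link a local node to another dataset only when the accession it cross-references is known to exist there, so the link never dangles. The node IRIs, accessions and the `make_links` helper are hypothetical; `http://purl.uniprot.org/uniprot/` is used only as an example target namespace.

```python
def make_links(local_xrefs, target_prefix, known_ids, predicate="rdfs:seeAlso"):
    """Return link triples from local nodes to nodes in another dataset.

    local_xrefs   -- dict: local node IRI -> accession it cross-references
    target_prefix -- IRI prefix of the other dataset
    known_ids     -- accessions that actually exist in the other dataset
    predicate     -- linking property; rdfs:seeAlso is weaker than owl:sameAs,
                     which would assert that both IRIs denote the same thing
    """
    links = []
    for node, accession in sorted(local_xrefs.items()):
        # Skip accessions the target does not know, so no link dangles.
        if accession in known_ids:
            links.append((node, predicate, target_prefix + accession))
    return links


if __name__ == "__main__":
    # Hypothetical local nodes and a hypothetical set of known accessions.
    xrefs = {"ex:protein1": "P12345", "ex:protein2": "Q99999"}
    known = {"P12345"}
    for triple in make_links(xrefs, "http://purl.uniprot.org/uniprot/", known):
        print(triple)
```

Choosing `rdfs:seeAlso` by default is deliberate: `owl:sameAs` is the stronger claim and is better reserved for cases where the two IRIs are known to identify the same thing.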
9. Example: openflydata to BioCyc
Which genes are differentially expressed in the hindgut, and are there any pathways associated with those genes?
● Use FlyAtlas at openflydata.org for tissue-specific expression profiles.
● Use FlyCyc from BioCyc for pathway data.
● Then join the two with SPARQL.
13. Integration Level 3: class subsumption
flyatlas = <http://purl.org/NET/flyatlas/schema#>
flywebflyatlas:1616608_a_at a flyatlas:ProbeData
BP = <http://www.biopax.org/release/biopax-level3.owl#>
flyatlas:ProbeData rdfs:subClassOf BP:DNARegion
CONSTRUCT {?x a BP:DNARegion }
WHERE
{ ?x a flyatlas:ProbeData }
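The CONSTRUCT materialises the subclass axiom as plain triples, so no reasoner is needed at query time. A minimal Python sketch of the same rule, assuming the graph is held as a set of (subject, predicate, object) string tuples; `construct_types` is a hypothetical helper and `ex:1616608_a_at` a hypothetical stand-in for the probe IRI.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FLYATLAS = "http://purl.org/NET/flyatlas/schema#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"


def construct_types(triples, subclass, superclass):
    """Mimic CONSTRUCT { ?x a <superclass> } WHERE { ?x a <subclass> }.

    Like a SPARQL CONSTRUCT, this returns a new graph (a set of triples)
    and leaves the input untouched; merging it back is the caller's choice.
    """
    return {
        (s, RDF_TYPE, superclass)
        for (s, p, o) in triples
        if p == RDF_TYPE and o == subclass
    }


if __name__ == "__main__":
    store = {
        ("ex:1616608_a_at", RDF_TYPE, FLYATLAS + "ProbeData"),
        ("ex:protein1", RDF_TYPE, BP + "Protein"),
    }
    inferred = construct_types(store, FLYATLAS + "ProbeData", BP + "DNARegion")
    store |= inferred  # materialise the new rdf:type triples in the store
    for triple in sorted(inferred):
        print(triple)
```

Only the subject typed as `flyatlas:ProbeData` gains the `BP:DNARegion` type; the protein is untouched.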
14. Connect BiochemicalReactions to Expression Values
PREFIX BP: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX flyatlas: <http://purl.org/NET/flyatlas/schema#>
SELECT ?name ?id ?mean
WHERE
{
  ?reaction a BP:BiochemicalReaction .
  ?reaction BP:standardName ?name .
  ?reaction GB:controlledBy ?protein .
  ?protein a BP:Protein .
  ?protein BP:xref ?id .
  ?probe a BP:DNARegion .
  ?probe BP:xref ?id .
  ?probe flyatlas:l_fatbody ?blank .
  ?blank flyatlas:mean ?mean
}
LIMIT 5
No Reasoner – just a few SPARQL CONSTRUCTs
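To show how the shared `?id` ties reactions to expression values, here is a minimal Python sketch that evaluates the same nine triple patterns with nested-loop joins over an in-memory list of tuples. The data is invented for illustration, and `extend` and `select` are hypothetical helpers, not a SPARQL engine; the probe's `BP:DNARegion` type is the one the slide 13 CONSTRUCT would have added.

```python
# Triples are (subject, predicate, object) tuples; pattern terms that start
# with "?" are variables. All data below is invented for illustration.
TRIPLES = [
    ("ex:r1", "a", "BP:BiochemicalReaction"),
    ("ex:r1", "BP:standardName", "example reaction"),
    ("ex:r1", "GB:controlledBy", "ex:prot1"),
    ("ex:prot1", "a", "BP:Protein"),
    ("ex:prot1", "BP:xref", "ex:gene1"),
    ("ex:probe1", "a", "BP:DNARegion"),  # type added by the slide 13 CONSTRUCT
    ("ex:probe1", "BP:xref", "ex:gene1"),
    ("ex:probe1", "flyatlas:l_fatbody", "_:b1"),
    ("_:b1", "flyatlas:mean", 42.0),
]

QUERY = [
    ("?reaction", "a", "BP:BiochemicalReaction"),
    ("?reaction", "BP:standardName", "?name"),
    ("?reaction", "GB:controlledBy", "?protein"),
    ("?protein", "a", "BP:Protein"),
    ("?protein", "BP:xref", "?id"),
    ("?probe", "a", "BP:DNARegion"),
    ("?probe", "BP:xref", "?id"),
    ("?probe", "flyatlas:l_fatbody", "?blank"),
    ("?blank", "flyatlas:mean", "?mean"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def extend(binding, pattern, triple):
    """Return binding extended so pattern matches triple, or None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def select(triples, patterns, variables, limit):
    """Evaluate a basic graph pattern with nested-loop joins, then project."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = extend(b, pattern, t)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings][:limit]


if __name__ == "__main__":
    print(select(TRIPLES, QUERY, ["?name", "?id", "?mean"], limit=5))
```

The protein and the probe never point at each other directly; they join only because both carry the same `BP:xref` value.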
24. Creating fixed “views” of Linked Data
When the use of integrated data is fixed (e.g. an API or application), Linked Data can be expensive:
– Changes to the data require significant recoding
– Multiple schemas make queries long and inefficient
• A view, or middle layer of data, is used by the API: changes to the data are managed by the view, so the API is minimally disturbed
– Views are easier to query
– Views are faster to query
• The client gets the best of both worlds: a tight view of the data for API queries, while still having all the advantages of a Linked Data strategy.
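A minimal sketch of such a view in Python, assuming the raw data arrives as (subject, predicate, object) tuples and that two sources type proteins as `Uniprot:Protein` and `BioPAX:Protein` (class names from slide 5). The name properties, the `SCHEMA_MAP` table and the `api_get_protein` function are hypothetical; the point is that a schema change touches only the mapping, not the API.

```python
from collections import defaultdict

# Hypothetical mapping from each source's protein class to the property that
# carries the protein's name. A schema change means editing this table only.
SCHEMA_MAP = {
    "Uniprot:Protein": "ex:uniprotName",
    "BioPAX:Protein": "ex:biopaxName",
}


def build_view(triples):
    """Materialise a flat view: {node: {"name": ..., "source": ...}}."""
    props = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        props[s][p].append(o)
    view = {}
    for node, node_props in props.items():
        for cls, name_prop in SCHEMA_MAP.items():
            if cls in node_props.get("a", []) and name_prop in node_props:
                view[node] = {"name": node_props[name_prop][0], "source": cls}
    return view


def api_get_protein(view, node):
    """The API reads only the view, never the raw multi-schema triples."""
    return view.get(node)


if __name__ == "__main__":
    raw = [
        ("ex:p1", "a", "Uniprot:Protein"),
        ("ex:p1", "ex:uniprotName", "Protein one"),
        ("ex:p2", "a", "BioPAX:Protein"),
        ("ex:p2", "ex:biopaxName", "Protein two"),
    ]
    view = build_view(raw)
    print(api_get_protein(view, "ex:p2"))
```

API queries become single dictionary lookups, while the raw triples stay fully linked underneath.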
25. Summary
● Exposing data as RDF does not equal Linked Data
● Making data linked is not hard
– Node IRIs
– Unifying classes
– Transitive closure of properties
● A little semantics goes a long way (no reasoner required)
● Creating “views” from one schema to another is not hard.
– But it should be easier
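Transitive closure is one of the cheap semantics that avoids a reasoner: precompute every implied pair once and store it as ordinary triples. A minimal Python sketch, assuming the property's assertions are given as (subject, object) pairs; the class names in the example chain are hypothetical.

```python
def transitive_closure(pairs):
    """Return all (a, c) such that c is reachable from a via the given pairs."""
    successors = {}
    for a, b in pairs:
        successors.setdefault(a, set()).add(b)
    closure = set()
    for start in successors:
        stack = list(successors[start])
        seen = set()
        while stack:  # depth-first walk from `start`
            node = stack.pop()
            if node in seen:
                continue  # already visited; also stops on cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


if __name__ == "__main__":
    # Hypothetical rdfs:subClassOf chain.
    sub_class_of = [
        ("ex:ProbeSet", "ex:Probe"),
        ("ex:Probe", "ex:SequenceFeature"),
    ]
    for sub, sup in sorted(transitive_closure(sub_class_of)):
        print(sub, "rdfs:subClassOf", sup)
```

Once the inferred pairs are written back as triples, a plain SPARQL query for `?x rdfs:subClassOf ex:SequenceFeature` finds `ex:ProbeSet` with no property paths or reasoning at query time.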