Linking Linked Data CSHALS2013

Linking Linked Data
Linked Data to Integrated Data

Expert Bioinformatics from Bioinformatics Experts

Put your data on the web
make a pretty web site later.


Now we can ask questions like this...
What members of a target pathway are already targeted in other diseases?

[Diagram: Target, Pathway, and Disease data drawn from ChEMBL, UniProt, Reactome, and OMIM, with nodes for Compound, Target, Protein, Pathway, and Disease]


Because we have lots of data exposed
as RDF
Uniprot:Protein
BioPAX:Protein

Mim:Phenotype


What do you do when you have to add
data...


Or connect SPARQL endpoints?

RDF != Linked Data


Is your data 5-star?

Linked data is essential to actually connect the semantic web.
It is quite easy to do with a little thought, and becomes second nature.
Various common sense considerations determine when to make a link and when not to.


Example: openflydata to BioCyc
Which genes are differentially expressed in the hindgut, and are any pathways associated with those genes?
● Use FlyAtlas at openflydata.org for tissue-specific expression profiles.

● Use FlyCyc from BioCyc.

● Then query both with SPARQL.


Problem: Node URIs
FlyAtlas identifies the gene with a node URI:

<http://openflydata.org/id/flyatlas/affyid/1616608_a_at>
    <http://purl.org/NET/flyatlas/schema#gene>
    <http://openflydata.org/id/flybase/feature/FBgn0001128> .

BioCyc holds the same identifier only as a string literal on a cross-reference node:

<http://biocyc.org/biopax/biopax-level3#Protein202210>
    <http://www.biopax.org/release/biopax-level3.owl#xref>
    <http://biocyc.org/biopax/biopax-level3#UnificationXref202209> .

<http://biocyc.org/biopax/biopax-level3#UnificationXref202209>
    <http://www.biopax.org/release/biopax-level3.owl#db> "FlyCyc" ;
    <http://www.biopax.org/release/biopax-level3.owl#id> "FBGN0001128" .


Integration Level 1
Use Identifiers.org
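# bif:sprintf_iri is a Virtuoso built-in; the backticks embed the expression in the template.
CONSTRUCT {
  ?x rdfs:seeAlso `bif:sprintf_iri ("http://identifiers.org/flybase/%s", ?id)`
}
WHERE {
  ?x BP:xref ?xref .
  ?xref BP:id ?id .
  # db must be tested on the same xref node that carries the id
  ?xref BP:db "FlyCyc"^^xsd:string
}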
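The same linking step can be sketched outside a triple store. The following is a minimal Python illustration, not the authors' implementation: triples are plain tuples, the function names are hypothetical, and rewriting the upper-case "FBGN" prefix to FlyBase's "FBgn" spelling is an assumption added here so that the minted IRI lines up with the identifier used on the FlyAtlas side.

```python
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def flybase_iri(raw_id):
    """Mint an identifiers.org IRI from a literal FlyBase gene id.

    Assumption: ids written as 'FBGN...' are normalised to 'FBgn...'.
    """
    if raw_id.upper().startswith("FBGN"):
        raw_id = "FBgn" + raw_id[4:]
    return "http://identifiers.org/flybase/" + raw_id


def link_to_identifiers_org(triples, db="FlyCyc"):
    """Return rdfs:seeAlso links for every entity whose xref names `db`."""
    ids = {s: o for s, p, o in triples if p == BP + "id"}
    dbs = {s: o for s, p, o in triples if p == BP + "db"}
    return {
        (entity, RDFS_SEE_ALSO, flybase_iri(ids[xref]))
        for entity, p, xref in triples
        if p == BP + "xref" and dbs.get(xref) == db and xref in ids
    }


protein = "http://biocyc.org/biopax/biopax-level3#Protein202210"
xref = "http://biocyc.org/biopax/biopax-level3#UnificationXref202209"
data = {
    (protein, BP + "xref", xref),
    (xref, BP + "db", "FlyCyc"),
    (xref, BP + "id", "FBGN0001128"),
}
links = link_to_identifiers_org(data)
```

Here `links` holds a single triple pointing the protein at an identifiers.org IRI, which is the shared node that both data sets can then join on.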


Integration Level 2
adding property characteristics

BP = <http://www.biopax.org/release/biopax-level3.owl#>

BP:Protein BP:controls BP:Catalysis

BP:Catalysis BP:controls BP:BiochemicalReaction

Therefore: BP:Protein BP:controls BP:BiochemicalReaction

CONSTRUCT { ?y GB:controlledBy ?x }
WHERE

{ ?x BP:controls ?catalysis .
  ?catalysis BP:controls ?y }
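To make the two-hop shortcut concrete, here is a small Python sketch of the same idea over tuples. It is an illustration under assumptions: the `ex:` node names are invented, prefixed names are kept as plain strings, and the new triple is emitted with the reaction as subject and the protein as object, which is the direction a pattern such as `?reaction GB:controlledBy ?protein` expects.

```python
def controlled_by(triples, controls="BP:controls", shortcut="GB:controlledBy"):
    """Collapse protein -> catalysis -> reaction into one direct triple."""
    edges = [(s, o) for s, p, o in triples if p == controls]
    return {
        (target, shortcut, source)
        for source, middle in edges
        for start, target in edges
        if middle == start
    }


data = {
    ("ex:protein1", "BP:controls", "ex:catalysis1"),
    ("ex:catalysis1", "BP:controls", "ex:reaction1"),
}
inferred = controlled_by(data)
```

Only chains of exactly two `BP:controls` edges produce a shortcut, mirroring the two triple patterns in the WHERE clause; no general transitive closure is computed.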


Integration Level 3
class subsumption

flyatlas = <http://purl.org/NET/flyatlas/schema#>

<http://openflydata.org/id/flyatlas/affyid/1616608_a_at> a flyatlas:ProbeData

BP = <http://www.biopax.org/release/biopax-level3.owl#>

flyatlas:ProbeData rdfs:subClassOf BP:DNARegion

CONSTRUCT {?x a BP:DNARegion }
WHERE

{ ?x a flyatlas:ProbeData }
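Class subsumption without a reasoner amounts to copying type triples up the hierarchy. A minimal Python sketch follows, assuming triples as tuples, `"a"` standing in for rdf:type, and a hypothetical `subclass_of` dictionary holding the single subclass axiom from this slide. Unlike the one-step CONSTRUCT, the loop also follows longer subclass chains.

```python
def materialise_types(triples, subclass_of):
    """Add (x, 'a', superclass) for every (x, 'a', subclass)."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            if p == "a" and o in subclass_of:
                new_triple = (s, "a", subclass_of[o])
                if new_triple not in result:
                    result.add(new_triple)
                    changed = True
    return result


probe = "http://openflydata.org/id/flyatlas/affyid/1616608_a_at"
data = {(probe, "a", "flyatlas:ProbeData")}
axioms = {"flyatlas:ProbeData": "BP:DNARegion"}
typed = materialise_types(data, axioms)
```

After this step the probe is typed both as `flyatlas:ProbeData` and as `BP:DNARegion`, so queries written against the BioPAX class find it.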


Connect BiochemicalReactions to Expression Values
SELECT ?name ?id ?mean
WHERE
{
  ?reaction a BP:BiochemicalReaction .
  ?reaction BP:standardName ?name .
  ?reaction GB:controlledBy ?protein .
  ?protein a BP:Protein .
  ?protein BP:xref ?id .
  ?probe a BP:DNARegion .
  ?probe BP:xref ?id .
  ?probe flyatlas:l_fatbody ?blank .
  ?blank flyatlas:mean ?mean
}
LIMIT 5

No Reasoner – just a few SPARQL CONSTRUCTs
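The join performed by the SELECT query can be traced by hand on a toy graph. The sketch below is illustrative only: the `ex:` nodes, the reaction name, and the mean value 12.5 are invented sample data, and the triples are assumed to be the output of the three CONSTRUCT steps (shared identifier node, `GB:controlledBy` shortcut, `BP:DNARegion` typing).

```python
def reaction_expression(triples, tissue="flyatlas:l_fatbody"):
    """Join reactions to probe expression means through a shared xref id."""

    def objects(subject, predicate):
        return {o for s, p, o in triples if s == subject and p == predicate}

    def instances(cls):
        return {s for s, p, o in triples if p == "a" and o == cls}

    proteins = instances("BP:Protein")
    probes = instances("BP:DNARegion")
    rows = set()
    for reaction in instances("BP:BiochemicalReaction"):
        for protein in objects(reaction, "GB:controlledBy") & proteins:
            for ident in objects(protein, "BP:xref"):
                for probe in probes:
                    # The join: protein and probe must share the same id node.
                    if ident not in objects(probe, "BP:xref"):
                        continue
                    for measurement in objects(probe, tissue):
                        for mean in objects(measurement, "flyatlas:mean"):
                            for name in objects(reaction, "BP:standardName"):
                                rows.add((name, ident, mean))
    return rows


GENE = "http://identifiers.org/flybase/FBgn0001128"
data = {
    ("ex:reaction1", "a", "BP:BiochemicalReaction"),
    ("ex:reaction1", "BP:standardName", "example reaction"),
    ("ex:reaction1", "GB:controlledBy", "ex:protein1"),
    ("ex:protein1", "a", "BP:Protein"),
    ("ex:protein1", "BP:xref", GENE),
    ("ex:probe1", "a", "BP:DNARegion"),
    ("ex:probe1", "BP:xref", GENE),
    ("ex:probe1", "flyatlas:l_fatbody", "_:m1"),
    ("_:m1", "flyatlas:mean", 12.5),  # invented value
}
rows = reaction_expression(data)
```

Removing any one of the three integration steps from the input (for example the shared `GENE` node) leaves `rows` empty, which is the point of the slide: the query only works because the data has been linked first.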


Client Architecture


Vocabularies in Linked Data
What does the linked data cloud know about drugs...

SELECT DISTINCT ?class
WHERE
{
?s a ?class .
?s ?p ?o
}

Result: >100 classes, including:
chembl:Activity
chembl:Assay
chembl:AssayCategory
chembl:AssayTargetLink
chembl:ChemicalCompound
chembl:DrugTarget
chembl:LiteratureCitation
dailymed:drugs
drugbank:Drug
drugbank:DrugInteraction
drugbank:EnzymeLink
drugbank:ExternalIdentifier
drugbank:ExternalLink
drugbank:LiteratureCitation
drugbank:Molecule
drugbank:OrganismSpecies
drugbank:Patent
drugbank:ProteinSequence
drugbank:TargetLink
entrez:EnsemblReference
entrez:Gene
pdb:Molecule
pdb:Structure
pubmed:Chemical
pubmed:Citation
pubmed:DatabankReference

Create a tighter, more unified “view” under one schema


Unified Vocabulary
What does the linked data cloud know about drugs...


Map Classes and Properties into a
single instantiated view


Before Query
SELECT *
WHERE
{
  ?s drugb:calculatedInChIKey ?inchiD .
  ?s a drugb:Drug .
  ?c a chembl:ChemicalCompound .
  ?c chembl:standardInChIKey ?inchiC .
  FILTER regex(?inchiD, ?inchiC)
}


After Query

SELECT *
WHERE
{
  ?s a GB:Drug .
  ?s GB:inchiKey ?inchi .
}
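Mapping classes and properties into a single instantiated view can be sketched as a lookup-and-rewrite pass. This Python example is a rough illustration, not the authors' tooling: the mapping tables are assumptions (in particular, treating `chembl:ChemicalCompound` as a `GB:Drug` is a guess made for the example), and the `ex:` nodes and the InChIKey string are placeholders.

```python
# Assumed mappings from source vocabularies to the unified GB vocabulary.
CLASS_MAP = {
    "drugb:Drug": "GB:Drug",
    "chembl:ChemicalCompound": "GB:Drug",  # assumption for illustration
}
PROPERTY_MAP = {
    "drugb:calculatedInChIKey": "GB:inchiKey",
    "chembl:standardInChIKey": "GB:inchiKey",
}


def to_unified_view(triples):
    """Rewrite mapped classes and properties; drop everything unmapped."""
    view = set()
    for s, p, o in triples:
        if p == "a" and o in CLASS_MAP:
            view.add((s, "a", CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            view.add((s, PROPERTY_MAP[p], o))
    return view


source = {
    ("ex:drugbankEntry", "a", "drugb:Drug"),
    ("ex:drugbankEntry", "drugb:calculatedInChIKey", "PLACEHOLDER-KEY"),
    ("ex:chemblEntry", "a", "chembl:ChemicalCompound"),
    ("ex:chemblEntry", "chembl:standardInChIKey", "PLACEHOLDER-KEY"),
}
view = to_unified_view(source)

# The "after" query becomes a single pattern over the view.
drugs_by_key = {}
for s, p, o in view:
    if p == "GB:inchiKey" and (s, "a", "GB:Drug") in view:
        drugs_by_key.setdefault(o, set()).add(s)
```

Both source records end up under the same key in `drugs_by_key` with an exact match, so the regex FILTER of the "before" query is no longer needed.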


Linked Data Architecture


Creating fixed “views” of Linked Data
When integrated data serves a fixed use, e.g. an API or application, querying Linked Data directly can be expensive:
– Changes to the data require significant recoding
– Multiple schemas make queries long and inefficient
• A view is a middle layer of data used by the API: changes to the data are managed by the view, and the API is minimally disturbed
– Views are easier to query
– Views are faster to query
• The client gets the best of both worlds: a tight view of the data for API queries, while keeping all the advantages of a linked data strategy.


Summary
● Exposing data as RDF does not equal Linked Data
● Making data linked is not hard
– Node IRIs
– Unifying Classes
– Transitive closure of Properties
● A little semantics goes a long way (no reasoner required)
● Creating “views” from one schema to another is not hard.
– But it should be easier


www.generalbioinformatics.com/science.html

