Ontology Services for the Biomedical Sciences

Simon Jupp
Technical Coordinator / Ontology Project Lead
Samples, Phenotypes and Ontologies Team
EMBL-EBI European Bioinformatics Institute
Ontology services for connecting
biomedical data
Connected Data London,
October 4th, 2019

What is EMBL-EBI?
• Europe’s home for biological data services, research
and training
• A trusted data provider for the life sciences
• Part of the European Molecular Biology Laboratory,
an intergovernmental research organisation
• International: 650 members of staff from 66 nations

From molecules to medicine
We are always seeking new ways to read
and understand DNA
New technologies provide ways to collect,
compare and visualise molecular
information
Bioinformatics enables new applications:
• molecular medicine
• agriculture
• food
• environmental sciences

There’s a lot of metadata...
tissues cell lines diseases

How many ways can you say “female”?
18-day pregnant females female (lactating) individual female worker caste (female)
2 yr old female female (pregnant) lgb*cc females sex: female
400 yr. old female female (outbred) mare female, other
adult female female parent female (worker) female child
asexual female female plant monosex female femal
castrate female female with eggs ovigerous female 3 female
cf.female female worker oviparous sexual females female (phenotype)
cystocarpic female female, 6-8 weeks old worker bee female mice
dikaryon female, virgin female enriched female, spayed
dioecious female female, worker pseudohermaprhoditic female femlale
diploid female female(gynoecious) remale metafemale
f femele semi-engorged female sterile female
famale female, pooled sexual oviparous female normal female
femail femalen sterile female worker sf
female females strictly female vitellogenic replete female
female - worker females only tetraploid female worker
female (alate sexual) gynoecious thelytoky hexaploid female
female (calf) healthy female female (gynoecious) female (f-o)
hen probably female (based on morphology)
female (note: this sample was originally provided as a "male" sample to us and therefore labeled this way in the brawand et al. paper
and original geo submission; however, detailed data analyses carried out in the meantime clearly show that this sample stems from a
female individual)",
Courtesy of N. Silvester, European Nucleotide Archive, EMBL-EBI

Need for terminology standards
• Need to ensure we’re all talking about the same thing
• The biomedical science community has been busy
building ontologies and terminology standards
• Over 100 freely-available ontologies from the Open
Biological Ontology (OBO) community
• Most developed with formal semantics in OWL
• Many more terminology standards in use in biomedicine
Tibia?

EBI Ontologies Team
• Build services to make
ontologies accessible for
humans and machines
• Ensure a consistent set of
interoperable ontologies are
used across public datasets to
maximise interoperability
• Scale up the process to millions
of data points
• Work with software and
database developers to utilise
the ontologies
Data to knowledge

The end result is integrated data with
semantic search
Expression Atlas
GWAS catalog

Ontology driven search
• Semantic query across 20 integrated datasets to identify
potential new drug targets for disease
https://www.targetvalidation.org

Aligning data to our ontologies
Organism: Homo sapiens
Cell type: Mast cell
Disease: Type II diabetes mellitus
Organism part: pancreas
CL:0000097
Cell type ontology
Where do you start?

Typical questions
• How do I access ontologies?
• How do I annotate data with ontologies?
• Which ontologies should I use?
• What about data that doesn’t map easily?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” applications?

The Ontology Toolkit
https://github.com/EBISPOT
Open Source Software
http://www.ebi.ac.uk/spot/ontology

Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• RESTful API
Repository of over 200 pre-selected biomedical ontologies (5+ million terms)
http://www.ebi.ac.uk/ols
• Provides unified mechanism to access
multiple ontologies
• 6,000 users / 50 million hits per month
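
A minimal sketch of how a client might use such a RESTful API from Python. The `/api/search` path, the `q`, `ontology` and `rows` parameters, and the Solr-style `response.docs` payload are assumptions, not taken from the slides; check the current OLS API documentation before relying on them. Only URL construction and response parsing are shown, so the example runs offline.

```python
import json
from urllib.parse import urlencode

# Base URL from the slide; the "/api/search" path is an assumption.
OLS_BASE = "http://www.ebi.ac.uk/ols"


def build_search_url(query, ontology=None, rows=10):
    """Build a search URL for a label, optionally restricted to one ontology."""
    params = {"q": query, "rows": rows}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_BASE}/api/search?{urlencode(params)}"


def parse_hits(payload):
    """Extract (obo_id, label, ontology) tuples from an assumed Solr-style payload."""
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("obo_id"), d.get("label"), d.get("ontology_name")) for d in docs]


# Hand-written sample response for illustration, not real service output.
sample = json.loads("""
{"response": {"docs": [
  {"obo_id": "UBERON:0000948", "label": "heart", "ontology_name": "uberon"}
]}}
""")

url = build_search_url("heart", ontology="uberon")
hits = parse_hits(sample)
print(url)
print(hits)
```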

The problem with just an ontology lookup
…knowing what you’re looking for

Data annotation services
• Supporting data curation to map to the “right”
terms
• Based on what other databases are doing
• Collect mappings from 10 databases at EBI
and use as a training set to predict how new
unseen data should map to ontologies
http://www.ebi.ac.uk/spot/zooma
“Heart” UBERON:0000948
+ Context
(where, when?)
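
The idea of predicting a mapping from earlier curation plus context can be sketched as a simple vote. This is an illustration of the principle only, not Zooma's actual algorithm: the curated rows, the scoring weights and the `HYPOTHETICAL:0001` identifier are all made up for the example.

```python
from collections import Counter

# Hypothetical curated annotations: (property type, free-text value, ontology term).
CURATED = [
    ("organism part", "Heart", "UBERON:0000948"),
    ("organism part", "heart", "UBERON:0000948"),
    ("disease", "heart", "HYPOTHETICAL:0001"),
    ("cell type", "mast cell", "CL:0000097"),
]


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def predict(value, property_type=None):
    """Return (term, confidence) from curated votes, or None if the value is unseen."""
    votes = Counter()
    for ptype, curated_value, term in CURATED:
        if normalise(curated_value) != normalise(value):
            continue
        # A matching context (same property type) counts double.
        votes[term] += 2 if property_type and ptype == property_type else 1
    if not votes:
        return None
    term, score = votes.most_common(1)[0]
    return term, score / sum(votes.values())


print(predict("heart", "organism part"))
print(predict("tibia"))
```

Values with no curated precedent return `None`, which is where a curator or an ontology-only search would have to take over.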

https://www.ebi.ac.uk/spot/zooma/

• Using previously curated data sources

• Using only ontologies
• Curators review the output and feed back into Zooma
Reviewers

• We’re increasingly seeing data that is described using
ontologies
• But we don’t always agree on the ontologies to use
Datasource 1 Datasource 2
Human Phenotype Ontology
SNOMED-CT
Mappings
Ontology Mapping Service (OxO)
http://www.ebi.ac.uk/spot/oxo

Ontology Mapping Service (OxO)
• Graph database (Neo4j) of mappings from a number of public sources
• Mappings are often semantically vague (exact, broader, narrower,
related)
• We use the graph to infer potential new mappings, and identify
conflicting sources of mappings
http://www.ebi.ac.uk/spot/oxo
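
Inferring potential mappings from a graph of known ones can be sketched as a bounded breadth-first search: identifiers reachable in a few hops are candidates, and the hop count gives a rough measure of how much to trust them. The identifiers below are placeholders, and this is a sketch of the general technique, not the OxO implementation.

```python
from collections import deque

# Hypothetical cross-references between identifiers; not real OxO data.
MAPPINGS = [
    ("HP:A", "UMLS:B"),
    ("UMLS:B", "SNOMEDCT:C"),
    ("SNOMEDCT:C", "MESH:D"),
]


def build_graph(pairs):
    """Treat every mapping as an undirected edge."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mappings_within(graph, start, max_distance):
    """Breadth-first search returning {identifier: hops} within max_distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


graph = build_graph(MAPPINGS)
print(mappings_within(graph, "HP:A", 2))
```

Distance 1 reproduces the asserted mappings; anything further out is inferred and would need review.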

Under the hood we use Neo4j
• We import OWL ontologies into Neo4j
• Simplify the OWL representation so that it is optimised for common queries
• Model for the application needs
• Scalable applications that are more developer friendly than triple stores

Powerful yet simple queries
• Get the full partonomy and classification of “heart” with Cypher
MATCH (n)-[r:SUBCLASSOF|PARTOF*]->(parents)
WHERE n.label = "heart"
RETURN parents
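
For readers without a Neo4j instance, the same kind of transitive traversal can be sketched in plain Python over a toy edge list. The edges and labels are illustrative and not taken from any real ontology; the relation names simply mirror the ones in the query.

```python
# Toy edges (child, relation, parent); labels are illustrative only.
EDGES = [
    ("heart", "SUBCLASSOF", "organ"),
    ("heart", "PARTOF", "circulatory system"),
    ("circulatory system", "PARTOF", "organism"),
    ("heart", "DEVELOPSFROM", "heart primordium"),
]


def ancestors(label, relations=("SUBCLASSOF", "PARTOF")):
    """Follow the chosen relation types upwards for any number of hops."""
    found, stack = set(), [label]
    while stack:
        node = stack.pop()
        for child, relation, parent in EDGES:
            if child == node and relation in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("heart")))
```

Restricting `relations` to `("SUBCLASSOF",)` gives the classification only; the default mixes classification and partonomy, as the query does.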

Using ontologies in our search indexes
https://ebispot.github.io/BioSolr/
Enrich your search
index with ontology
goodness
• For text search we compute the closure of all
relationships into our text index
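
A minimal sketch of what computing the closure into an index means: each document is indexed under its own term and under every ancestor of that term, so a search for a broad term also finds data annotated with narrower ones. The hierarchy, document ids and function names are hypothetical, and a real deployment would do this inside the search engine rather than in a Python dict.

```python
# Toy hierarchy: term -> direct parents (illustrative labels only).
PARENTS = {
    "heart": ["organ", "cardiovascular system"],
    "cardiovascular system": ["anatomical system"],
}


def closure(term):
    """All ancestors of a term, following parent links transitively."""
    seen, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_index(documents):
    """Map each term and each of its ancestors to the ids of annotated documents."""
    index = {}
    for doc_id, term in documents.items():
        for key in {term} | closure(term):
            index.setdefault(key, set()).add(doc_id)
    return index


docs = {"sample-1": "heart", "sample-2": "organ"}
index = build_index(docs)
print(sorted(index["organ"]))
```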

Semantic search and data integration with
ontologies
https://www.ebi.ac.uk/gwas

Publishing the data
• EBI RDF platform contains 7 EBI databases connected
by shared ontologies
• SPARQL access to a subset of EBI data
• But maintenance is hard as it’s not the source of truth for
the data
http://rdf.ebi.ac.uk

Aligning schemas to a single model is hard
Diagram: schema linking Gene Expression Atlas, Ensembl, UniProt, Gene Ontology (BP, MF, CC), ChEMBL, BioModels (SBMLModel, Reaction, Species, Compartment), ChEBI, Reactome, SBO and PDB.
Linking predicates include rdfs:seeAlso, bq:is, bq:isVersionOf, bq:isHomologTo, bq:hasPart, bq:occursIn, uniprot:classifiedWith, uniprot:organism, uniprot:transcribedFrom, uniprot:translatedTo, chembl:hasTarget and sio:'is attribute of' (sio:SIO_000011).
Entities covered: genes, proteins, protein structures, drugs, species, reactions, gene function, systems, disease.
Relationships within BioModels can be found at https://github.com/sarala/ricordo-rdfconverter/wiki/SBML-RDF-Schema

Is JSON-LD the answer?
Most services already produce JSON via a REST API

Slight tweak to make RDF compatible
"@context" : {
  "@vocab" : "http://rdf.ebi.ac.uk/terms/ensembl/",
  "obo" : "http://purl.obolibrary.org/obo/",
  "dcterms" : "http://purl.org/dc/terms/",
  "faldo" : "http://biohackathon.org/resource/faldo#",
  "biotype" : {
    "@id" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "@type" : "@vocab"
  },
  "protein_coding" : "obo:SO_0001217",
  "id" : "dcterms:identifier",
  "homo_sapiens" : "http://identifiers.org/taxonomy/9606",
  "species" : {
    "@id" : "obo:OBO_0100026",
    "@type" : "@vocab"
  },
  "description" : "dcterms:description",
  "display_name" : "http://www.w3.org/2004/02/skos/core#prefLabel"
}
Using JSON-LD to assign ontology semantics to existing data

Ensembl JSON as RDF triples
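
A rough sketch of how a context like this turns a flat JSON record into triples. It is a deliberately simplified, hand-rolled interpretation covering only prefixes, `@vocab` and `"@type": "@vocab"`; a real JSON-LD processor should be used in practice. The gene record, the subject IRI and the helper names are hypothetical, and the sketch assumes `biotype` is meant to map to `rdf:type`.

```python
# Subset of the context; "biotype" -> rdf:type is an assumption of this sketch.
CONTEXT = {
    "@vocab": "http://rdf.ebi.ac.uk/terms/ensembl/",
    "obo": "http://purl.obolibrary.org/obo/",
    "dcterms": "http://purl.org/dc/terms/",
    "biotype": {
        "@id": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "@type": "@vocab",
    },
    "protein_coding": "obo:SO_0001217",
    "id": "dcterms:identifier",
    "description": "dcterms:description",
}


def expand(term, context):
    """Resolve a term or compact IRI (prefix:local) to a full IRI."""
    value = context.get(term, term)
    if isinstance(value, dict):
        value = value["@id"]
    prefix, sep, local = value.partition(":")
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local      # compact IRI such as obo:SO_0001217
    if sep:
        return value                        # already absolute, e.g. http://...
    return context["@vocab"] + value        # bare term falls back to @vocab


def to_triples(doc, subject, context):
    """Turn a flat JSON object into (subject, predicate, object) tuples."""
    triples = []
    for key, value in doc.items():
        definition = context.get(key)
        if isinstance(definition, dict) and definition.get("@type") == "@vocab":
            obj = expand(value, context)    # value names a term, so it becomes an IRI
        else:
            obj = value                     # otherwise keep it as a plain literal
        triples.append((subject, expand(key, context), obj))
    return triples


# Hypothetical record and subject IRI, for illustration only.
gene = {"id": "ENSG_EXAMPLE", "biotype": "protein_coding", "description": "example gene"}
triples = to_triples(gene, "http://example.org/gene/ENSG_EXAMPLE", CONTEXT)
for triple in triples:
    print(triple)
```

The point of the approach is that the JSON payload itself does not change: only the context is added, and the string `protein_coding` becomes a Sequence Ontology IRI.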

BioSchemas & Schema.org
• Low cost investment (markup in HTML)
• Growing community in the life sciences
• http://bioschemas.org
• JSON-LD is emerging as a popular markup format
• EBI BioSamples database has over 10 million pages
marked up with semantic markup
• Great potential for dataset discovery (finding data
generated from the same samples)
• But not clear who will do the crawling and build the
indexes…
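
A minimal sketch of the kind of markup involved: a JSON-LD block embedded in a page's HTML. The schema.org `Dataset` type and its `identifier`, `name` and `about` properties are used here as a generic stand-in; the Bioschemas profiles define more specific types and properties, and the accession, sample name and function name are hypothetical.

```python
import json


def jsonld_script(accession, name, term_iri, term_label):
    """Return an HTML <script> element carrying JSON-LD for one page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Dataset",  # generic stand-in, not a Bioschemas profile type
        "identifier": accession,
        "name": name,
        # Point at the ontology term instead of repeating free text.
        "about": {"@id": term_iri, "name": term_label},
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


html = jsonld_script(
    "SAMPLE-0001",
    "Example heart sample",
    "http://purl.obolibrary.org/obo/UBERON_0000948",
    "heart",
)
print(html)
```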

What we’ve learnt along the way
• The data we see is getting better as the ontologies have matured and
consensus has grown around which ontologies should be used
• Crowdsourcing through tools like Zooma and OxO has good economies of
scale with respect to data curation
• Retrofitting the semantics in this way has limits: there’s still a long tail of data
that we miss
• OWL semantics are essential for building and maintaining our ontologies, but
we’ve had to devise custom ways to utilise the ontologies when building
applications and populating databases
• Developers want more conventional access to semantics (i.e. REST+JSON)

Ontology team
Helen Parkinson
Warren Read
Ola Ajigboye
Paola Roncaglia
Henriette Harmse
Simon Jupp
Zoe Pendlington
Nicolas Matentzoglu
David Osumi-Sutherland
Funding
• EMBL and OpenTargets
• CORBEL: This project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654248.
• EJP cofund
• EOSC-Life
• EXCELERATE: ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.
• Funding for Human Cell Atlas from Chan-Zuckerberg Initiative
