Ontologies and Semantic Web technologies play an important role in the life sciences to help make data more interoperable and reusable. There are now many publicly available ontologies that enable biologists to describe everything from gene function through to animal physiology and disease.
Various efforts such as the Open Biomedical Ontologies (OBO) Foundry provide central registries for biomedical ontologies and ensure they remain interoperable through a set of shared development principles.
At EMBL-EBI we contribute to the development of biomedical ontologies and make extensive use of them in the annotation of public datasets. Biological data typically comes with rich and often complex metadata, so the ontologies provide a standard way to capture “what the data is about” and give us hooks to connect to more data about similar things.
These ontology annotations have been put to good use in a number of large-scale data integration efforts and there’s an increasing recognition of the need for ontologies in making data FAIR (Findable, Accessible, Interoperable and Reusable).
EMBL-EBI build a number of integrative data platforms where ontologies are at the core of our domain models. One example is the Open Targets platform, where data about disease from 18 different databases can be aggregated and grouped based on therapeutic areas in the ontology and used to identify potential drug targets.
The ontologies team at EMBL-EBI provide a suite of services that are aimed at making ontologies more accessible for both humans and machines. We work with scientific data curators and software developers to integrate ontologies and semantics into both the data generation and data presentation workflows. We provide:
– An ontology lookup service (OLS) that provides search and visualisation services for over 200 ontologies
– Services for automating the annotation of metadata and learning from previous annotations (Zooma)
– An ontology mapping and alignment service (OxO)
– Tools for working with metadata and ontologies in spreadsheets (Webulous)
– Software for enriching documents in search engines to support “semantic” query expansion
I’ll present how we are using these services at EMBL-EBI to scale up the semantic annotation of metadata. I’ll talk about our open source technology stack and describe how we use a polyglot persistence approach (graph databases, triple stores, document stores, etc.) to optimise how we deliver ontologies and semantics to our users.
Ontology Services for the Biomedical Sciences
1. Simon Jupp
Technical Coordinator / Ontology Project Lead
Samples, Phenotypes and Ontologies Team
EMBL-EBI European Bioinformatics Institute
Ontology services for connecting
biomedical data
Connected Data London,
October 4th, 2019
2. What is EMBL-EBI?
• Europe’s home for biological data services, research
and training
• A trusted data provider for the life sciences
• Part of the European Molecular Biology Laboratory,
an intergovernmental research organisation
• International: 650 members of staff from 66 nations
3. From molecules to medicine
We are always seeking new ways to read
and understand DNA
New technologies provide ways to collect,
compare and visualise molecular
information
Bioinformatics enables new applications:
• molecular medicine
• agriculture
• food
• environmental sciences
5. There’s a lot of metadata...
tissues cell lines diseases
6. How many ways can you say “female”?
18-day pregnant females female (lactating) individual female worker caste (female)
2 yr old female female (pregnant) lgb*cc females sex: female
400 yr. old female female (outbred) mare female, other
adult female female parent female (worker) female child
asexual female female plant monosex female femal
castrate female female with eggs ovigerous female 3 female
cf.female female worker oviparous sexual females female (phenotype)
cystocarpic female female, 6-8 weeks old worker bee female mice
dikaryon female, virgin female enriched female, spayed
dioecious female female, worker pseudohermaprhoditic female femlale
diploid female female(gynoecious) remale metafemale
f femele semi-engorged female sterile female
famale female, pooled sexual oviparous female normal female
femail femalen sterile female worker sf
female females strictly female vitellogenic replete female
female - worker females only tetraploid female worker
female (alate sexual) gynoecious thelytoky hexaploid female
female (calf) healthy female female (gynoecious) female (f-o)
hen probably female (based on morphology)
female (note: this sample was originally provided as a "male" sample to us and therefore labeled this way in the brawand et al. paper
and original geo submission; however, detailed data analyses carried out in the meantime clearly show that this sample stems from a
female individual)
Courtesy of N. Silvester, European Nucleotide Archive, EMBL-EBI
7. Need for terminology standards
• Need to ensure we’re all talking about the same thing
• The biomedical science community has been busy
building ontologies and terminology standards
• Over 100 freely available ontologies from the Open
Biological and Biomedical Ontologies (OBO) community
• Most developed with formal semantics in OWL
• Many more terminology standards in use in biomedicine
Tibia?
8. EBI Ontologies Team
• Build services to make
ontologies accessible for
humans and machines
• Ensure a consistent set of
interoperable ontologies are
used across public datasets to
maximise interoperability
• Scale up the process to millions
of data points
• Work with software and
database developers to utilise
the ontologies
Data to knowledge
9. The end result is integrated data with
semantic search
Expression Atlas
GWAS catalog
10. Ontology driven search
• Semantic query across 20 integrated datasets to identify
potential new drug targets for disease
https://www.targetvalidation.org
11. Aligning data to our ontologies
Organism: Homo sapiens
cell type: Mast cell
Disease: Type II diabetes mellitus
Organism part: pancreas
CL:0000097
Cell type ontology
Where do you start?
12. Typical questions
• How do I access ontologies?
• How do I annotate data with ontologies?
• Which ontologies should I use?
• What about data that doesn’t map easily?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” applications?
14. Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• RESTful API
Repository of over 200 pre-selected biomedical ontologies (5+ million terms)
http://www.ebi.ac.uk/ols
• Provides unified mechanism to access
multiple ontologies
• 6,000 users / 50 million hits per month
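A minimal sketch of using the RESTful API from Python. The `/api/search` path, the `q` and `ontology` parameters, and the `response.docs` layout with `label`, `obo_id` and `ontology_name` fields are assumptions about the OLS API rather than details taken from these slides, so check the current API documentation before relying on them. The sample payload is hand-written so the parsing can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API root, derived from the service address on the slide.
OLS_API = "https://www.ebi.ac.uk/ols/api"


def build_search_url(query, ontology=None):
    """Build a search URL; 'q' and 'ontology' are assumed parameter names."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_API}/search?{urlencode(params)}"


def extract_terms(payload):
    """Pull (label, id, ontology) tuples out of an assumed Solr-style response."""
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("label"), d.get("obo_id"), d.get("ontology_name")) for d in docs]


def search(query, ontology=None):
    """Perform the HTTP request and parse the result (needs network access)."""
    with urlopen(build_search_url(query, ontology)) as resp:
        return extract_terms(json.load(resp))


# Hand-written sample in the assumed response shape.
SAMPLE = {"response": {"docs": [
    {"label": "heart", "obo_id": "UBERON:0000948", "ontology_name": "uberon"},
]}}

print(build_search_url("heart", "uberon"))
print(extract_terms(SAMPLE))
```

Keeping URL construction and response parsing separate from the network call makes it easy to test client code against saved responses.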
16. The problem with just an ontology lookup
…knowing what you’re looking for
17. Data annotation services
• Supporting data curation to map to the “right”
terms
• Based on what other databases are doing
• Collect mappings from 10 databases at EBI
and use as a training set to predict how new
unseen data should map to ontologies
http://www.ebi.ac.uk/spot/zooma
“Heart” UBERON:0000948
+ Context
(where, when?)
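The idea of predicting annotations from what other databases have already curated can be sketched as a simple vote over previous mappings. This is an illustrative toy in Python, not Zooma's actual algorithm: the function names, the normalisation step and the confidence score are assumptions, and the curated records are made up apart from the heart example on the slide.

```python
from collections import Counter, defaultdict


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def train(curated):
    """Count how often each (property type, value) pair was mapped to each term.

    `curated` is an iterable of (property_type, value, term_id) tuples taken
    from previously annotated data sources.
    """
    votes = defaultdict(Counter)
    for prop, value, term in curated:
        votes[(normalise(prop), normalise(value))][term] += 1
    return votes


def predict(votes, prop, value):
    """Return (term_id, confidence) for the most common mapping, or None if unseen."""
    counts = votes.get((normalise(prop), normalise(value)))
    if not counts:
        return None
    term, hits = counts.most_common(1)[0]
    return term, hits / sum(counts.values())


# Hypothetical curated records; only heart -> UBERON:0000948 comes from the slide.
curated = [
    ("organism part", "Heart", "UBERON:0000948"),
    ("organism part", "heart ", "UBERON:0000948"),
    ("organism part", "heart", "EXAMPLE:0000001"),  # made-up conflicting mapping
]
model = train(curated)

print(predict(model, "Organism Part", "HEART"))
print(predict(model, "organism part", "liver"))
```

Keying the votes on the property type as well as the value is one simple way to carry the context in which a value such as "heart" was used.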
19. • Using previously curated data sources
https://www.ebi.ac.uk/spot/zooma/
20. • Using only ontologies
• Curators review output and feed back into Zooma
https://www.ebi.ac.uk/spot/zooma/
Reviewers
21. • We’re increasingly seeing data that is described using
ontologies
• But we don’t always agree on the ontologies to use
Datasource 1 Datasource 2
Human
Phenotype
Ontology
SNOMED-CT
Mappings
Ontology Mapping Service (OxO)
http://www.ebi.ac.uk/spot/oxo
22. Ontology Mapping Service (OxO)
• Graph database (Neo4j) of mappings from a number of public sources
• Mappings are often semantically vague (exact, broader, narrower,
related)
• We use the graph to infer potential new mappings, and identify
conflicting sources of mappings
http://www.ebi.ac.uk/spot/oxo
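Using a graph of asserted mappings to suggest new ones can be sketched as a bounded breadth-first walk. This is a toy in-memory version in Python, not OxO's implementation (which, per the slide, sits on Neo4j): the function names and the `max_distance` parameter are assumptions, and the identifiers are hypothetical placeholders rather than real cross-references.

```python
from collections import defaultdict, deque


def build_graph(mappings):
    """Treat each asserted mapping as an undirected edge between two term IDs."""
    graph = defaultdict(set)
    for source, target in mappings:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def inferred_mappings(graph, start, max_distance=2):
    """Breadth-first walk returning {term_id: hops} for terms within max_distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_distance:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


# Hypothetical identifiers standing in for terms from four vocabularies.
asserted = [
    ("VOCAB_A:001", "VOCAB_B:001"),
    ("VOCAB_B:001", "VOCAB_C:001"),
    ("VOCAB_C:001", "VOCAB_D:001"),
]
graph = build_graph(asserted)

print(inferred_mappings(graph, "VOCAB_A:001", max_distance=2))
```

Because the underlying mappings are often only broader, narrower or related matches, anything found at more than one hop is best treated as a candidate for review rather than an equivalence.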
23. Under the hood we use Neo4j
• We import OWL ontologies into Neo4j
• Simplify the OWL representation into one that is optimized for common queries
• Model for the application needs
• Scalable applications that are more developer friendly than triple stores
24. Powerful yet simple queries
• Get the full partonomy and classification of “heart” with Cypher
MATCH (n)-[r:SUBCLASSOF|PARTOF*]->(parents)
WHERE n.label = "heart"
RETURN parents
25. Using ontologies in our search indexes
https://ebispot.github.io/BioSolr/
Enrich your search
index with ontology
goodness
• For text search we compute the closure of all
relationships into our text index
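Computing the closure of relationships before indexing can be sketched as follows. This is a simplified Python illustration, not the BioSolr code: the function names and the `_ancestors` field suffix are assumptions, and the small hierarchy is a toy example rather than an extract from a real ontology release.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` via subclass-of or part-of edges."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def enrich(document, field, parents):
    """Copy the document and add a field holding the closure of its annotation."""
    enriched = dict(document)
    enriched[field + "_ancestors"] = sorted(ancestors(document[field], parents))
    return enriched


# Toy hierarchy for illustration only.
parents = {
    "heart": ["organ", "cardiovascular system"],
    "cardiovascular system": ["anatomical system"],
    "organ": ["anatomical structure"],
    "anatomical system": ["anatomical structure"],
}
doc = {"id": "sample-1", "tissue": "heart"}

print(enrich(doc, "tissue", parents))
```

With the ancestors stored alongside the original annotation, a plain text query for "cardiovascular system" would also match documents annotated only with "heart", which is the effect query expansion aims for.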
26. Semantic search and data integration with
ontologies
https://www.ebi.ac.uk/gwas
27. Publishing the data
• EBI RDF platform contains 7 EBI databases connected
by shared ontologies
• SPARQL access to a subset of EBI data
• But maintenance is hard as it’s not the source of truth for
the data
http://rdf.ebi.ac.uk
28. Aligning schemas to a single model is hard
[Diagram: schema of the EBI RDF platform, showing Gene Expression Atlas, Ensembl, UniProt, Gene Ontology (BP, MF, CC), ChEMBL, BioModels, ChEBI, Reactome, SBO and PDB linked through predicates such as rdfs:seeAlso, bq:is, bq:isVersionOf, bq:hasPart, bq:occursIn, uniprot:classifiedWith, uniprot:organism and sio:'is attribute of' (sio:SIO_000011). Genes, RNA transcripts and proteins are linked via identifiers.org/ensembl. Relationships within BioModels are described at https://github.com/sarala/ricordo-rdfconverter/wiki/SBML-RDF-Schema. Topics covered: genes, proteins, protein structure, reactions, drugs, species, gene function, systems and disease.]
29. Is JSON-LD the answer?
e.g. most services produce JSON via REST API
33. BioSchemas & Schema.org
• Low cost investment (markup in HTML)
• Growing community in the life sciences
• http://bioschemas.org
• JSON-LD is emerging as a popular markup format
• EBI BioSamples database has over 10 million pages
marked up with semantic markup
• Great potential for dataset discovery (finding data
generated from the same samples)
• But not clear who will do the crawling and build the
indexes…
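As a rough illustration of what low-cost markup can look like, the Python sketch below wraps a plain record as JSON-LD and embeds it in a script tag for an HTML page. It uses the generic schema.org `Dataset` and `DefinedTerm` types rather than a specific Bioschemas profile, and the record, its identifier and the helper names are hypothetical; it is not how BioSamples generates its pages.

```python
import json

# OBO-style IRI for the heart term (UBERON:0000948) mentioned earlier in the slides.
HEART_IRI = "http://purl.obolibrary.org/obo/UBERON_0000948"

# Hypothetical record, as a service might already return it via REST + JSON.
RECORD = {"id": "EXAMPLE-0001", "name": "Example heart sample", "tissue": "heart"}


def to_jsonld(record, term_iri):
    """Wrap a plain record as a schema.org Dataset with an ontology term attached."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "identifier": record["id"],
        "name": record["name"],
        "about": {"@type": "DefinedTerm", "@id": term_iri, "name": record["tissue"]},
    }


def as_script_tag(doc):
    """Serialise the JSON-LD document for embedding in an HTML page."""
    return '<script type="application/ld+json">' + json.dumps(doc, indent=2) + "</script>"


print(as_script_tag(to_jsonld(RECORD, HEART_IRI)))
```

The existing JSON fields are reused unchanged; the only additions are the `@context`, the `@type` and the ontology term IRI, which is what gives a crawler something to link on.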
34. What we’ve learnt along the way
• The data we see is getting better as the ontologies have matured and
consensus has grown around which ontologies should be used
• Crowdsourcing through tools like Zooma and OxO has good economies of
scale with respect to data curation
• Retrofitting the semantics in this way has limits: there’s still a long tail of data
that we miss
• OWL semantics are essential for building and maintaining our ontologies, but
we’ve had to devise custom ways to utilise the ontologies when building
applications and populating databases
• Developers want more conventional access to semantics (i.e. REST+JSON)
35. Ontology team
Helen Parkinson
Warren Read
Ola Ajigboye
Paola Roncaglia
Henriette Harmse
Simon Jupp
Zoe Pendlington
Nicolas Matentzoglu
David Osumi-Sutherland
Funding
• EMBL and OpenTargets
• CORBEL: This project receives funding from the
European Union’s Horizon 2020 research and
innovation programme under grant agreement No
654248.
• EJP cofund
• EOSC-Life
• EXCELERATE: ELIXIR-EXCELERATE is funded by
the European Commission within the Research
Infrastructures programme of Horizon 2020, grant
agreement number 676559.
• Funding for Human Cell Atlas from Chan-Zuckerberg
Initiative