This document outlines a roadmap for making BioPortal, an ontology repository, more multilingual. It proposes representing the natural language of ontologies, distinguishing between monolingual and multilingual ontologies, and representing relations between translated ontologies. It also discusses reconciling multilingual mappings and representing multilingual content overall. Making BioPortal multilingual could enable new discoveries by connecting data in different languages and addressing cultural differences reflected in language. Some challenges include dealing with partial multilingual ontologies and multiple mappings between terms. The proposals aim to better support multilingual ontologies, translations, and mappings within BioPortal.
1. Roadmap for a
multilingual BioPortal
Clement Jonquet (jonquet@lirmm.fr),
Vincent Emonet (emonet@lirmm.fr) &
Mark A. Musen (musen@stanford.edu)
4th workshop on the Multilingual Semantic Web
Portoroz, Slovenia – June 1st 2015
3. Context:
increasing amount of biomedical data
+ multilingualism
Limits of keyword-based indexing
The biomedical community has turned to ontologies to describe its
data and turn it into structured and formalized knowledge
Ontologies are used by creating semantic annotations
Crucial need for tools & services for French biomedical data
Biomedical data integration challenge
New potential scientific discoveries hidden in data
Translational research
4. Biologists have adopted
ontologies
To provide canonical representation of scientific
knowledge
To annotate experimental data to enable
interpretation, comparison, and discovery across
databases
To facilitate knowledge-based applications for
Decision support
Natural language-processing
Data integration
But ontologies are: spread out, in different formats, of
different size, with different structures
In different “languages”
5. Working with terminologies &
ontologies – a portal please!
You’ve built an ontology, how do you let the world know?
You need an ontology, where do you go to get it?
How do you know whether an ontology is any good?
How do you find resources that are relevant to the
domain of the ontology (or to specific terms)?
How could you leverage your ontology to enable new
science?
How could you use ontologies without managing them?
6. A few words about
BioPortal
7. BioPortal: a “one-stop shop”
for biomedical ontologies
Web repository for biomedical ontologies
Make ontologies accessible and usable by abstracting away
format, location, structure, etc.
Users can publish, download, browse, search, comment,
align ontologies and use them for annotations both online
and via a web services API.
Online support for ontology
Peer review
Notes (comments and discussion)
Versioning
Mapping
Search
Resources
9. http://data.bioontology.org
Ontology Services: Search, Traverse, Comment, Download
Widgets: Tree-view, Auto-complete, Graph-view
Mapping Services: Create, Upload, Download
Annotation: Term recognition
Data Access: Search “data” annotated with a given term
http://bioportal.bioontology.org
10. Status of multilingualism in
BioPortal
Accepts (and parses) both multilingual ontologies and
monolingual ontologies
sometimes represented as views
Does not leverage multilingual structure and content
e.g., inclusion/exclusion of labels in different languages when using
the services the portal offers, such as the Annotator
Not capable of reconciling and dealing with multilingual mappings
Does not use a proper mechanism to identify the language property(ies)
of an ontology
Does not support relationships between ontologies in different languages
(or in general)
Does not support any internationalization
the whole UI exists only in English
11. A few words about words
12. Multilingual ontology
Multilingual ontology (each concept carries labels in several languages):
en:disease / fr:maladie
...
en:cancer / fr:cancer
en:spindle cell sarcoma / fr:sarcome à cellules fusiformes
en:melanoma / fr:mélanome
Language-specific ontologies (monolingual):
English: disease, ..., cancer, spindle cell sarcoma, melanoma
French: maladie, ..., cancer, sarcome à cellules fusiformes, mélanome
13. Ontology language &
translation
Natural language = the language (French, English, Spanish,
etc.) used when building a language specific ontology
Format language = used to describe the ontology (OWL,
RDFS, RRF, etc.)
Translation = relation between two language specific
ontologies that represent mainly the same object (domain,
topics, set of concepts and relations)
14. Multilingual mappings
Mapping (or alignment) = a correspondence between
concepts in different ontologies
Multilingual mapping = a concept mapping between 2
language specific ontologies
Multilingual translation mapping = additionally the 2
concerned language specific ontologies are a translation of
one another
For instance:
Mesh/melanoma has a mapping to DOID/melanoma
Mesh-fr/mélanome has a multilingual mapping to DOID/melanoma
Mesh/melanoma has a multilingual translation mapping to Mesh-fr/mélanome
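The three definitions above can be sketched in Python. This is only an illustration, not BioPortal code: the `Ontology` class, the `mapping_kind` function and the `translations` set are hypothetical names, and the sketch assumes each language-specific ontology declares a single natural language. A multilingual translation mapping is also a multilingual mapping; the function returns the most specific kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    acronym: str
    language: str  # natural language of the ontology, e.g. "en" or "fr"


def mapping_kind(source, target, translations):
    """Return the most specific kind of mapping between two ontologies.

    `translations` is a set of frozensets of acronyms, each recording
    that the two ontologies are translations of one another.
    """
    if source.language == target.language:
        return "mapping"
    if frozenset({source.acronym, target.acronym}) in translations:
        return "multilingual translation mapping"
    return "multilingual mapping"


mesh = Ontology("Mesh", "en")
mesh_fr = Ontology("Mesh-fr", "fr")
doid = Ontology("DOID", "en")
# Assumption for the example: Mesh-fr is recorded as a translation of Mesh.
translations = {frozenset({"Mesh", "Mesh-fr"})}

for src, tgt in [(mesh, doid), (mesh_fr, doid), (mesh, mesh_fr)]:
    print(src.acronym, "->", tgt.acronym, ":", mapping_kind(src, tgt, translations))
```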
15. What is being multilingual?
Interface internationalization = displaying static elements of
the user interface (e.g., menu names, help, etc.) in
different languages
Content internationalization = displaying BioPortal content
(e.g., ontology labels, mappings, etc.) in different languages
Multilingual = internationalization (display) + enabling
full use of the functionalities and services of BioPortal
for multilingual ontologies or monolingual ontologies
that are completely and properly addressed (languages, translations,
multilingual mappings, etc.)
rich semantic description
16. A few propositions for
multilingual BioPortal
17. Representation of natural
language property for an ontology
Reuse OMV (http://omv2.sourceforge.net), which is already
imported and used in the BioPortal Metadata ontology
(http://bioportal.bioontology.org/ontologies/BP-METADATA)
omv:naturalLanguage
18. Representation of the distinction
between ontologies
Extend OMV within BioPortal Metadata to include and
formalize the distinction
meta:MultilingualOntology
rdfs:subClassOf omv:Ontology
omv:naturalLanguage some Literal
meta:LanguageSpecificOntology
rdfs:subClassOf omv:Ontology
omv:naturalLanguage exactly 1 Literal
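A minimal Python sketch of how this distinction could be applied to ontology metadata. The function name `ontology_class` is hypothetical. Note one assumption: as written on the slide, "some Literal" also holds for an ontology with a single language, so the sketch returns the most specific class and treats two or more declared languages as multilingual.

```python
def ontology_class(natural_languages):
    """Pick the most specific class from the declared omv:naturalLanguage values."""
    languages = set(natural_languages)
    if len(languages) == 1:
        return "meta:LanguageSpecificOntology"
    if len(languages) > 1:
        # Assumption: "multilingual" means at least two distinct languages.
        return "meta:MultilingualOntology"
    # No language declared: nothing more specific than a plain ontology.
    return "omv:Ontology"


print(ontology_class(["fr"]))
print(ontology_class(["en", "fr"]))
print(ontology_class([]))
```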
19. Representation of relation
between ontologies
Extend the DOOR ontology (http://kannel.kmi.open.ac.uk)
A translated ontology is a specific evolution of the ontology with
a different syntax (an equivalent ontology but in another
language)
new property in BioPortal metadata
20. meta:isTranslationOf
21. Representation of
multilingual mappings
Keep a single, simple model like the one BioPortal already
provides to represent any mapping
multilingual mappings are represented like any other mapping, but with a specific (non-exclusive) relation
Reuse standard properties to represent translations
the LEMON translation module (direct|cultural|lexicalEquivalent)
the GOLD ontology (free|literalTranslation)
Example: mappings between the English terms (disease, ..., cancer, spindle cell sarcoma, melanoma)
and the French terms (maladie, ..., cancer, sarcome à cellules fusiformes, mélanome),
typed as gold:freeTranslation or gold:literalTranslation
22. Reconciliation of multilingual
mappings
Methods to extract multilingual (translation) mappings
between (translated) ontologies and then reconcile them
into the BioPortal mapping repository
Approaches
Via term code when they are the same
Extraction from a meta-thesaurus such as UMLS
Extraction from external mapping databases e.g. CISMEF
Using existing monolingual mappings
Using language parallel data resources
Etc.
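The first approach (matching via term code) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the BioPortal implementation: the function name `mappings_by_term_code` is hypothetical, and the codes `X001`, `X002`, etc. are made up for the example rather than real MeSH identifiers.

```python
def mappings_by_term_code(source_terms, target_terms):
    """Pair terms from two ontologies that share the same term code.

    Both arguments map a term code to its label in one ontology.
    Returns (code, source_label, target_label) tuples sorted by code.
    """
    shared_codes = sorted(source_terms.keys() & target_terms.keys())
    return [(code, source_terms[code], target_terms[code]) for code in shared_codes]


# Made-up codes: an ontology and its French translation that reuse term codes.
mesh_en = {"X001": "melanoma", "X002": "spindle cell sarcoma", "X003": "disease"}
mesh_fr = {"X001": "mélanome", "X002": "sarcome à cellules fusiformes", "X999": "maladie"}

for code, en_label, fr_label in mappings_by_term_code(mesh_en, mesh_fr):
    print(code, en_label, "<->", fr_label)
```

Terms whose codes do not match (here `X003` and `X999`) are left unmapped and would need one of the other approaches listed above.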
24. A few elements of
discussion
25. Important for the Web of
tomorrow
Multilingualism is an important issue in the explosion of data
being released and linked over the Web today
The vision of the semantic web is to be able to leverage and
interoperate data whatever natural language the data are
available in
Make ontology repositories multilingual, and thus make the
ontologies inside them multilingual
26. Language reflects cultural
difference
An ontology corresponds to an interpretation of a certain
reality done by a group of people at a certain time
Language => cultural differences => conceptual differences
When the sociological and cultural differences are significant,
the effect on the formalized knowledge is also significant
Example:
French: traitement de données, transfert de données, téléchargement
English: data process, data transfer, upload, download, sideload
27. What is the challenge?
Multilingual translational discoveries
Potential discoveries that would become possible by crossing
large amounts of (clinical) data about populations of different
ethnic and continental origins, currently expressed in and
limited to a single natural language
e.g., multilingual crossing of genotype-phenotype distinction
studies to help better understand the role of the
environment in gene expression
28. Remaining open questions
How to deal with partially multilingual ontologies?
How to deal with mappings that are not one-to-one?
download/upload vs. télécharger
Formalize entailment of these new classes and properties
e.g., a multilingual translation mapping is a multilingual mapping
connecting 2 ontologies that are translations of one another
Make the BioPortal ontology parser deal with lexical enrichment
vocabularies
SKOS-XL, LIR, LexInfo, Lexvo, Lingvoj => LEMON
LEMON translation module (Jan 2014)
29. Conclusions
Multilingual semantic Web is crucial
Propositions to manage multilingualism in an ontology
repository such as BioPortal
Deal with monolingual ontologies and translation mappings
Deal with multilingual ontologies (from xml:lang to LEMON)
Within the SIFR project, we are implementing and testing those
propositions in a local instance of BioPortal deployed at
LIRMM
Thank you.
Any questions?