The document discusses the use of ontologies and taxonomies to enhance findability, accessibility, interoperability, and reuse of data and resources. It provides definitions for taxonomy, ontology, knowledge engineering, and artificial intelligence. It describes how ontologies can specify terminology, concepts, and relationships in a domain to provide a rich description. The document also discusses ontology development processes and gives examples of how ontologies can enable semantic search, data integration, and interpretation across different studies and data sources.
1. ‘Smart’ Taxonomy- & Ontology-
Enabled Resources
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Science
Director RPI Web Science Research Center
RPI Institute for Data Exploration and Application Health Informatics Lead
dlm@cs.rpi.edu , @dlmcguinness
Taxonomy Bootcamp Washington, DC November 5, 2018
3. Vocabularies, Taxonomies,
Ontologies… are Everywhere
• Hand constructed over years
• Generated from automated techniques,
Machine Learning, Information Extraction, …
• Spreadsheet schemas
• Navigation bar terms
• …..
• And many people and services want/need to
interconnect using them
• Someone needs to work with them
4. • Limited data integration without controlled
vocabulary
• Limited reproducibility without shared
definitions
• Difficulty in reuse without provenance
Ontologies can enhance findabiity,
accessibility, interoperability, reusability and
research impact – supporting go-fair
https://www.go-fair.org/
Vocabularies Enabling “FAIRness”
McGuinness 11/18
5. A Few Definitions
• Taxonomy is the practice and science of classification of things or concepts,
including the principles that underlie such classification. Wikipedia 2018
• An ontology is a specification of a conceptualization ̶ Gruber, 1993
• An ontology is a formal explicit description of concepts in a domain of discourse
(classes (sometimes called concepts)), properties of each concept describing
various features and attributes of the concept (slots (sometimes called roles or
properties)), and restrictions on slots (facets (sometimes called role restrictions)). ̶
Noy and McGuinness, 2001
• Knowledge engineering is the application of logic and ontology to the task of
building computable models of some domain for some purpose. ̶ Sowa, 1999
• Artificial Intelligence can be viewed as the study of intelligent behavior achieved
through computational means. Knowledge Representation then is the part of AI
that is concerned with how an agent uses what it knows in deciding what to do. –
Brachman and Levesque, 2004
• Knowledge representation means that knowledge is formalized in a symbolic form,
that is, to find a symbolic expression that can be interpreted. – Klein and Methlie,
1995 borrowed from Kendall and McGuinness -- Ontology Engineering 2018 5
6. What is an Ontology?
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Relationships among concepts and individuals
• Sentences distinguishing concepts, refining
definitions & relationships
relevant to a particular domain or area of interest.
* Based on AAAI ‘99 Ontologies Panel ̶ McGuinness, Welty, Uschold, Gruninger, Lehmann
McGuinness 6/7/2017
• "Pull" for Ontologies. Invited
talk. Semantics for the Web.
Dagstuhl, Germany, 2000.
• Ontologies Come of Age.
Fensel, Hendler, Lieberman,
Wahlster, eds. Spinning the
Semantic Web: Bringing the
World Wide Web to Its Full
Potential. MIT Press, 2003.
McGuinness 11/18
9. Data Life Cycle
Consistent
terminology
and
meaning
Ontology-enhanced
Search and
organization
Data management Image: J.Crabtree with permission NIEHS 50 yr FEST
Ontology-enabled
interpretation & integrationOntology-enabled
integrity checking
Provenance
annotations for trust
and reuse
Computer understandable specifications of meaning
(semantics) support enhanced lifespan & impact of data
McGuinness
10. Ontology Development Process
Use Cases
Existing Ontologies
& Vocabularies
Expert Interviews
Labkey,
Ontology
Fragments
Ontology
Curation
(ongoing)
Reviewers & Curators
* Ontology Development Team
* Domain collaborators
* Invited experts
* "Consumers" (data analysts)
Knowledge Graph
Integration
* Linking data and
metadata content to
domain terms
* Linking workflows
based on semantic
descriptions
Repository
Integration
* Source Datasets
* Analytics source
code
* Results
* Publications
Knowledge-
Enhanced
Search
Finding what
is there that
might be of
use
Semantic
Extract
Transform,
Load
(SETLr)
Expert
Guidance
Sources
Data Reporting
Templates
Data Dictionaries /
Codebooks
Foundational
Ontologies/Vocabularies
Human Aware
Data Acquisition
Framework
Ontology
Browser
Generated
Ontology
* domain concepts
* authoritative
vocabularies
* vetted definitions
* supporting citations
Erickson, McGuinness, McCusker, Chastain, Pinheiro, Rashid, Liang, Liu, Stingone, …
Exemplified by CHEAR
McGuinness 11/18
11. CHEAR Ontology Effort
11
Goal: Encode terminology currently needed by the CHEAR Data Center
Portal, publish an open source extensible ontology integrating general
exposure science and health leveraging best in class terminologies.
Enabling Findable, Accessible, Interoperable, Reusable Data and
Services to support data analysis and interdisciplinary research
Ontologies encode terms and their interrelationships, providing a foundation
for understanding interoperability and reusability (I and R in terms of FAIR)
Ontology-enabled infrastructures - Knowledge Graphs and Ontology-
enabled search services also provide support for finding and accessing
relevant content (the F and A in FAIR)
Child Health Exposure
Analysis Resource Ontology
12. Ontology Foundations
12
Use Cases help scope and
prioritize
Key Components
• Summary
• Usage Scenario
• Flow of Events
• Activity Diagram
• Competency Questions
• Resources
• See examples at
Ontology Engineering
https://docs.google.com/document/d/1A2w-
xoN5aRwlSoCTEtDsWjs2caYDRD5bANif6icDS6k/edit?usp=sharing
12
Starting with the Use Case
McGuinness 11/18
15. Mapping Data to Meaning:
Semantic Data Dictionaries
Rashid, Chastain, Stingone, McGuinness, McCusker. The Semantic Data Dictionary Approach to
Data Annotation and Integration. Enabling Open Semantic Science, in Proc of the International Web
Science Conference Knowledge Graphs Workshop Oct 21, 2017
McGuinness 11/18 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
16. 16
• Ontology support for
mapping and integration
(e.g., education level)
• Ontology informs decisions
about variables that may be
combined, serve as proxy,
or used to derive desired
info (e.g., birth outcomes)
• Ontology Integrity
constraints may help flag
errors (e.g., APGAR > 10)
• Ontology helps expose
implicit information and find
links
Fenton Z-Score
Sex
Birth
weight
Gest
Age
Mother’s Highest
Education Level
Val
Did not attend school 0
Elementary school 1
Technical post-primary 2
Middle school 3
Technical post-middle
school 4
Highschool or junior
college 5
Technical post-junior
college 6
College 7
Graduate 8
Doesn’t know 9
Mother
Education
Val
Less than High
School 0
High School
Graduate or More 1
Support Browsing,Searching, Pooling
Deriving Values, Verification, …
McGuinness, McCusker, Pinheiro, Stingone, et. al. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
McGuinness 11/18
17. Automatic ingest,
access control, data
governance,
precision download,
…
Supports Search study,
data sample, subject, ...
Enables smart
queries e.g., find
Child:BirthWt, Gender,
Gestational Age at Birth
Mother:Age, BMI “early
in pregnancy based on
inclusion criterion for
the particular study”,
Parity, Education
Metals: As, CD, Mn,
Mo, Pb
Ontology-Enabled Human Aware
Data Acquisition Framework
McGuinness 11/18
Pinheiro, Liang, Rashid, Liu, Chastain, Santos, McCusker, McGuinness
Sample Question: Gennings
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
18. Ontology-Enabled Study Search /
Precision Data Search and Download
Blood Biomarkers for Children’s Health (Study 1)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Urine Biomarkers for Children’s Health (Study 2)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Metabolomic Biomarkers for Children’s Health (Study 3)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
McGuinness 11/18 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
19. Ontology Evolution Strategy
Identify terms
that can be
mapped to
existing ontology
Identify terms
to be added
to ontology
Describe new
terms w/
definitions and
location within
existing ontology
Mappings (e.g.
variable names)
incorporated into
knowledge graph
Data into
knowledge graph
after embargo
period
Incorporate new
terms into
existing ontology
Review and
revise updates
with stakeholders
Data Structures
& Standards
Working Group
Compile new
terms across
multiple studies
(e.g. Quarterly)
Data Center
New
version
Ontology
X
McGuinness Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
20. Vocabulary TakeAways
McGuinness Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• Large Community Effort
• Vocabulary Designed initially by Semantics
Experts
• Processes co-designed by domain experts and
semantics experts
• Vocabulary evolution managed by the
community
• Leverages MANY community vocabularies
• Supports a wide range of services that we claim
would not be supported without interlinked,
human supervised vocabulary
21. Ontology-enabled Framework:
AI Horizons Example
Ontologies are an important piece;
but are part of a larger integrated
framework
Semanalytics / SemNExt RPI team:
McGuinness, Bennett PIs with
McCusker, Erickson, Seneviratne and the extended Research
groups along with input from IBM HEALS collaborators and
motivation from MANY projects, particularly from NIEHS/
Mount Sinai,
McGuinness, D.L. & Bennett K. Integrating
Semantics and Numerics: Case Study on
Enhancing Genomic and Disease Data
Using Linked Data Technologies. In Proc.
Smart Data 2015, San Jose, CA.McGuinness 11/18 partially supported through IBM AI Horizons Network
22. Probability-Aware
Knowledge Exploration
• Knowledge imported from
drug, protein, and
disease interaction
databases.
• Each interaction given an
evidence-driven
probability.
• Find drugs that could
affect melanoma, filtered
by interaction probability.
• The best hypotheses
were generated using the
highest probabilities.
• Being expanded initially
for genomics-aware
cancer care applications
McCusker, Dumontier, Yan,
He, Dordick, McGuinness.
Finding Melanoma Drugs
through a Probabilistic
Knowledge Graph. PeerJ
4L 32007, 2016.
McGuinness 11/18
23. 23
Text Mining
Linked Data
Biomedical Databases
Omics and Epidemiology Datasets
File uploads of any kind
Fully automated
Reproducible and traceable from diverse
data types:
Tabular CSV & XLS, XML, JSON, HTML,
etc.
Transformed into rigorously modeled
knowledge:
Meaning as structure, global IDs,
foundational ontologies
Tracked and versioned using provenance
standards
RPI knowledge graph technology
curates from diverse sources
McGuinness 11/18 partially supported through IBM AI Horizons Network
24. Guidelin
es
Guideline Decision Support
Patient Data
Comorbidities
Previous Treatment
Plans
Contraindications
Authoritative Guidelines
Recommendation
Cited Research Studies
Population
Descriptions
Inclusion /
Exclusion Criteria
EH
R
Dat
a
G-Prov
Conceptualize
Ontologies
Output
Treatment Plan
Cohort Similarity
Score
Provenance
justified options
Evidence
justified
options
Reasoning
Agents
Decision
Support
Gap
Analysis
Write to
Knowledge
Graph
Study
Cohort
Ontology
Application: Results
Visualization
Patient_
Profile(
pp)
h
a
s
_
t
r
e
a
t
m
e
n
t
_
p
l
a
n
(
p
p
,
t
p
)
h
a
s
_
t
a
r
g
e
t
_
a
c
h
i
e
v
e
d
(
t
,
f
a
l
s
e
)
Treat
ment
_plan
(tp)
Tar
get(t
)
h
a
s
_
d
i
a
g
n
o
s
i
s
(
p
p
,
d
m
)
drugS
ubpla
n(dsp
)
'Duration
description'(
dd)
T
i
m
e
I
n
s
t
a
n
t
(
i
2
)
h
a
s
_
E
n
d
(
I
,
i
2
)
h
a
s
_
d
u
r
a
t
i
o
n
d
e
s
c
r
i
p
t
i
o
n
(
I
,
d
d
)
ProperInterv
al(i)
h
a
s
_
t
a
r
g
e
t
(
t
p
,
t
)
m
o
n
t
h
s
(
d
d
,
3
)
h
a
s
_
d
u
r
a
t
i
o
n
(
t
a
r
,
i
)
'diabetes
mellitus'(d
m)
h
a
s
_
p
a
r
t
(
t
p
,
d
s
p
)
h
a
s
_
p
a
s
s
e
d
_
3
_
m
o
n
t
h
s
(
?
t
p
,
t
r
u
e
)
h
a
s
_
A
1
C
_
l
e
v
e
l
(
t
,
‘
6
.
5
’
)
ha
s_
dr
ug
_s
ub
pl
an
_l
ev
el
(d
su
p,
"
m
on
ot
he
ra
py
"),
h
a
s
_
p
r
e
v
i
o
u
s
_
t
r
e
a
t
m
e
n
t
_
p
l
a
n
(
p
,
t
r
u
e
)
h
a
s
_
s
u
g
a
r
_
l
e
v
e
l
(
t
,
‘
6
.
5
’
)
T
i
m
e
I
n
s
t
a
n
t
(
i
1
)
h
a
s
_
d
r
u
g
_
s
u
b
p
l
a
n
_
l
e
v
e
l
(
d
s
u
p
,
“
d
u
a
l
t
h
e
r
a
p
y
"
)
,
O
r
a
l
(
o
r
)
T
a
b
l
e
t
(
t
a
b
)
h
a
s
_
d
r
u
g
_
p
a
r
t
i
c
i
p
a
n
t
(
d
s
p
,
m
)
h
a
s
_
d
o
s
e
F
o
r
m
(
m
,
t
a
b
)
h
a
s
_
r
o
u
t
e
_
o
f
_
a
d
m
i
n
i
s
t
r
a
t
i
o
n
(
m
,
o
r
)
h
a
s
_
p
r
e
v
i
o
u
s
_
t
r
e
a
t
m
e
n
t
_
p
l
a
n
(
p
,
t
r
u
e
)
Drug(m)
Diabetes
Mellitus
Treatment
Ontology
26. • Finding Events
• What to Make
• Guideline-based Diabetes Mgmt
• Machine Learning for Facial
Recognition
All heavily rely on existing
vocabularies: finding them, stitching
them together, expanding as needed
Ontology Engineering Examples
McGuinness 11/18
https://tw.rpi.edu/web/Courses/Ontologies/2018
27. Discussion
• Taxonomies/Ontologies enable FAIR
(Findable, Accessible, Interoperable,
Reusable) Data Resources
• They can support movement across levels of
abstraction
• Use ontology-enabled architectures
• Do NOT build taxonomies/ ontologies from
scratch
• Selectively and thoughtfully reuse existing
best practice ontologies/vocabularies
• Engage experts in choosing ontology
(portions) and in designing the knowledge
architecture
• Ecosystems and diverse teams are critical for
success – community driven and maintained
ontology-based systems are the future
• Lets enable the next generation of decision
support together! Please contact us for
collaboration
McGuinness 7/18
28. Questions
• Taxonomies / Ontologies
enable interoperability and
actionable applications
• Terminologies need us and
the world needs well defined
terminologies
• Contact: dlm@cs.rpi.edu
• Thanks to many: RPI Tetherless World team particularly
McCusker, Erickson, Hendler, Pinheiro, Rashid, Liang, Liu,
Chastain, Chari; RPI: Bennett, Dyson, Seneviratne; Mount
Sinai particularly Teitelbaum, Stingone, Mervish, Gennings,
Kovatch; IBM, particularly Das, Chen, Brown, ….
• Funding: NIEHS 0255-0236-4609 / 1U2CES026555-01,
DARPA HR0011-16-2-0030, IBM-RPI HEALS AI Horizons
Network, NSF ACI-1640840
• Forthcoming book: Ontology Engineering with Kendall
McGuinness 11/18