The National Institute of Environmental Health Sciences (NIEHS) supported a Children's Health Exposure Analysis Repository(CHEAR) program that needed to integrate data across exposure science and health. We led the data science effort of this program and design the CHEAR ontology to support data integration and to leverage a wide range of existing ontologies and vocabularies. We are refactoring the ontology to support human health (instead of just aiming to support child health, and broadening support a broad range of environmental health sciences applications.
Towards an Environmental Health Sciences Ontology:CHEAR to HHEAR and Beyond
1. Towards an Environmental
Health Sciences Ontology:
CHEAR to HHEAR and Beyond
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Science
Director RPI Web Science Research Center
RPI Institute for Data Exploration and Application Health Informatics Lead
dlm@cs.rpi.edu , @dlmcguinness
Computable Exposures Workshop September 9, 2019
2. CHEAR Ontology Effort
2
The Children’s Health Exposure Analysis Resource, or CHEAR, is a program funded
by the National Institute of Environmental Health Sciences to advance understanding
about how the environment impacts children’s health and development over the
course of a lifetime.
https://chearprogram.org/
Children’s Health Exposure
Analysis Resource (CHEAR)
McGuinness 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
CHEAR is composed of three components:
A National Exposure Assessment Laboratory Network, providing both targeted
and untargeted environmental exposure and biological response analyses in
human samples
A Data Repository, Analysis, and Science Center, providing statistical services,
a data repository, and data standards for integration and sharing
A Coordinating Center, connecting the research community to CHEAR
resources
3. CHEAR Ontology Effort
3
The NIEHS is establishing an infrastructure, the Human Health Exposure Analysis
Resource (HHEAR) as a continuation of the Children's Health Exposure Analysis
Resource (CHEAR). The goal of this consortium is to provide the research
community access to laboratory and statistical analyses to add or expand the
inclusion of environmental exposures in their research and to make that data publicly
available as a means to improve our knowledge of the comprehensive effects of
environmental exposures on human health throughout the life course.
Human Health Exposure Analysis Resource (HHEAR): Data Repository, Analysis and
Science Center
https://grants.nih.gov/grants/guide/rfa-files/rfa-es-18-014.html
Human Health Exposure
Analysis Resource (HHEAR)
McGuinness 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
4. CHEAR Ontology Effort
4
Goal: Encode terminology currently needed by the CHEAR Data Center
Portal, publish an open source extensible ontology integrating general
exposure science and health leveraging best in class terminologies.
Enabling Findable, Accessible, Interoperable, Reusable Data and
Services to support data analysis and interdisciplinary research
Ontologies encode terms and their interrelationships, providing a foundation
for understanding interoperability and reusability (I and R in terms of FAIR)
Ontology-enabled infrastructures - Knowledge Graphs and Ontology-
enabled search services also provide support for finding and accessing
relevant content (the F and A in FAIR)
Child Health Exposure
Analysis Resource Ontology
McGuinness 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Stingone, Mervish, Kovatch, McGuinness, Gennings, Teitelbaum. Big and Disparate Data: Considerations for
Pediatric Consortia. Current Opinions in Pediatrics Journal. 29(2):231-239, April 2017. doi:
10.1097/MOP.0000000000000467. PMID: 28134706
5. Ontology Development Process
Use Cases
Existing Ontologies
& Vocabularies
Expert Interviews
Labkey,
Ontology
Fragments
Ontology
Curation
(ongoing)
Reviewers & Curators
* Ontology Development Team
* Domain collaborators
* Invited experts
"Consumers" (data analysts)
* Semi-automated critiques
Knowledge Graph
Integration
* Linking data and
metadata content to
domain terms
* Linking workflows
based on semantic
descriptions
Repository
Integration
* Source Datasets
* Analytics source
code
* Results
* Publications
Knowledge-
Enhanced
Search
Finding what
is there that
might be of
use
Semantic
Extract
Transform,
Load
(SETLr)
Expert
Guidance
Sources
Data Reporting
Templates
Data Dictionaries /
Codebooks
Foundational
Ontologies/Vocabularies
Interactive
Ontology
Browser with
Annotation
Generated
Ontology
* domain concepts
* authoritative
vocabularies
* vetted definitions
* supporting citations
Erickson, McGuinness, McCusker, Chastain, Pinheiro, Rashid, Liang, Liu, Stingone, …
Exemplified by CHEAR
McGuinness 9/19 https://chearprogram.org/
Extracted
Vocabularies, …
6. Ontology Foundations
6
Use Cases help scope and
prioritize
Key Components
• Summary
• Usage Scenario
• Flow of Events
• Activity Diagram
• Competency Questions
• Resources
• See examples at
Ontology Engineering
https://docs.google.com/document/d/1A2w-
xoN5aRwlSoCTEtDsWjs2caYDRD5bANif6icDS6k/edit?usp=sharing
6
Starting with the Use Case
McGuinness 9/19
7. Ontology Foundations
7
Imported Ontologies:
● Semantic Science Integrated
Ontology (SIO)
● PROV-O
● Units Ontology
● Human-Aware Science Ontology
(HAScO)
● Virtual Solar Terrestrial
Observatory (Instruments)
(VSTO-I)
● Environment Ontology (ENVO)
● …
Minimum Information to Reference an
External Ontology Term (MIREOT)-ed
Ontologies:
● Chemicals of Biological Interest (CheBI)
● Statistics Ontology (STAT-O)
● PubChem
● UBERON (Anatomy)
● Disease Ontology (DO)
● UniProt (Proteins)
● Cogat (Cognitive Measures)
● ExO
● RefMet, …
Annotations:
● Simple Knowledge Organization System
(SKOS)
● Dublin Core (DC) Terms
7
CHEAR Ontology
Foundations and Reuse
McGuinness 9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
McCusker, Rashid, Liang, Liu, Chastain,
Pinheiro, Stingone, McGuinness. Broad,
Interdisciplinary Science In Tela: An Exposure
and Child Health Ontology. In Proceedings of
Web Science, 2017. Troy, NY. 349-357.
8. Mapping Data to Meaning:
Semantic Data Dictionaries
Rashid, Chastain, Stingone, McGuinness, McCusker. The Semantic Data Dictionary Approach to
Data Annotation and Integration. Enabling Open Semantic Science, in Proc of the International Web
Science Conference Knowledge Graphs Workshop Oct 21, 2017
McGuinness 9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
9. Content Pipeline
9
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01 and IBM AI Horizons Network
Increasing levels of automation
Reproducible, traceable
transformations from diverse sources
Transformation to rigorously modeled
knowledge
Versioning supported by provenance
10. 10
• Ontology support for
mapping and integration
(e.g., education level)
• Ontology informs decisions
about variables that may be
combined, serve as proxy,
or used to derive desired
info (e.g., birth outcomes)
• Ontology Integrity
constraints may help flag
errors (e.g., APGAR > 10)
• Ontology helps expose
implicit information and find
links
Fenton Z-Score
Sex
Birth
weight
Gest
Age
Mother’s Highest
Education Level
Val
Did not attend school 0
Elementary school 1
Technical post-primary 2
Middle school 3
Technical post-middle
school 4
Highschool or junior
college 5
Technical post-junior
college 6
College 7
Graduate 8
Doesn’t know 9
Mother
Education
Val
Less than High
School 0
High School
Graduate or More 1
Support Browsing,Searching, Pooling
Deriving Values, Verification, …
McGuinness, McCusker, Pinheiro, Stingone, et. al. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
McGuinness 9/19
13. Automatic ingest,
access control, data
governance,
precision download,
…
Supports Search study,
data sample, subject, ...
Enables smart
queries e.g., find
Child:BirthWt, Gender,
Gestational Age at Birth
Mother:Age, BMI “early
in pregnancy based on
inclusion criterion for
the particular study”,
Parity, Education
Metals: As, CD, Mn,
Mo, Pb
Ontology-Enabled CHEAR Human
Aware Data Acquisition Framework
McGuinness 12/18
Pinheiro, Santos, Liang, Liu, Rashid, McGuinness, Bax. HADatAc: A Framework for Scientific
Data Integration using Ontologies. Intl Semantic Web Conference, Monterey, CA, 2018.
Sample Question: Gennings
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
14. Ontology-Enabled Study Search /
Precision Data Search and Download
Blood Biomarkers for Children’s Health (Study 1)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Urine Biomarkers for Children’s Health (Study 2)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Metabolomic Biomarkers for Children’s Health (Study 3)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
McGuinness, Pinheiro et al, 9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
15. Ontology Evolution Strategy
Identify terms
that can be
mapped to
existing ontology
Identify terms
to be added
to ontology
Describe new
terms w/
definitions and
location within
existing ontology
Mappings (e.g.
variable names)
incorporated into
knowledge graph
Data into
knowledge graph
after embargo
period
Incorporate new
terms into
existing ontology
Review and
revise updates
with stakeholders
Data Structures
& Standards
Working Group
Compile new
terms across
multiple studies
(e.g. Quarterly)
Data Center
New
version
Ontology
X
McGuinness with Stingone Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
16. Evolution Plans
McGuinness Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• HHEAR to start soon (Sept 1)
• HHEAR needs to be backwards compatible with
CHEAR
• Much of CHEAR is not CHEAR or HHEAR
specific and in fact is reused by other health
informatics efforts
• Plans to extract a “reusable core” aimed at
environmental health sciences needs
17. Status
McGuinness et al, 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• CHEAR available:
• http://www.hadatac.org/chear-ontology/
• http://www.hadatac.org/ont/chear/
• https://bioportal.bioontology.org/ontologies/CH
EAR
• One more release expected (before HHEAR)
• CHEAR namespace will work however we will
encourage using the HHEAR namespace
• Refactoring coming
18. Some TakeAways
McGuinness Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• Large Community Effort
• Vocabulary Designed initially by Semantics
Experts
• Processes co-designed by domain experts and
semantics experts
• Vocabulary evolution managed by the community
• Leverages MANY community vocabularies
• Has evolving prioritization AND criteria for choice
• Supports a wide range of services that we claim
would not be supported without interlinked,
human supervised vocabulary
19. Community Conversation
McGuinness 9/19
• We would like users , collaborators, requirements
providers, etc.
• We aim to follow the principles in the Ontology
Engineering book
• Questions? : dlm@cs.rpi.edu
Thanks to many: RPI Tetherless World team particularly
McCusker, Erickson, Hendler, Pinheiro, Rashid, Liang, Liu,
Chastain, Chari; RPI: Bennett, Dyson, Seneviratne; Mount Sinai
particularly Teitelbaum, Stingone, Mervish, Gennings, Kovatch;
IBM, particularly Das, Chen, Brown, ….
Funding: NIEHS 0255-0236-4609 / 1U2CES026555-01, DARPA
HR0011-16-2-0030, IBM-RPI HEALS AI Horizons Network, NSF
ACI-1640840, National Spectral Consortium, …
21. Discussion
• Taxonomies/Ontologies enable FAIR (Findable,
Accessible, Interoperable, Reusable) Data
Resources
• Use ontology-enabled architectures
• Do NOT build taxonomies/ ontologies from scratch
• Selectively and thoughtfully reuse existing best
practice ontologies/vocabularies
• Leverage others mappings and selection criteria
where possible
• Engage experts in choosing ontology portions and
in designing the knowledge architecture
• Ecosystems and diverse teams are critical for
success – community driven and maintained
ontology-based systems are the future
• Flexible, Provenance, and Certainty-Aware
Knowledge Graphs are also the future
DLM@CS.RPI
.EDU
McGuinness
22. What is an Ontology?
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Relationships among concepts and individuals
• Sentences distinguishing concepts, refining
definitions & relationships
relevant to a particular domain or area of interest.
* Based on AAAI ‘99 Ontologies Panel ̶ McGuinness, Welty, Uschold, Gruninger, Lehmann
McGuinness 6/7/2017
• "Pull" for Ontologies. Invited
talk. Semantics for the Web.
Dagstuhl, Germany, 2000.
• Ontologies Come of Age.
Fensel, Hendler, Lieberman,
Wahlster, eds. Spinning the
Semantic Web: Bringing the
World Wide Web to Its Full
Potential. MIT Press, 2003.
McGuinness 12/18
23. 23
Text Mining
Linked Data
Biomedical Databases
Omics and Epidemiology Datasets
File uploads of any kind
Fully automated
Reproducible and traceable from diverse
data types:
Tabular CSV & XLS, XML, JSON, HTML,
etc.
Transformed into rigorously modeled
knowledge:
Meaning as structure, global IDs,
foundational ontologies
Tracked and versioned using provenance
standards
RPI knowledge graph technology
curates from diverse sources
McGuinness 12/18 with McCusker et al. partially supported through IBM AI Horizons Network
24. Whyis Knowledge Graph
Framework Current Usage
McCusker, McGuinness 12/18
– HEALS Project: clinical guidelines, cancer restaging, etc.
– Nanomaterials “genome” – NanoMine to MetaMine
– Radio Spectrum Policies
– Biology Knowledge Graph
– Knowledge Graph Catalog
– …. Your use here