Best practices and platforms for access and reuse of scientific data and models. We explore a Linked Data approach for data integration, modeling and interoperability.
Delivered by Bernadette Hyland at the EPA & Society of Toxicology Scientific Workshop titled "Building for Better Decisions: Multi-scale Integration of Human Health and Environmental Data".
Delivered 8-May-2012 at EPA Research Triangle Park, NC USA.
Linked Data Approach for Integration of Human Health & Environmental Data
1. Linked Data Approach for
Integration of Human Health and
Environmental Data
Building for Better Decisions: Multi-scale Integration of Human Health
and Environmental Data
8-11 May 2012
By: Bernadette Hyland,
Chair, W3C Government Linked Data WG
CEO, 3 Round Stones, Inc
Email: bhyland@3roundstones.com
Twitter: @BernHyland
This presentation: http://slideshare.net/3roundstones
Tuesday, May 8, 12 1
2. • Linked Data is about publishing and consuming data using international data standards
• Based on a 20-year-old idea
• A system of linked information systems
5. A HISTORY OF SILOS
$ cat foo.txt | grep blah | sort
1970s: A neat little package
1980s: Client-Server
1990s: The Early Web
6. There is a better way to connect
data ...
• No one vendor owns it
• It scales ... to Web-scale
• Doesn’t require a super model
• Based on International Data Exchange
Standards (RDF, SPARQL)
7. What is next for Data in the
Web?
• What is next for Open Data on the Web
• Structured data on the Web is quickly
becoming mainstream
• Authorities are beginning to appreciate a new way
to publish and consume content
16. Linked Data in Context
[Diagram labels: Universal Client; Ubiquitous, reusable applications; URL Curation; Universal Connection; Logic and interlinking; Web of Data; Universal Database]
20. Why is RDF important?
• It is an international standard for publishing data on
the Web (public and private)
• Data exchange model
• Serializations include RDF/XML, N-Triples, N3,
Turtle ...
• It is the future of using the Web
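To make the serialization bullet concrete, here is a minimal sketch that writes one statement in two of the formats listed, N-Triples and Turtle. It uses plain Python string formatting rather than an RDF library, and the `example.org` URIs, the `st:` and `ex:` prefixes, and the reading of 61 are invented for illustration. The underlying triple (subject, predicate, object) is the same in both; only the syntax differs.

```python
# One RDF statement: (subject, predicate, object).
# All URIs and the literal value below are hypothetical examples.
SUBJECT = "http://example.org/station/rtp"
PREDICATE = "http://example.org/vocab#ozonePpb"
VALUE = "61"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_ntriples(s: str, p: str, value: str, datatype: str) -> str:
    """N-Triples: full URIs in angle brackets, one statement per line."""
    return f'<{s}> <{p}> "{value}"^^<{datatype}> .'


def to_turtle(s: str, p: str, value: str, prefixes: dict) -> str:
    """Turtle: the same statement, shortened with prefix declarations."""
    def shorten(uri: str) -> str:
        for name, base in prefixes.items():
            if uri.startswith(base):
                return f"{name}:{uri[len(base):]}"
        return f"<{uri}>"

    header = "\n".join(f"@prefix {n}: <{b}> ." for n, b in prefixes.items())
    # A bare integer in Turtle is shorthand for an xsd:integer literal.
    return f"{header}\n\n{shorten(s)} {shorten(p)} {value} ."


PREFIXES = {
    "st": "http://example.org/station/",
    "ex": "http://example.org/vocab#",
}

ntriples_doc = to_ntriples(SUBJECT, PREDICATE, VALUE, XSD_INTEGER)
turtle_doc = to_turtle(SUBJECT, PREDICATE, VALUE, PREFIXES)

print(ntriples_doc)
print(turtle_doc)
```

In practice an RDF library would handle parsing and serialization; the hand-rolled functions here exist only to show what the formats look like side by side.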
21. What you can do ...
• Good = Use Data Standards (RDF) to publish
metadata about data and models, at a minimum
• Better = Use RDF to publish all your data
• Best = Link your data + models
• Web architecture, Web-scale
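As a rough illustration of the "Good" tier above, the sketch below describes a dataset with a handful of metadata triples and prints them as N-Triples. The Dublin Core Terms properties (`title`, `publisher`, `issued`) are real, widely used vocabulary terms; the dataset URI, title, publisher name, and date are invented placeholders, not actual EPA records.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical dataset URI and values, for illustration only.
dataset = "http://example.org/dataset/air-quality-2012"
metadata = [
    (dataset, DCTERMS + "title", "Daily air quality readings (example)"),
    (dataset, DCTERMS + "publisher", "Example Agency"),
    (dataset, DCTERMS + "issued", "2012-05-08"),
]


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def metadata_to_ntriples(triples) -> str:
    """Serialize (subject URI, predicate URI, plain literal) triples."""
    lines = [f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in triples]
    return "\n".join(lines)


metadata_doc = metadata_to_ntriples(metadata)
print(metadata_doc)
```

Moving from "Good" to "Better" would mean publishing the readings themselves as triples in the same way, and "Best" would mean having those triples point at URIs in other datasets and models.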
34. [Diagram labels: Open Government Linked Data Cloud (CDC, EPA, US Census, DBpedia, PubMed, NLM); Clinical Ontology; Business Ontology; Social Media (Facebook, Twitter); Internal Portal Data (Physicians, Services, Locations); EMR Data (Clinical, Condition Specific)]
35. Value Proposition
• Decrease costly emergency department visits
• Reduce hospital re-admissions after treatment
• Improve self-care and medication compliance
• Educate patients about triggers and disease management
36. Functional Model
1. Define target population and clinical data from electronic medical record
2. Identify sources of open government data related to environmental, weather, and other variables related to chronic pulmonary disease exacerbations
3. Combine open content from NLM, PubMed, Medline to support education
4. Leverage a Linked Data approach, using Open Source and international data exchange standards (RDF)
5. Alert patient of possible hazardous conditions and recommend appropriate actions
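The five steps of the functional model can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the system shown in the talk: the patient records, the air quality index (AQI) readings, the shared `county` key, and the alert threshold of 100 are all invented, and a real deployment would link RDF resources by URI rather than join Python dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str      # anonymized identifier (step 1, EMR data)
    county: str          # hypothetical join key to environmental data
    condition: str


# Step 1: target population drawn from (made-up) EMR records.
patients = [
    Patient("p-001", "county-a", "COPD"),
    Patient("p-002", "county-b", "COPD"),
    Patient("p-003", "county-c", "asthma"),
]

# Step 2: open government environmental data (made-up AQI values).
aqi_by_county = {"county-a": 135, "county-b": 62, "county-c": 110}

# Step 3: educational content keyed by condition (placeholder text).
education = {
    "COPD": "Limit outdoor activity on poor air quality days.",
    "asthma": "Keep a rescue inhaler available on poor air quality days.",
}

AQI_ALERT_THRESHOLD = 100  # assumed cut-off, for illustration only


def build_alerts(patients, aqi_by_county, education, threshold):
    """Steps 4-5: link the sources on a shared key and emit alerts."""
    alerts = []
    for patient in patients:
        aqi = aqi_by_county.get(patient.county)
        if aqi is not None and aqi > threshold:
            advice = education.get(patient.condition, "Contact your care team.")
            alerts.append((patient.patient_id, aqi, advice))
    return alerts


alerts = build_alerts(patients, aqi_by_county, education, AQI_ALERT_THRESHOLD)
for patient_id, aqi, advice in alerts:
    print(f"{patient_id}: AQI {aqi}. {advice}")
```

The join on `county` stands in for what Linked Data does with shared URIs: each source stays separate, and the link between them is the identifier they have in common.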
37. Leverage Linked Data, Open Source & Standards
[Diagram labels: Web of Data (CDC, DBpedia, EPA, PubMed, US Census, NLM); EMR; SMS; Email; Web]
39. Shows:
1) Air Quality data from US EPA
2) Anonymized EMR data
3) Doctor’s details from CSV file
Uses Callimachus,
a Linked Data Management
Platform
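One way to picture how a CSV file of doctors' details could sit alongside EMR data is to turn each row into triples whose subject is a URI. The sketch below uses only the Python standard library; the column names, the `example.org` base URI, the `treatedBy` property, and the sample rows are assumptions, and Callimachus's own import mechanism is not shown here.

```python
import csv
import io

BASE = "http://example.org/"          # hypothetical URI base
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary

# Stand-in for the CSV file of doctors' details (invented rows).
DOCTORS_CSV = """doctor_id,name,specialty
d1,Dr. A. Example,Pulmonology
d2,Dr. B. Sample,Family Medicine
"""

# Stand-in for anonymized EMR data already expressed as triples.
emr_triples = {
    (BASE + "patient/p-001", VOCAB + "treatedBy", BASE + "doctor/d1"),
}


def csv_to_triples(text: str) -> set:
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "doctor/" + row["doctor_id"]
        triples.add((subject, VOCAB + "name", row["name"]))
        triples.add((subject, VOCAB + "specialty", row["specialty"]))
    return triples


# Merging is a set union: no shared schema is needed, only shared URIs.
graph = emr_triples | csv_to_triples(DOCTORS_CSV)


def doctor_name_for(patient_uri: str, graph: set):
    """Follow patient -> treatedBy -> doctor -> name across both sources."""
    for s, p, doctor in graph:
        if s == patient_uri and p == VOCAB + "treatedBy":
            for s2, p2, name in graph:
                if s2 == doctor and p2 == VOCAB + "name":
                    return name
    return None


print(doctor_name_for(BASE + "patient/p-001", graph))
```

The nested loop is a stand-in for what a SPARQL query would do over a real triple store; it is written out by hand here only to keep the example dependency-free.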
40. Tools & best practices?
• Large and small vendors are involved in Linked Data
• From Oracle, IBM to 3 Round Stones
• Listing of active projects, companies and research: see
http://dir.w3.org/
• Best practices, see http://www.w3.org/2011/gld/charter
41. • Callimachus is a framework for data-driven applications
based on Linked Data principles
• Callimachus allows Web developers to easily create
data-driven applications for the Web
• It is Open Source (FLOSS)
• http://callimachusproject.org
43. DELIVERABLES
Community Directory
Best Practices for Publishing Linked Data
Procurement, vocabulary selection, URI construction,
versioning, stability, legacy data issues
Cookbook for Linked Open Data
Standard Vocabularies
Metadata, Statistical “Cube” Data, People,
Organizational structures
45. Recommendations
• Be prepared for the scientific community & public to demand that your data be
published in a reusable format (RDF)
• Demand your vendors use Open Source whenever possible
• Incentivize industry & STM publishers to do the right thing
• Open vs. proprietary technologies & data formats ... be OPEN
• Beware of semantic “pixie dust” - be “an educated consumer” (and scientist!)
• Solutions must embrace International Standards and published Best Practices (W3C,
OMG, IETF)
• Define a URI Policy and Strategy, document it and ensure scientists use it!
• Leverage the work of others and work cooperatively...
• Our future is all connected through your work...
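As one illustration of the URI policy recommendation in the list above, a policy can be captured in a small, documented function so that everyone mints identifiers the same way. The pattern used here (`{base}/id/{type}/{slug}`), the base domain, and the slug rules are assumptions chosen for the example; they are not taken from the talk or from any W3C document.

```python
import re
import unicodedata

BASE = "http://data.example.org"   # hypothetical agency domain


def slugify(label: str) -> str:
    """Lowercase ASCII slug: letters and digits separated by hyphens."""
    ascii_text = (unicodedata.normalize("NFKD", label)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(resource_type: str, label: str) -> str:
    """Mint a URI following the (assumed) pattern {base}/id/{type}/{slug}."""
    slug = slugify(label)
    if not slug:
        raise ValueError(f"cannot derive a slug from {label!r}")
    return f"{BASE}/id/{slugify(resource_type)}/{slug}"


print(mint_uri("Dataset", "Air Quality 2012"))
print(mint_uri("Model", "Ozone Exposure (v2)"))
```

Whatever pattern is chosen, the useful property is that the same input always yields the same URI, so independently published datasets and models can refer to one another reliably.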