The document discusses linking clinical data standards to the Semantic Web. It begins by explaining the difference between the traditional web of documents and the emerging web of linked data. It then provides examples of linked open government data from the UK and US. The presentation considers opportunities for applying linked data principles to linking clinical study metadata and data across the industry. Pragmatic first steps discussed include learning from other projects, expressing CDISC standards as linked data using URIs, and publishing trial summary parameters as RDF.
1. Linking Clinical Data Standards
Presented by Kerstin Forsberg
AstraZeneca R&D, Clinical Information Strategy
kerstin.l.forsberg@astrazeneca.com
kerfors on Twitter, LinkedIn, SlideShare, Blogspot, CiteULike
Kerstin Forsberg CDISC Interchange Europe 2011 eHR and the World Beyond 1
2. Things I want to show you
• Beyond a Web of Documents
Web of Data
• Forerunners
The UK and US Government
• Two things to remember
Triples and Global Identifiers
• Three live examples
From the Linking Open Data cloud
3. Things for CDISC, and for us in
the industry, to consider
• How this relates to the Innovative Medicines
Initiative (IMI) projects
• Pragmatic first steps for CDISC, together with NCI
Linking Clinical Data Standards
• Opportunities across the industry
Linking Clinical Study Metadata
Linking Clinical Data
4. Acknowledgements
Bosse Andersson, for sharing insights from AstraZeneca’s engagement in the
Semantic Web Health Care and Life Sciences (HCLS) Interest Group (Linking Open Drug Data)
and the EU project The Large Knowledge Collider (Linked Life Data)
Wayne Kubick, CDISC and Oracle, for good email discussions last summer
Martin Agfjord, Gothenburg University, for the Bachelor Thesis work
Chimezie Ogbuji, Case Western Reserve University's Center for Clinical
Investigation and previously Cleveland Clinic, for the explorative work using
the Computer-Based Patient Record (CPR) Ontology for Clinical Data
Sam Hume, Simon Lundberg, Dan Ringenbach, Lee Evans,
Gunnar Magnusson for being “healthy volunteers”
5. Web of Documents
Web 3.0
Image Source: Frederic Martin
Web of Data
6. Forerunners:
The UK and US Government
January 21, 2009: “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” (President Obama)
May 21, 2009: data.gov online
June 30, 2009: Putting Government Data online
December 8, 2009: “Open Government Directive” released
January 19, 2010: data.gov.uk online
May 21, 2010: data.gov relaunch, 6.4 billion RDF triples
Illustration by Prof. Jim Hendler, The Semantic Web 2010 Status Update
7. Linking Open Data cloud
8. What’s a Triple?
subject | predicate | object
An example from a textbook
The sky | has the color | blue
Three examples of facts from the Linking Open Data cloud
Payment number 8605670 | netAmount | 120.00
Clinical Trial number NCT00755378 | enrollment | 58
The Brussels Capital Region | populationTotal | 1080790
Three examples representing some of the standards for these three facts
The property netAmount | comment | “The net amount of the payment. This is the effective cost to the payer after any reclaimable tax has been deducted.”
The type of entity Active Ingredient | subClassOf | Type of entity Chemical Substance
The data property populationTotal | domain | Type of object Populated Place
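The facts on this slide can be sketched as plain subject-predicate-object tuples. This is a minimal illustration only, not how any of the cited datasets are stored; the helper name `match` is hypothetical, and the values are the ones shown on the slide.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The three facts from the slide, written as (subject, predicate, object).
FACTS: List[Triple] = [
    ("Payment number 8605670", "netAmount", "120.00"),
    ("Clinical Trial number NCT00755378", "enrollment", "58"),
    ("The Brussels Capital Region", "populationTotal", "1080790"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Which fact uses the predicate "enrollment"?
    print(match(FACTS, p="enrollment"))
```

Because every fact has the same three-part shape, one generic pattern match covers payments, trials and places alike.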
9. And, what’s RDF? And, how is XML
related to this?
Common Model for Data
subject predicate object
Resource Description Framework
Alternative serialization formats
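To make the point about alternative serialization formats concrete, the sketch below writes one triple first as N-Triples and then as RDF/XML, using only the Python standard library. The subject URI is the one used in this deck; the vocabulary namespace `VOCAB_NS` and the function names are assumptions made for illustration, not a confirmed LinkedCT vocabulary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SUBJECT = "http://data.linkedct.org/resource/trial/NCT00755378"
# Assumed namespace for the "enrollment" property (illustration only).
VOCAB_NS = "http://data.linkedct.org/resource/linkedct/"
PREDICATE = VOCAB_NS + "enrollment"
VALUE = "58"


def to_ntriples(s: str, p: str, literal: str) -> str:
    """One triple per line: <subject> <predicate> "literal" ."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}" .'


def to_rdfxml(s: str, ns: str, local: str, literal: str) -> str:
    """The same triple as an rdf:Description element in RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("v", ns)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": s})
    prop = ET.SubElement(desc, f"{{{ns}}}{local}")
    prop.text = literal
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_ntriples(SUBJECT, PREDICATE, VALUE))
    print(to_rdfxml(SUBJECT, VOCAB_NS, "enrollment", VALUE))
```

Both outputs carry the same subject, predicate and object; only the syntax differs, which is why RDF is described here as the common model and XML as one of several ways to write it down.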
10. What’s a Global Identifier?
Linked Data Principles
1. Use URIs (Uniform Resource Identifiers) to identify things.
2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and “machines”.
Three examples of identifiers from the Linking Open Data cloud
http://spending.lichfielddc.gov.uk/spend/8605670
http://data.linkedct.org/resource/trial/NCT00755378
http://dbpedia.org/resource/Brussels
Three examples of identifiers used in the standards behind the three examples
http://reference.data.gov.uk/def/payment#netAmount
http://www.w3.org/2001/sw/hcls/ns/transmed/TMO_0000
http://dbpedia.org/ontology/populationTotal
11. And, what’s the difference between
URI and URL?
Brussels Capital Region is a “real world” entity, identified by this URI:
http://dbpedia.org/resource/Brussels
Locator (URL) of the “people friendly” view of the data about the Brussels Capital Region:
http://dbpedia.org/page/Brussels
Locator (URL) of the “machine-processable” data about the Brussels Capital Region:
http://dbpedia.org/data/Brussels.rdf
Linked Data Principles
3. Provide useful, structured information about the thing when its URI is looked up (dereferenced).
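The relation between the identifier and its two locators can be sketched in a few lines of Python. The URL pattern is taken from the three DBpedia addresses on this slide; the function names are hypothetical, and the request is only built, not sent, so nothing here claims how the server responds today.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def dbpedia_views(resource_uri: str) -> dict:
    """Derive the document URLs for a DBpedia resource URI,
    following the /resource/, /page/, /data/ pattern shown on the slide."""
    parts = urlsplit(resource_uri)
    prefix = "/resource/"
    if not parts.path.startswith(prefix):
        raise ValueError("not a DBpedia resource URI")
    name = parts.path[len(prefix):]
    base = f"{parts.scheme}://{parts.netloc}"
    return {
        "html": f"{base}/page/{name}",      # "people friendly" view
        "rdf": f"{base}/data/{name}.rdf",   # "machine-processable" data
    }


def rdf_request(resource_uri: str) -> Request:
    """Build (but do not send) a request that asks for RDF/XML
    through HTTP content negotiation on the identifier itself."""
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    print(dbpedia_views("http://dbpedia.org/resource/Brussels"))
    print(rdf_request("http://dbpedia.org/resource/Brussels").get_header("Accept"))
```

The identifier names the thing; the two locators name documents about it. A client only needs to remember the identifier and state which kind of document it prefers.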
12. So, what’s the fourth Linked Data
principle?
http://data.nytimes.com/N78748399240553400231
owl:sameAs
http://dbpedia.org/resource/Brussels
owl:sameAs
http://sws.geonames.org/2800866/
Linked Data Principles
4. Include links to other, related
URIs in the exposed data to
improve discovery of other
related information.
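A small sketch of what the fourth principle buys: starting from any one of the three identifiers on this slide, a program can collect the others by following owl:sameAs links. The direction of the two links is assumed from the slide layout, and the function name `equivalents` is hypothetical; owl:sameAs is treated as symmetric and transitive.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# owl:sameAs links as read from the slide (direction assumed).
SAME_AS = [
    ("http://data.nytimes.com/N78748399240553400231",
     "http://dbpedia.org/resource/Brussels"),
    ("http://dbpedia.org/resource/Brussels",
     "http://sws.geonames.org/2800866/"),
]


def equivalents(uri: str, links: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every URI reachable from `uri` by following sameAs links
    in either direction, including `uri` itself."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, todo = {uri}, [uri]
    while todo:
        current = todo.pop()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            todo.append(nxt)
    return seen


if __name__ == "__main__":
    for u in sorted(equivalents("http://sws.geonames.org/2800866/", SAME_AS)):
        print(u)
```

Starting from the GeoNames identifier, the traversal also reaches the DBpedia and New York Times identifiers, so data published by three independent sources can be brought together.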
13. Semantic Web standards are a key topic
across the IMI projects
Innovative Medicines Initiative
EHR4CR: Electronic Health Records for Clinical Research
DDMoRE: Drug Disease Model Resources
OpenPHACTS: The Open Pharmacological Concepts Triple Store
RICORDO: Researching Interoperability using Core Reference Datasets and Ontologies for the Virtual Physiological Human
14. Pragmatic first steps for CDISC,
together with NCI
• Learn from others, such as the UK and US
Government
• Apply the Linked Data principles
Start with the SDTM CTs, e.g. the Trial Summary
Parameters
• Strive for a 5-star rating of Linked Open Data
15. Opportunities across the industry
• Applying the four Linked Data principles for
Linking Clinical Study Metadata
Linking Clinical Subject Data
16. A scenario:
Linking Clinical Study Metadata
Internal
categorization to
support Design &
Interpretation
decisions
http://clinial.data.astrazeneca.com/id/study/D8180C00011
owl:sameAs
http://data.linkedct.org/resource/trials/NCT00755378
What would we like to see on an internal webpage presenting linked data describing a clinical study?
What would we like to see as the linked data description of it?
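One way to picture the scenario: an internal description and a public description of the same study are combined by following the owl:sameAs link between the two identifiers. This is a sketch under stated assumptions. The two URIs are the ones on the slide, the enrollment value is the one quoted earlier in the deck for NCT00755378, and the `ex:` property names, the internal category value and the function `describe` are hypothetical.

```python
INTERNAL = "http://clinial.data.astrazeneca.com/id/study/D8180C00011"
PUBLIC = "http://data.linkedct.org/resource/trials/NCT00755378"

STUDY_TRIPLES = [
    (INTERNAL, "owl:sameAs", PUBLIC),
    # Hypothetical internal categorization (placeholder value).
    (INTERNAL, "ex:designCategory", "hypothetical internal category"),
    # Public fact about the same study.
    (PUBLIC, "ex:enrollment", "58"),
]


def describe(uri, triples):
    """Collect property values for `uri` and for any identifier
    directly linked to it with owl:sameAs (in either direction)."""
    aliases = {uri}
    for s, p, o in triples:
        if p == "owl:sameAs":
            if s == uri:
                aliases.add(o)
            elif o == uri:
                aliases.add(s)
    description = {}
    for s, p, o in triples:
        if s in aliases and p != "owl:sameAs":
            description.setdefault(p, []).append(o)
    return description


if __name__ == "__main__":
    print(describe(INTERNAL, STUDY_TRIPLES))
```

An internal webpage could be rendered from the merged description, showing internal categorizations next to publicly registered facts without copying either source into the other.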
17. So, why am I so enthusiastic about
all of this?
• Well applied Linked Data principles and cautious
steps building on existing insights …
• … would improve the research utility of clinical
datasets
Organized for associations
Prepared for not yet defined use
Ready for automation where computers can
function alongside us to
• Mitigate the complexity in clinical research
• Improve the productivity in clinical data
management
• …
19. Three Live Examples
1. Net amount for an expenditure from a local authority in the UK (2 of 10 triples)
2. Enrollment number for a study from ClinicalTrials.gov (2 of 48 triples)
3. Population of Brussels Capital Region from Wikipedia (4 of 992 triples)
Three different approaches to standardization
20. Expenditure amount for a local
authority in the UK
1
21. Expenditure amount for a local
authority in the UK
1
Live view using the Web Data Inspector
2 of 10 Triples
22. An example of a top-down approach to
standardization for Linked Data
The Linked Data Cube Vocabulary
Live view using the Web Data Inspector
23. UK government: Top-down approach to
standardization for Spending Data
Statistical Data perspective
Linked Data Cube Vocabulary
Payment Ontology
24. Enrollment number for a study
from ClinicalTrials.gov
2
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Linking Open Drug Data
(Life Science part of the
Linking Open Data cloud)
25. Enrollment number for a study
from ClinicalTrials.gov
2
Live view using the Web Data Inspector
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Linking Open Drug Data
(Life Science part of the
Linking Open Data cloud)
2 of 48 Triples
26. An example of first using the source data
structure as the “standard” …
2
Live view using the Web Data Inspector
2 of 48 Triples
Database key Variable name Table name
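The idea of using the source data structure as the first “standard” can be sketched as a direct mapping from a database row to triples: the table name and database key form the subject URI, and each variable name becomes a predicate. The base URI and the `trial/NCT00755378` path follow the identifiers shown in this deck; the vocabulary path `linkedct/` and the function name are assumptions for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://data.linkedct.org/resource/"
VOCAB = BASE + "linkedct/"  # assumed vocabulary namespace


def row_to_triples(table, key, row, base=BASE, vocab=VOCAB):
    """Turn one database row into triples.

    table -> class of the subject and part of its URI
    key   -> last segment of the subject URI
    row   -> {variable name: value}, one triple per variable
    """
    subject = f"{base}{table}/{key}"
    triples = [(subject, RDF_TYPE, vocab + table)]
    for variable, value in row.items():
        triples.append((subject, vocab + variable, str(value)))
    return triples


if __name__ == "__main__":
    for t in row_to_triples("trial", "NCT00755378", {"enrollment": 58}):
        print(t)
```

Nothing is standardized beyond the source schema at this stage; mapping the generated predicates to a shared ontology is a separate, later step.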
27. … and then as a next step look for a
common standard
Live view using the Web Data Inspector
Semantic Web Health Care and Life Sciences (HCLS) Interest Group
Translational Medicine Ontology
(a.k.a. Pharma Ontology)
28. Population of Brussels Capital
Region from Wikipedia
3
Live view using the Web Data Inspector
29. Linking between different sources about
the same entity using different identifiers
3
Live view using the Web Data Inspector
30. Bottom-up standardization: Community
curated, and a central shallow Ontology
http://dbpedia.org/ontology/PopulatedPlace
http://dbpedia.org/ontology/populationTotal
http://dbpedia.org/ontology/populationAsOf
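Even a shallow ontology lets software draw simple conclusions. The sketch below applies the standard RDFS domain rule: if a property has a declared domain, anything that uses the property is of that type. The domain statement follows what this deck says about populationTotal; whether the current DBpedia ontology declares it the same way is an assumption, and the function name is hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
DBO = "http://dbpedia.org/ontology/"

TRIPLES = {
    # Ontology statement (as described in this deck).
    (DBO + "populationTotal", RDFS_DOMAIN, DBO + "PopulatedPlace"),
    # Data statement.
    ("http://dbpedia.org/resource/Brussels", DBO + "populationTotal", "1080790"),
}


def infer_types(triples):
    """Apply the RDFS domain rule:
    (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)."""
    domains = [(s, o) for s, p, o in triples if p == RDFS_DOMAIN]
    return {
        (s, RDF_TYPE, cls)
        for s, p, _ in triples
        for prop, cls in domains
        if p == prop
    }


if __name__ == "__main__":
    print(infer_types(TRIPLES))
```

Here the data never states that Brussels is a populated place; the type follows from the property that was used.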
31. Pragmatic first steps for CDISC,
together with NCI
• Learn from others, such as the UK and US
Government
• Apply the Linked Data principles
Start with the SDTM CTs, e.g. the Trial Summary
Parameters
• Strive for a 5-star rating of Linked Open Data
32. Linked Open Data star scheme
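The slide text carries only the title of the scheme. As a reminder, in Tim Berners-Lee's five-star scheme each star builds on the previous one: data on the web under an open licence, as structured data, in a non-proprietary format, using URIs to identify things, and linked to other data. The sketch below encodes that cumulative rule; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_licence: bool     # 1 star
    structured: bool              # 2 stars
    non_proprietary_format: bool  # 3 stars
    uses_uris: bool               # 4 stars
    links_to_other_data: bool     # 5 stars


def stars(d: Dataset) -> int:
    """Count stars cumulatively: a later criterion only counts
    if every earlier one is also met."""
    criteria = (
        d.on_web_open_licence,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.links_to_other_data,
    )
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(stars(Dataset(True, True, True, True, True)))
```

The cumulative rule matters: a dataset that uses URIs but is only available in a proprietary format stops at two stars.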
33. Pragmatic first steps
Learn from others, two examples
• AGFA
Clinical drug administration forms mapped to SNOMED
CT and FDA codes using the SKOS (Simple Knowledge
Organization System) vocabulary
http://www.agfa.com/w3c/2009/drugAdministrationForms#
• EU Project – LarKC
• Published CDISC CTs as part of Linked Life Data
http://linkedlifedata.com/resource/umls/id/C1879952
http://linkedlifedata.com/resource/umls/id/C0013153
34. Pragmatic first steps, start with
Trial Summary Parameters
0 Existing text strings published in a long file in Excel txt format (soon also as ODM/XML)
35. Pragmatic first steps, start with
Trial Summary Parameters
1 Establish a CDISC URI scheme
Proposal: Build on http://data.gov.uk/resources/uris
Examples:
http://reference.data.cdisc.org/ct/sdtm/ROUTE#ORAL
or alt. using the C-code from NCI Thesaurus
http://reference.data.cdisc.org/ct/sdtm/C66729#C38288
2 Model CDISC SDTM CTs as RDF triples
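Steps 1 and 2 can be sketched together: mint a URI for a code list term following the proposed pattern, then describe the term with a few triples. The base URI and the two example identifiers come from the slide. Modelling the terms with SKOS is an assumption made for this sketch (the slide does not name a vocabulary), and the function names and the label value are hypothetical.

```python
from urllib.parse import quote

BASE = "http://reference.data.cdisc.org/ct/sdtm/"  # proposed on the slide
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"      # assumed vocabulary


def mint_uri(codelist: str, term: str, base: str = BASE) -> str:
    """Build <base><codelist>#<term>, percent-encoding unsafe characters."""
    return f"{base}{quote(codelist, safe='')}#{quote(term, safe='')}"


def term_triples(codelist: str, term: str, label: str):
    """Describe one controlled term as a SKOS concept in its code list."""
    concept = mint_uri(codelist, term)
    scheme = f"{BASE}{quote(codelist, safe='')}"
    return [
        (concept, RDF_TYPE, SKOS + "Concept"),
        (concept, SKOS + "inScheme", scheme),
        (concept, SKOS + "prefLabel", label),
    ]


if __name__ == "__main__":
    print(mint_uri("ROUTE", "ORAL"))
    print(mint_uri("C66729", "C38288"))
    for t in term_triples("ROUTE", "ORAL", "ORAL"):
        print(t)
```

The two calls to `mint_uri` reproduce the two alternatives on the slide: one built from the code list and term names, the other from the NCI Thesaurus C-codes.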
36. Pragmatic first steps, start with
Trial Summary Parameters
1 Establish a CDISC URI scheme
Proposal: Build on http://data.gov.uk/resources/uris
Examples:
http://reference.data.cdisc.org/ct/sdtm/ROUTE#ORAL
or alt. using the C-code from NCI Thesaurus
http://reference.data.cdisc.org/ct/sdtm/C66729#C38288
2 Model CDISC SDTM CTs as RDF triples
3 Publish in RDF/XML
4 De-reference/look-up service so people and applications can get descriptions of individual code lists and codes.
5 Create a so-called SPARQL endpoint so that CDISC standards published as triples can be queried directly using the RDF query language.
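Step 5 could look roughly like this from a client's point of view. The SPARQL protocol lets a query be sent as the `query` parameter of an HTTP GET request; the sketch only builds that URL and sends nothing. The endpoint address is hypothetical (no such service is claimed to exist), and the query assumes, purely for illustration, that the code list terms were modelled with SKOS.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint address, for illustration only.
ENDPOINT = "http://reference.data.cdisc.org/sparql"

# List every term in the ROUTE code list with its label
# (assumes a SKOS-based model of the controlled terminology).
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?term ?label WHERE {
  ?term skos:inScheme <http://reference.data.cdisc.org/ct/sdtm/ROUTE> ;
        skos:prefLabel ?label .
}
""".strip()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as an HTTP GET URL (query=... parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = sparql_get_url(ENDPOINT, QUERY)
    print(url)
    # Decoding the URL gives back the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

With an endpoint like this, a tool would no longer need to download and parse a terminology file; it could ask for exactly the code list it needs.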
37. Some thoughts and ideas explored
• Applying the four Linked Data principles
Linking Clinical Study Metadata
• A scenario
• Explore the use of
– BRIDG Domain Model (to be published in OWL/RDF)
– Translational Medicine Ontology - TMO
(a.k.a. Pharma Ontology)
Linking Clinical Data
• Best practice for URI scheme and minting URIs
• Explorative work: using the Computer-Based
Patient Record (CPR) Ontology for Clinical Data
38. A scenario:
Linked Clinical Study Metadata
Internal
categorization to
support Design &
Interpretation
decisions
http://clinial.data.astrazeneca.com/id/study/D8180C00011
owl:sameAs
http://data.linkedct.org/resource/trials/NCT00755378
What would we like to see on an internal webpage presenting linked data describing a clinical study?
What would we like to see as the linked data description of it?
39. Explore the use of BRIDG and TMO
for common classes and properties
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Translational Medicine Ontology
(a.k.a. Pharma Ontology)
40. Linking Clinical Data: Explore URI
scheme and minting URIs
Proposal: Build on http://data.gov.uk/resources/uris
41. Explorative work: using the Computer-Based
Patient Record (CPR) Ontology for Clinical Data
Acknowledgements:
Chimezie Ogbuji, Case Western Reserve University's Center for
Clinical Investigation, previously Cleveland Clinic.
Martin Agfjord, IT University, Göteborg
42. If you want to learn more
• An Intro To The Semantic Web: Why You Need To Know About It Sooner Than Later, by Samantha Wong (Image Source: Frederic Martin)
• The Semantic Web 2010 Status Update, by Prof. Jim Hendler
• http://data.gov.uk/linked-data
• http://www.data.gov/semantic
• Open data: accountability, citizen utility and economic opportunity.
• Linked Spending Data – How and Why Bother
• Guide to the Payments Ontology
• Statistical Data in RDF
• The RDF Data Cube vocabulary
• How DBpedia Treats Wikipedia as a Database
• Linked Open Data star scheme by example
• Presentation: Linking Open Drug Data
• Interest Group for Semantic Web in Health Care and Life Science, http://www.w3.org/blog/hcls
• The Linking Open Data cloud diagram
• Excellent article “More than Words: Biomedical Ontologies”
43. And now – An Evening with TinTin
at the Brussels Comic Strip Center!