The document discusses the potential for using linked data principles to help address complexity in pharmaceutical research and development. It provides examples of existing linked data projects like DBpedia, Wikidata, and Schema.org and describes how applying semantic web standards like RDF, RDF schemas, and SKOS can help organize disparate data sources. The principles of linked data - using URIs, HTTP URIs, providing useful information at URIs, and including links between URIs - are outlined. Existing linked open data clouds and enterprise implementations are briefly noted as encouragement for applying linked data to improve associations between data and enable greater automation in life sciences and healthcare.
1. Linked Data in Pharmaceutical R&D
PhD Informatics Seminar, IT University
Kerstin Forsberg
AstraZeneca, R&D, Sweden
Clinical Information Strategy
kerstin.l.forsberg@astrazeneca.com
kerfors on Twitter, Google+, LinkedIn, SlideShare, Blogspot, CiteULike
3. Some recent headlines!
How DBpedia treats Wikipedia as a Database
Wikipedia’s Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others
Yandex (Russia’s leading search engine) joins Google, Yahoo! and Bing to collaborate on Schema.org
4. What is this?
http://dbpedia.org/resource/IT_University
http://dbpedia.org/resource/Stockholm
http://education.data.gov.uk/id/school/123065
http://schema.org/CollegeOrUniversity
http://research.data.astrazeneca.com/id/clinicalstudy/D5890C00003
http://linkedct.org/resource/trial/nct00244608/
6. Complexity
Pathophysiology?
Phenotypes? Targets?
Costs? Biomarkers?
QoL? Outcomes?
Association and interpretation of all data needed has become too complex a task for individuals, or even teams, to handle.
Health care, pharma, academia, authorities and payers.
Shared datasets.
Different decisions and different types of applications.
7. Why is it so hard …
See slides 1-5 in the Open PHACTS slide pack presented by Prof. Carole Goble at BioIT World Expo Europe, Oct 2011, on SlideShare:
http://www.slideshare.net/open_phacts/open-phacts-bioit-world-europe-cag-111013
8. Web of Documents
Web 3.0
Web of (Linked) Data
An Intro To The Semantic Web: Why You Need To Know About It Sooner Than Later, by Samantha Wong
Image Source: Frederic Martin
9. Opportunities
Organized for associations
Prepared for not yet defined use
Ready for automation where computers can function alongside us to
Mitigate the complexity in discovering, accessing, connecting and interpreting information
Improve the productivity in managing information
11. RDF Triples
Resource Description Framework (RDF): a general model in which any piece of data, and any representation of knowledge, can be expressed as so-called triples.
subject     predicate      object (or value)
Stockholm   type           place
Stockholm   capital        Sweden
Stockholm   subject        Port cities in Sweden
Stockholm   areaCode       “+46-8”
Stockholm   primaryTopic   “http://en.wikipedia.org/wiki/Stockholm”
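The table of triples can be sketched in a few lines of Python. This is a minimal illustration only: plain strings stand in for the full URIs a real RDF store would use, and the helper name `objects` is hypothetical.

```python
# Each triple is a (subject, predicate, object) tuple.
# Short names stand in for full URIs; they are placeholders for illustration.
triples = [
    ("Stockholm", "type", "place"),
    ("Stockholm", "capital", "Sweden"),
    ("Stockholm", "subject", "Port cities in Sweden"),
    ("Stockholm", "areaCode", "+46-8"),
    ("Stockholm", "primaryTopic", "http://en.wikipedia.org/wiki/Stockholm"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(triples, "Stockholm", "areaCode"))  # ['+46-8']
```

Because every statement has the same three-part shape, one generic lookup works for any predicate, with no table schema defined in advance.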
12. RDF Triples
Triples can be aggregated into graphs with subjects and objects as nodes, and predicates as arcs.
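[Figure: the triples from the previous slide drawn as a graph. Stockholm is the central node, with arcs type, capital, subject, areaCode and primaryTopic leading to place, Sweden, Port cities in Sweden, “+46-8” and “http://en.wikipedia.org/wiki/Stockholm”.]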
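One way to picture the step from triples to a graph is an adjacency structure: each subject maps to its outgoing (predicate, object) arcs. The sketch below assumes plain strings instead of URIs, and the function name `to_graph` is made up for this example.

```python
from collections import defaultdict

triples = [
    ("Stockholm", "type", "place"),
    ("Stockholm", "capital", "Sweden"),
    ("Stockholm", "subject", "Port cities in Sweden"),
    ("Stockholm", "areaCode", "+46-8"),
    ("Stockholm", "primaryTopic", "http://en.wikipedia.org/wiki/Stockholm"),
]


def to_graph(triples):
    """Group triples by subject: node -> list of (arc label, target node)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


graph = to_graph(triples)

# Every subject and every object becomes a node in the graph.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

for predicate, target in graph["Stockholm"]:
    print(f"Stockholm --{predicate}--> {target}")
```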
13. RDF Triples
Graphs of triples can be extended across different sources and for different purposes.
[Figure: the Stockholm graph extended with nodes from other sources, including Country, Gothenburg, CDISC and CDISC Interchange EU 2012.]
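Extending a graph across sources amounts to taking the union of the triple sets, since shared names make the graphs join up by themselves. The sketch below is an illustration under assumptions: the two source sets are hypothetical, and the triples about Gothenburg and Sweden are one reading of the figure, not data taken from a real dataset.

```python
# Two hypothetical sources that happen to use the same names for the same things.
source_a = {
    ("Stockholm", "capital", "Sweden"),
    ("Stockholm", "subject", "Port cities in Sweden"),
}
source_b = {
    ("Gothenburg", "subject", "Port cities in Sweden"),  # assumed reading of the figure
    ("Sweden", "type", "Country"),                       # assumed reading of the figure
}

# Merging is a set union: no schema mapping step is needed.
merged = source_a | source_b

# A question neither source could answer alone.
port_cities = sorted(
    s for s, p, o in merged
    if p == "subject" and o == "Port cities in Sweden"
)
print(port_cities)  # ['Gothenburg', 'Stockholm']
```

The join works only because both sources identify "Port cities in Sweden" with the same name, which is the role URIs play in Linked Data.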
14. RDF Triples
RDF Schema and the RDF-based Web Ontology Language (OWL) add a typing mechanism to classify subjects and objects into hierarchies of types.
[Figure: the extended graph with a class hierarchy added. Thing, Place, Organization, Event, Adm.Area and Business are connected by subClass arcs, and type arcs link nodes of the graph to these classes.]
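The practical effect of a type hierarchy is that a resource typed with a specific class can also be treated as an instance of every class above it. The sketch below shows that inference in plain Python. The exact subclass relations are an assumption, since the direction and pairing of the arcs in the figure cannot be fully recovered, and `superclasses` and `inferred_types` are hypothetical helper names.

```python
# Assumed hierarchy (child -> parent), loosely based on the class names in the figure.
subclass_of = {
    "Place": "Thing",
    "Organization": "Thing",
    "Event": "Thing",
    "Adm.Area": "Place",
    "Country": "Adm.Area",
    "Business": "Organization",
}


def superclasses(cls):
    """Walk up the hierarchy and return all ancestors, nearest first."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def inferred_types(direct_type):
    """A resource of a given type is also an instance of all its superclasses."""
    return [direct_type] + superclasses(direct_type)


print(inferred_types("Country"))  # ['Country', 'Adm.Area', 'Place', 'Thing']
```

Under this assumed hierarchy, a query for all instances of Place would also return Sweden, even though the data only states that Sweden is a Country.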
15. RDF Triples
Simple Knowledge Organization System (SKOS) is a thin RDF-based vocabulary that can be used to build terminologies of broader/narrower concepts.
[Figure: the same graph with SKOS concepts added. Populated places in Europe, Cities in Sweden and Port cities in Sweden are connected by broader and narrower arcs.]
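A broader/narrower terminology can be sketched as a simple mapping from each concept to its broader concept. The chain of concepts below is an assumed reading of the figure. Note that in SKOS itself, `skos:broader` is not transitive; the standard provides `skos:broaderTransitive` for the chained relation, which the hypothetical helper `broader_transitive` imitates here.

```python
from collections import defaultdict

# Assumed concept chain (concept -> its directly broader concept).
broader = {
    "Port cities in Sweden": "Cities in Sweden",
    "Cities in Sweden": "Populated places in Europe",
}

# "narrower" is simply the inverse of "broader".
narrower = defaultdict(list)
for concept, parent in broader.items():
    narrower[parent].append(concept)


def broader_transitive(concept):
    """Follow broader links upward and return all ancestors, nearest first."""
    chain = []
    while concept in broader:
        concept = broader[concept]
        chain.append(concept)
    return chain


print(broader_transitive("Port cities in Sweden"))
# ['Cities in Sweden', 'Populated places in Europe']
print(narrower["Cities in Sweden"])  # ['Port cities in Sweden']
```

Combined with the subject triples, this lets a search for "Populated places in Europe" reach Stockholm through its narrower concepts.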
16. 4 Principles for Linked Data …
… and 5 stars for Linked Open Data
• Use URIs (Uniform Resource Identifiers) as names for things.
• Use HTTP URIs so that people can look up (dereference) those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs so that they can discover more things.
Source: Linked Open Data star scheme by example
More resources introducing and describing the Linked Data idea
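Looking up (dereferencing) an HTTP URI is an ordinary HTTP request in which the client states, via the Accept header, which representation it wants. The sketch below only builds such a request with Python's standard library and does not send it. The URI follows the DBpedia pattern shown earlier in the deck, and the assumption that the server answers with Turtle for `text/turtle` is exactly that, an assumption. `build_lookup` is a hypothetical helper name.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare an HTTP GET for a URI, asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})


req = build_lookup("http://dbpedia.org/resource/Stockholm")
print(req.full_url)              # http://dbpedia.org/resource/Stockholm
print(req.get_header("Accept"))  # text/turtle

# To actually dereference the name (requires network access):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       data = response.read()
```

The same URI can thus serve a web page to a browser and data to a program, depending on what the client asks for.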
17. Linked Open Data cloud
Richard Cyganiak and Anja Jentzsch
http://lod-cloud.net/
Growing Linked Open Data Cloud
http://youtu.be/TXFYSWuEOOw
18. Linked Enterprise Data
Source: What does Open Data mean for Enterprises?
More resources introducing and describing the Linked Data idea
19. I’m encouraged by …
• … what actually can be done by applying Linked Data principles, together with a stepwise implementation and pragmatic application of crucial building blocks, to …
• … improve the research and commercial utility of information
• Organized for associations
• Prepared for not yet defined use
• Ready for automation where computers can function alongside us to
Mitigate the complexity in discovering, accessing, connecting and interpreting information
Improve the productivity in managing information
Health Care and Life Sciences (HCLS) Interest Group
Linking Open Drug Data
EU project The Large Knowledge Collider
Linked Life Data
A 2-page summary of our learnings from participating in these external projects: Linked Data in Pharma, 2011, Bo Andersson and Kerstin Forsberg