SlideShare une entreprise Scribd logo
1  sur  90
Technologies, methods and challengesto effective public data sharing and aggregation Mark D Wilkinson Medical Genetics, UBC PI Bioinformatics Heart + Lung Institute at St. Paul’s Hospital markw@illuminae.com http://wilkinsonlab.ca
Thanks in advance(very few of these ideas are my own!) Paul Gordon – Sun Center of Excellence, U Calgary Carole Goble – University of Manchester Charles Petrie – Stanford University
We’ve come a long way!!
In the beginning...
Link Integration
Integration of What?
Specially-formatted Files EMBL Format
Specially-formatted Files GenBankFormat
Specially-formatted Files FASTA Format
Specially-formatted Files GCG Format
At least 20 different formats for                  representing DNA sequences... Lord et al. 2004
Many formats contained a wide variety of related, but different, information DNA sequence Sequence features Translation Date/time/method Publication cross-references ... ...
Each file format required it’s own parser...
...and that problem wasn’t limited to DNA...
XML
What did XML do for us? “...advent of XML meant that we didn’t  have to write our own parsers anymore...” Individual data elements in a file can be automatically located and extracted
Predictable way to represent data Makes it easier for machines to encode/extract
EMBL Record for BRCA1 In XML
GenBank Record for BRCA1 In XML
So now we can share and  aggregate data!
So now we can share and  aggregate data!
EMBL Record for BRCA1 In XML GenBank Record for BRCA1 In XML
So now we can share and  aggregate data! ...because it isn’t (just) a parsing problem... Various resources have various data models
So... Let’s find a way to describe the data models!
XML Schema
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
So now we can share and  aggregate data!
So now we can share and  aggregate data!
What did XML Schema do for us? “...XML Schema (among other things) allowed  us to ~automate the creation of (in-memory)  Structures which could hold the given  XML-formatted data...”
Does not solve the integration or aggregation problem
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text Because the “meaning” of eachelement is implicit, we resort to “Schema Mapping”  to integrate the data XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text Automatically Map Schema, thenwe can integrate data! XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
Nevertheless...
Web Services “Service Oriented Architectures” WSDL  (and many other 4-letter words)
Web Services & SOA’s Allow you to expose software  (e.g. a database, analytical tool, or service) on the Web so that others can use it(in their own analytical pipelines)
Excellent!!
But...
XML Schema
XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
“The phrase ‘practical Web Services’    is not intrinsically an oxymoron,    but [I] argue that there are    few in existence.”
Why?
Because this problem is so disruptivethat there is little point in building “public” Web Services...   They are simply too difficult to integrate with other “public” Web Services. -- adapted from Petrie,  SWSIP 2009 XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
...and that’s pretty muchwhere the world is right now...
But there is hope!
“Linked Data” movement Resource Description Framework “RDF” Two new technologies & communities The “Semantic Web” movement Web Ontology Language “OWL” (+ RDF)
What does RDF do for us? “...RDF replaces XML Schema, because RDF  says that there is only one data model...”
What does OWL do for us? XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text “...the semantics are no longer implicit  in the data model...” XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
So what?
The Semantic Web  Gives us the opportunity to re-thinkhow we build our health data infrastructures
The Semantic Web  isn’t  “yet another layer of technology”
The Semantic Web  changes the way we write software
The Semantic Web  What to do & How to do it is no longer encoded in your software
The Semantic Web  What to do & How to do it is part of the data
The Semantic Web  What to do & How to do itis part of a shared, expert understanding
The Semantic Web  What to do & How to do it IS* PERSONAL! * can be...
One piece of software Any question... Any answer
Let me demonstrate what I mean
Semantic Automated Discovery and Integrationhttp://sadiframework.org MicrosoftResearch Founding partner
Semantic Health And Research Environment(a Semantic Web question answerer...)
Example #1 Show me the latest Blood Urea Nitrogen and Creatinine levelsof patients who appear to be rejecting their transplants SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { 	?patientrdf:typepatient:LikelyRejecter . 	?patient l:latestBUN ?bun .  	?patient l:latestCreatinine ?creat .  }
Likely Rejecter: A patient who has creatinine levelsthat are increasing over time - - Wilkinson “MD”
Likely Rejecter: …but there is no “likely rejecter” column or table in our database…
Likely Rejecter: Our database contains various blood chemistry measurementsat various time-points
SHARE determines by itselfthe need to do a Linear Regression analysis over Creatinine blood chemistry measurements Semantics
SHARE determines by itselfhow and wherethat analysiscan be doneand does it Semantics
The SHARE system utilizes Semantics (via SADI) to discover and access analytical services on the Web that do linear regression analysis
VOILA!
Neither SADI nor SHAREknow anything aboutblood chemistry, or mathematics
Example #2 From a (contrived) integrated dataset,  retrieve the blood pressure measurements SELECT ?output ?unit ?value  FROM <http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl> WHERE {      ?output rdf:typesbp:BloodPressure .      ?output local:hasCanonicalAttribute ?pr .     ?pr sio:SIO_000221 ?unit .     ?pr sio:SIO_000300 ?value . }
This should be extremely straightforward...
...except for one problem...
 <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance1">         <rdf:typerdf:resource="&galen;SystolicBloodPressure"/>         <resource:SIO_000300>0.137</resource:SIO_000300>         <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/meter-of-mercury-column"/>     </owl:NamedIndividual>     <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance2">         <rdf:typerdf:resource="&galen;SystolicBloodPressure"/>         <resource:SIO_000300>12.45</resource:SIO_000300>         <resource:SIO_000221 rdf:resource="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#centi-meter-of-mercury-column"/>     </owl:NamedIndividual>     <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance3">         <rdf:typerdf:resource="&galen;SystolicBloodPressure"/>         <resource:SIO_000300>5.3</resource:SIO_000300>         <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/inch-of-mercury-column"/>     </owl:NamedIndividual>
Spot the problem?
Let me help...
<owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance1">         <rdf:typerdf:resource="&galen;SystolicBloodPressure"/>         <resource:SIO_000300>0.137</resource:SIO_000300>         <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/meter-of-mercury-column"/>     </owl:NamedIndividual>     <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance2">         <rdf:typerdf:resource="&galen;SystolicBloodPressure"/>         <resource:SIO_000300>12.45</resource:SIO_000300>         <resource:SIO_000221 rdf:resource=“&ucum;unit/framingham/sbpfeb.owl#centi-meter-of-mercury-column"/>     </owl:NamedIndividual>     <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance3">         <rdf:typerdf:resource="&galen;SystolicBloodPressure"/>         <resource:SIO_000300>5.3</resource:SIO_000300>         <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/inch-of-mercury-column"/>     </owl:NamedIndividual>
Example #2 From a (contrived) integrated dataset,  retrieve the blood pressure measurements Semantics SELECT ?output ?unit ?value  FROM <http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl> WHERE {      ?output rdf:typesbp:BloodPressure .      ?output local:hasCanonicalAttribute ?pr .     ?pr sio:SIO_000221 ?unit .     ?pr sio:SIO_000300 ?value . } My semantic definition of “Blood Pressure”includes the units that I want...
Semantics This is enough to trigger SHARE to automatically discover an online unit-conversion service...
Neither SADI nor SHAREknow anything aboutunits or unit conversions
Many of the challenges  to data aggregation and sharingnow have solutions that work!
What, in my opinion, is the greatest remaining challenge?
To a biologist...       ...“data mining” means “this data is mine!”
The challenge to us all Move from Data Mine-ing To Data Ours-ing -- Len Silverston, 2007
XML We’ve come a long way!! XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
TEAM: Luke McCarthyBenjamin Vandervalk SoroushSamadian Microsoft Research

Contenu connexe

En vedette

Portfolio PlusAnimations 2009 NL
Portfolio PlusAnimations 2009 NLPortfolio PlusAnimations 2009 NL
Portfolio PlusAnimations 2009 NL
rogiervanmeeuwen
 
Smart brief content marketing trifecta
Smart brief content marketing trifectaSmart brief content marketing trifecta
Smart brief content marketing trifecta
SmartBrief
 
Barisan Pentadbir SMKTP(AA)
Barisan Pentadbir SMKTP(AA)Barisan Pentadbir SMKTP(AA)
Barisan Pentadbir SMKTP(AA)
hajahrokiah
 
Ryan Lost In Rainforest
Ryan Lost In RainforestRyan Lost In Rainforest
Ryan Lost In Rainforest
nat014
 
SmartBrief Portfolio
SmartBrief PortfolioSmartBrief Portfolio
SmartBrief Portfolio
SmartBrief
 
Building a Mega Community with PressWork
Building a Mega Community with PressWorkBuilding a Mega Community with PressWork
Building a Mega Community with PressWork
Brendan Sera-Shriar
 

En vedette (20)

Apache sirona
Apache sironaApache sirona
Apache sirona
 
Portfolio PlusAnimations 2009 NL
Portfolio PlusAnimations 2009 NLPortfolio PlusAnimations 2009 NL
Portfolio PlusAnimations 2009 NL
 
It’s a WIN, WIN: ‘WordPress On Windows’
It’s a WIN, WIN: ‘WordPress On Windows’It’s a WIN, WIN: ‘WordPress On Windows’
It’s a WIN, WIN: ‘WordPress On Windows’
 
Smart brief content marketing trifecta
Smart brief content marketing trifectaSmart brief content marketing trifecta
Smart brief content marketing trifecta
 
La graviola
La graviolaLa graviola
La graviola
 
Gene Search
Gene SearchGene Search
Gene Search
 
Sin ropa
Sin ropaSin ropa
Sin ropa
 
PAPER 2 2010
PAPER 2 2010PAPER 2 2010
PAPER 2 2010
 
WPAN
WPANWPAN
WPAN
 
Barisan Pentadbir SMKTP(AA)
Barisan Pentadbir SMKTP(AA)Barisan Pentadbir SMKTP(AA)
Barisan Pentadbir SMKTP(AA)
 
Leveraging trade associations
Leveraging trade associationsLeveraging trade associations
Leveraging trade associations
 
Ryan Lost In Rainforest
Ryan Lost In RainforestRyan Lost In Rainforest
Ryan Lost In Rainforest
 
Part 1: Lean Clinical Workplace Design
Part 1: Lean Clinical Workplace DesignPart 1: Lean Clinical Workplace Design
Part 1: Lean Clinical Workplace Design
 
Using Semantics to personalize medical research
Using Semantics to personalize medical researchUsing Semantics to personalize medical research
Using Semantics to personalize medical research
 
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
 
CDIS DR. FSM
CDIS    DR. FSMCDIS    DR. FSM
CDIS DR. FSM
 
SmartBrief Portfolio
SmartBrief PortfolioSmartBrief Portfolio
SmartBrief Portfolio
 
Tomcat Maven Plugin
Tomcat Maven PluginTomcat Maven Plugin
Tomcat Maven Plugin
 
Making the Most of Plug-ins - WordCamp Toronto 2008
Making the Most of Plug-ins - WordCamp Toronto 2008Making the Most of Plug-ins - WordCamp Toronto 2008
Making the Most of Plug-ins - WordCamp Toronto 2008
 
Building a Mega Community with PressWork
Building a Mega Community with PressWorkBuilding a Mega Community with PressWork
Building a Mega Community with PressWork
 

Similaire à Technologies, methods and challenges to data sharing and aggrigation

JavaScript APIs you’ve never heard of (and some you have)
JavaScript APIs you’ve never heard of (and some you have)JavaScript APIs you’ve never heard of (and some you have)
JavaScript APIs you’ve never heard of (and some you have)
Nicholas Zakas
 
Data Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will CritchlowData Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will Critchlow
auexpo Conference
 
Introduction to Linked Data 1/5
Introduction to Linked Data 1/5Introduction to Linked Data 1/5
Introduction to Linked Data 1/5
Juan Sequeda
 

Similaire à Technologies, methods and challenges to data sharing and aggrigation (20)

SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 
Content Management for Publishers
Content Management for PublishersContent Management for Publishers
Content Management for Publishers
 
NLP for entity-based and semantic SEO - Contference.pptx
NLP for entity-based and semantic SEO - Contference.pptxNLP for entity-based and semantic SEO - Contference.pptx
NLP for entity-based and semantic SEO - Contference.pptx
 
JavaScript APIs you’ve never heard of (and some you have)
JavaScript APIs you’ve never heard of (and some you have)JavaScript APIs you’ve never heard of (and some you have)
JavaScript APIs you’ve never heard of (and some you have)
 
CIRCUIT 2015 - Content API's For AEM Sites
CIRCUIT 2015 - Content API's For AEM SitesCIRCUIT 2015 - Content API's For AEM Sites
CIRCUIT 2015 - Content API's For AEM Sites
 
Semantic Publishing and Entity SEO - Conteference 20-11-2022
Semantic Publishing and Entity SEO - Conteference 20-11-2022Semantic Publishing and Entity SEO - Conteference 20-11-2022
Semantic Publishing and Entity SEO - Conteference 20-11-2022
 
Securing and Personalizing Commerce Using Identity Data Mining
Securing and Personalizing Commerce Using Identity Data MiningSecuring and Personalizing Commerce Using Identity Data Mining
Securing and Personalizing Commerce Using Identity Data Mining
 
Evolve 19 | Paul Legan | Going Beyond Metadata: Extracting Meaningful Informa...
Evolve 19 | Paul Legan | Going Beyond Metadata: Extracting Meaningful Informa...Evolve 19 | Paul Legan | Going Beyond Metadata: Extracting Meaningful Informa...
Evolve 19 | Paul Legan | Going Beyond Metadata: Extracting Meaningful Informa...
 
Java script
Java scriptJava script
Java script
 
What book and journal publishers need to know to get accessibility right
What book and journal publishers need to know to get accessibility rightWhat book and journal publishers need to know to get accessibility right
What book and journal publishers need to know to get accessibility right
 
Data Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will CritchlowData Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will Critchlow
 
Uso di Schema.org per il tuo sito web
Uso di Schema.org per il tuo sito webUso di Schema.org per il tuo sito web
Uso di Schema.org per il tuo sito web
 
RDFa Semantic Web
RDFa Semantic WebRDFa Semantic Web
RDFa Semantic Web
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
 
DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0
 
Learning About JavaScript (…and its little buddy, JQuery!)
Learning About JavaScript (…and its little buddy, JQuery!)Learning About JavaScript (…and its little buddy, JQuery!)
Learning About JavaScript (…and its little buddy, JQuery!)
 
Dynamic Content in QTP
Dynamic Content in QTPDynamic Content in QTP
Dynamic Content in QTP
 
Winning SEO Using Schema Markup and Structured Data
Winning SEO Using Schema Markup and Structured DataWinning SEO Using Schema Markup and Structured Data
Winning SEO Using Schema Markup and Structured Data
 
Drupal & Summon: Keeping Article Discovery in the Library
Drupal & Summon: Keeping Article Discovery in the LibraryDrupal & Summon: Keeping Article Discovery in the Library
Drupal & Summon: Keeping Article Discovery in the Library
 
Introduction to Linked Data 1/5
Introduction to Linked Data 1/5Introduction to Linked Data 1/5
Introduction to Linked Data 1/5
 

Plus de Mark Wilkinson

Sample data and other ur ls
Sample data and other ur lsSample data and other ur ls
Sample data and other ur ls
Mark Wilkinson
 

Plus de Mark Wilkinson (20)

FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1
 
Introducing the fair evaluator
Introducing the fair evaluatorIntroducing the fair evaluator
Introducing the fair evaluator
 
FAIR Projector Builder
FAIR Projector BuilderFAIR Projector Builder
FAIR Projector Builder
 
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
 
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th PlenarysmartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshow
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
 
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
 
Sample data and other ur ls
Sample data and other ur lsSample data and other ur ls
Sample data and other ur ls
 
Example code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web ServiceExample code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web Service
 
Sadi service
Sadi serviceSadi service
Sadi service
 
Tutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-servicesTutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-services
 
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, Oxford
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
 
SADI CSHALS 2013
SADI CSHALS 2013SADI CSHALS 2013
SADI CSHALS 2013
 
Web Science 2.0 - in silico science
Web Science 2.0 - in silico scienceWeb Science 2.0 - in silico science
Web Science 2.0 - in silico science
 
Web Science - ISoLA 2012
Web Science - ISoLA 2012Web Science - ISoLA 2012
Web Science - ISoLA 2012
 
Web Science, SADI, and the Singularity
Web Science, SADI, and the SingularityWeb Science, SADI, and the Singularity
Web Science, SADI, and the Singularity
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Technologies, methods and challenges to data sharing and aggrigation

  • 1. Technologies, methods and challengesto effective public data sharing and aggregation Mark D Wilkinson Medical Genetics, UBC PI Bioinformatics Heart + Lung Institute at St. Paul’s Hospital markw@illuminae.com http://wilkinsonlab.ca
  • 2. Thanks in advance(very few of these ideas are my own!) Paul Gordon – Sun Center of Excellence, U Calgary Carole Goble – University of Manchester Charles Petrie – Stanford University
  • 3. We’ve come a long way!!
  • 6.
  • 12. At least 20 different formats for representing DNA sequences... Lord et al. 2004
  • 13. Many formats contained a wide variety of related, but different, information DNA sequence Sequence features Translation Date/time/method Publication cross-references ... ...
  • 14. Each file format required it’s own parser...
  • 15. ...and that problem wasn’t limited to DNA...
  • 16. XML
  • 17. What did XML do for us? “...advent of XML meant that we didn’t have to write our own parsers anymore...” Individual data elements in a file can be automatically located and extracted
  • 18. Predictable way to represent data Makes it easier for machines to encode/extract
  • 19. EMBL Record for BRCA1 In XML
  • 20. GenBank Record for BRCA1 In XML
  • 21. So now we can share and aggregate data!
  • 22. So now we can share and aggregate data!
  • 23. EMBL Record for BRCA1 In XML GenBank Record for BRCA1 In XML
  • 24. So now we can share and aggregate data! ...because it isn’t (just) a parsing problem... Various resources have various data models
  • 25. So... Let’s find a way to describe the data models!
  • 27. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 28. So now we can share and aggregate data!
  • 29. So now we can share and aggregate data!
  • 30. What did XML Schema do for us? “...XML Schema (among other things) allowed us to ~automate the creation of (in-memory) Structures which could hold the given XML-formatted data...”
  • 31. Does not solve the integration or aggregation problem
  • 32. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 33. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text Because the “meaning” of eachelement is implicit, we resort to “Schema Mapping” to integrate the data XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 34. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 35. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 36. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text Automatically Map Schema, thenwe can integrate data! XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 37. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child element called “value” The content of that child element will be free-text XML Schema There will be an element called “GBQualifier” There will be a child element called “GBQualifier_name” The content of that child element will be free-text There will be a child element called “GBQualifier_value” The content of that child element will be free-text
  • 39. Web Services “Service Oriented Architectures” WSDL (and many other 4-letter words)
  • 40. Web Services & SOA’s Allow you to expose software (e.g. a database, analytical tool, or service) on the Web so that others can use it(in their own analytical pipelines)
  • 44. XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
  • 45. “The phrase ‘practical Web Services’ is not intrinsically an oxymoron, but [I] argue that there are few in existence.”
  • 46. Why?
  • 47. Because this problem is so disruptivethat there is little point in building “public” Web Services... They are simply too difficult to integrate with other “public” Web Services. -- adapted from Petrie, SWSIP 2009 XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
  • 48. ...and that’s pretty muchwhere the world is right now...
  • 49. But there is hope!
  • 50. “Linked Data” movement Resource Description Framework “RDF” Two new technologies & communities The “Semantic Web” movement Web Ontology Language “OWL” (+ RDF)
  • 51. What does RDF do for us? “...RDF replaces XML Schema, because RDF says that there is only one data model...”
  • 52. What does OWL do for us? XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text “...the semantics are no longer implicit in the data model...” XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
  • 54. The Semantic Web Gives us the opportunity to re-thinkhow we build our health data infrastructures
  • 55. The Semantic Web isn’t “yet another layer of technology”
  • 56. The Semantic Web changes the way we write software
  • 57. The Semantic Web What to do & How to do it is no longer encoded in your software
  • 58. The Semantic Web What to do & How to do it is part of the data
  • 59. The Semantic Web What to do & How to do itis part of a shared, expert understanding
  • 60. The Semantic Web What to do & How to do it IS* PERSONAL! * can be...
  • 61. One piece of software Any question... Any answer
  • 62. Let me demonstrate what I mean
  • 63. Semantic Automated Discovery and Integrationhttp://sadiframework.org MicrosoftResearch Founding partner
  • 64. Semantic Health And Research Environment(a Semantic Web question answerer...)
  • 65. Example #1 Show me the latest Blood Urea Nitrogen and Creatinine levelsof patients who appear to be rejecting their transplants SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patientrdf:typepatient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 66. Likely Rejecter: A patient who has creatinine levelsthat are increasing over time - - Wilkinson “MD”
  • 67. Likely Rejecter: …but there is no “likely rejecter” column or table in our database…
  • 68. Likely Rejecter: Our database contains various blood chemistry measurementsat various time-points
  • 69. SHARE determines by itselfthe need to do a Linear Regression analysis over Creatinine blood chemistry measurements Semantics
  • 70. SHARE determines by itselfhow and wherethat analysiscan be doneand does it Semantics
  • 71. The SHARE system utilizes Semantics (via SADI) to discover and access analytical services on the Web that do linear regression analysis
  • 73. Neither SADI nor SHAREknow anything aboutblood chemistry, or mathematics
  • 74. Example #2 From a (contrived) integrated dataset, retrieve the blood pressure measurements SELECT ?output ?unit ?value FROM <http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl> WHERE { ?output rdf:typesbp:BloodPressure . ?output local:hasCanonicalAttribute ?pr . ?pr sio:SIO_000221 ?unit . ?pr sio:SIO_000300 ?value . }
  • 75. This should be extremely straightforward...
  • 76. ...except for one problem...
  • 77. <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance1"> <rdf:typerdf:resource="&galen;SystolicBloodPressure"/> <resource:SIO_000300>0.137</resource:SIO_000300> <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/meter-of-mercury-column"/> </owl:NamedIndividual> <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance2"> <rdf:typerdf:resource="&galen;SystolicBloodPressure"/> <resource:SIO_000300>12.45</resource:SIO_000300> <resource:SIO_000221 rdf:resource="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#centi-meter-of-mercury-column"/> </owl:NamedIndividual> <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance3"> <rdf:typerdf:resource="&galen;SystolicBloodPressure"/> <resource:SIO_000300>5.3</resource:SIO_000300> <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/inch-of-mercury-column"/> </owl:NamedIndividual>
  • 80. <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance1"> <rdf:typerdf:resource="&galen;SystolicBloodPressure"/> <resource:SIO_000300>0.137</resource:SIO_000300> <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/meter-of-mercury-column"/> </owl:NamedIndividual> <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance2"> <rdf:typerdf:resource="&galen;SystolicBloodPressure"/> <resource:SIO_000300>12.45</resource:SIO_000300> <resource:SIO_000221 rdf:resource=“&ucum;unit/framingham/sbpfeb.owl#centi-meter-of-mercury-column"/> </owl:NamedIndividual> <owl:NamedIndividualrdf:about="http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl#pressureinstance3"> <rdf:typerdf:resource="&galen;SystolicBloodPressure"/> <resource:SIO_000300>5.3</resource:SIO_000300> <resource:SIO_000221 rdf:resource="&ucum;unit/pressure/inch-of-mercury-column"/> </owl:NamedIndividual>
  • 81. Example #2 From a (contrived) integrated dataset, retrieve the blood pressure measurements Semantics SELECT ?output ?unit ?value FROM <http://es-01.chibi.ubc.ca/~soroush/framingham/sbpfeb.owl> WHERE { ?output rdf:typesbp:BloodPressure . ?output local:hasCanonicalAttribute ?pr . ?pr sio:SIO_000221 ?unit . ?pr sio:SIO_000300 ?value . } My semantic definition of “Blood Pressure”includes the units that I want...
  • 82. Semantics This is enough to trigger SHARE to automatically discover an online unit-conversion service...
  • 83.
  • 84. Neither SADI nor SHAREknow anything aboutunits or unit conversions
  • 85. Many of the challenges to data aggregation and sharingnow have solutions that work!
  • 86. What, in my opinion, is the greatest remaining challenge?
  • 87. To a biologist... ...“data mining” means “this data is mine!”
  • 88. The challenge to us all Move from Data Mine-ing To Data Ours-ing -- Len Silverston, 2007
  • 89. XML We’ve come a long way!! XML Schema There will be an element called “qualifier” It will have an attribute called “name” The content of that attribute will betext There will be a child attribute called “value” The content of that child attribute will be free-text XML Schema There will be an element called “GBQualifier” There will be a child attribute called “GBQualifier_name” The content of that child attribute will be free-text There will be a child attribute called “GBQualifier_value” The content of that child attribute will be free-text
  • 90. TEAM: Luke McCarthyBenjamin Vandervalk SoroushSamadian Microsoft Research