Dataset Descriptions in Open PHACTS and W3C HCLS IG
Alasdair J G Gray
Heriot-Watt University
www.alasdairjggray.co.uk A.J.G.Gray@hw.ac.uk
NDEx Call, April 2014
[Figure: Open PHACTS platform architecture. Apps sit on top of a Linked Data API (RDF/XML, TTL, JSON). The Core Platform contains the Semantic Workflow Engine, Data Cache (Virtuoso Triple Store), Identity Resolution Service, Identifier Management Service, Domain Specific Services, Chemistry Registration Normalisation & Q/C, and Indexing. Inputs are databases, nanopublications and VoID descriptions, grouped as Public Content, Commercial, Public Ontologies and User Annotations. Example identifiers shown: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”.]
[Figure: the same Core Platform (Apps, Linked Data API, Semantic Workflow Engine, Data Cache, Identity Resolution Service, Identifier Management Service, Domain Specific Services), annotated with source datasets and versions: ChEMBL-RDF, ChEMBL, Chem2Bio2RDF, SD; version labels v13, v12, “v2 or v8”.]
ChemSpider
• Data aggregator: over 400 sources
– What data does it contain?
– What version of ?? did they load?
– When are new versions loaded?
• OPS data covers
– ChEBI
– ChEMBL
– DrugBank
2 April 2014 OPS Dataset Descriptions – A. J. G. Gray 5
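The questions raised above for an aggregator (what data, which version, when loaded) are the kind of thing a machine-readable dataset description can answer. The following is a minimal sketch, not the Open PHACTS specification: it assumes a small set of properties from the VoID, Dublin Core Terms and PAV vocabularies, and the class name, URIs, version and date are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetDescription:
    """Hypothetical record; field names are illustrative, not from the slides."""
    uri: str
    title: str
    version: str
    issued: date
    source_uri: str

    def to_turtle(self) -> str:
        """Serialise as a small Turtle snippet using VoID, DCTERMS and PAV terms."""
        lines = [
            "@prefix void: <http://rdfs.org/ns/void#> .",
            "@prefix dcterms: <http://purl.org/dc/terms/> .",
            "@prefix pav: <http://purl.org/pav/> .",
            "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
            "",
            f"<{self.uri}> a void:Dataset ;",
            f'    dcterms:title "{self.title}"@en ;',
            f'    pav:version "{self.version}" ;',
            f'    dcterms:issued "{self.issued.isoformat()}"^^xsd:date ;',
            f"    pav:retrievedFrom <{self.source_uri}> .",
        ]
        return "\n".join(lines)


# All values below are made up for illustration.
example = DatasetDescription(
    uri="http://example.org/dataset/chembl-rdf",
    title="ChEMBL-RDF (example load)",
    version="13",
    issued=date(2014, 1, 1),
    source_uri="http://example.org/downloads/chembl-rdf",
)
ttl = example.to_turtle()
print(ttl)
```

Keeping the version, issue date and source in the description itself means the answers travel with the data rather than living only in a registry.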
Metadata Challenges
• Datasets available
– In many versions over time
– In different formats
– From many mirrors/registries
• Datasets build on each other
• Files do not carry metadata
• Registries
– Can be out-of-date
– Can contain conflicting information
Users require data provenance!
Description Model
Realisation of Dataset Descriptions
• Needs to be incorporated into data publishing pipeline
• Hard for publishers to provide conformant descriptions
– Datasets are complex
– Evolve over time
– Seen as yet another burden
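One way to lower the burden on publishers is an automated conformance check that reports exactly what is missing from a description. The sketch below is only an illustration of that idea: the list of required properties is an assumption made for the example, not the actual Open PHACTS or HCLS requirement set, and the function name is hypothetical.

```python
# Assumed set of mandatory properties, for illustration only;
# a real profile defines its own list.
REQUIRED = ("dcterms:title", "dcterms:description", "dcterms:license", "pav:version")


def missing_properties(description, required=REQUIRED):
    """Return the required properties that are absent or blank, in profile order."""
    return [prop for prop in required if not str(description.get(prop, "")).strip()]


# A draft description with made-up values and two gaps.
draft = {
    "dcterms:title": "Example dataset",
    "dcterms:description": "   ",  # blank counts as missing
    "pav:version": "1.0",
}

problems = missing_properties(draft)
print("Missing:", problems)
```

Returning the list of gaps, rather than a plain pass or fail, gives the publisher something concrete to fix.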
VoID Editor
Validator
W3C HCLS Group
HCLS Community Profile Model
Future Vision
Metadata: Write once, use many times
• Provide a rich and accurate provenance trail of data
– Automatic pipeline from VoID file to registries
• Align Open PHACTS with W3C HCLS
– Update tools for HCLS profile
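The "write once, use many times" idea can be sketched as a single authoritative description that is mechanically projected into whatever shape each registry expects. Everything in this example is assumed for illustration: the registry names, their field names and the description values are hypothetical, and a real pipeline would parse the VoID file rather than start from a dictionary.

```python
import json

# One authoritative description (keys are vocabulary terms; values are made up).
description = {
    "dcterms:title": "Example dataset",
    "pav:version": "1.0",
    "dcterms:license": "http://example.org/license",
}

# Hypothetical per-registry field mappings; real registries define their own schemas.
REGISTRY_MAPPINGS = {
    "registry_a": {
        "dcterms:title": "name",
        "pav:version": "release",
        "dcterms:license": "licence_url",
    },
    "registry_b": {
        "dcterms:title": "title",
        "pav:version": "version",
    },
}


def to_registry_record(desc, mapping):
    """Rename the fields a registry understands and drop the rest."""
    return {target: desc[source] for source, target in mapping.items() if source in desc}


records = {
    name: to_registry_record(description, mapping)
    for name, mapping in REGISTRY_MAPPINGS.items()
}
print(json.dumps(records, indent=2))
```

Because every registry record is derived from the same source, a correction made once in the description propagates to all of them on the next run.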
A.J.G.Gray@hw.ac.uk
www.alasdairjggray.co.uk
www.openphacts.org

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 

Dernier (20)

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 

Dataset Descriptions in Open PHACTS and HCLS

  • 1. Dataset Descriptions in Open PHACTS and W3C HCLS IG. Alasdair J G Gray, Heriot-Watt University (www.alasdairjggray.co.uk, A.J.G.Gray@hw.ac.uk). NDEx Call, April 2014
  • 2. [Architecture diagram] Apps; Linked Data API (RDF/XML, TTL, JSON); Semantic Workflow Engine; Data Cache (Virtuoso Triple Store); Domain Specific Services; Identity Resolution Service; Identifier Management Service; Chemistry Registration, Normalisation & Q/C; Indexing; Core Platform. Sources (Nanopub, Db, VoID): Public Content, Commercial, Public Ontologies, User Annotations. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
  • 3. [Architecture diagram annotated with dataset versions] Apps; Linked Data API (RDF/XML, TTL, JSON); Semantic Workflow Engine; Data Cache (Triple Store); Domain Specific Services; Identity Resolution Service; Identifier Management Service; Core Platform. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”. Sources: ChEMBL-RDF, ChEMBL, Chem2Bio2RDF, SD. Versions: v13, v12, v2 or v8
  • 4.
  • 5. ChemSpider
      • Data aggregator: over 400 sources
          – What data does it contain?
          – What version of ?? did they load?
          – When are new versions loaded?
      • OPS data covers
          – ChEBI
          – ChEMBL
          – DrugBank
      2 April 2014 OPS Dataset Descriptions – A. J. G. Gray 5
  • 6. Metadata Challenges
      • Datasets available
          – In many versions over time
          – In different formats
          – From many mirrors/registries
      • Datasets build on each other
      • Files do not carry metadata
      • Registries
          – Can be out-of-date
          – Can contain conflicting information
      Users require data provenance!
  • 7.
  • 8.
  • 9. Description Model
  • 10. Realisation of Dataset Descriptions
      • Needs to be incorporated into data publishing pipeline
      • Hard for publishers to provide conformant descriptions
          – Datasets are complex
          – Evolve over time
          – Seen as yet another burden
  • 11. VoID Editor
  • 12. Validator
  • 14. HCLS Community Profile Model
  • 15. Future Vision. Metadata: Write once, use many times
      • Provide rich and accurate provenance trail of data
          – Automatic pipeline from VoID file to registries
      • Align Open PHACTS with W3C HCLS
          – Update tools for HCLS profile

Editor's notes

  1. Motivation from OPS. Challenges. OPS approach. W3C HCLS work.
  2. Reminder of current architecture
  3. ChemSpider: EBI SDF file, ChEMBL 13. Data Cache: Chem2Bio2RDF ChEMBL RDF; file downloaded May 2011; Chem2Bio2RDF metadata webpages: ChEMBL 8; file contents: ChEMBL 2. Mapping Server: Kasabi ChEMBL RDF file, ChEMBL 12.
  4. Large number of datasets: differing update rates, different characteristics. Require an automated process.
  5. Specifies a checklist of properties. Draws upon existing vocabularies. Aims to be simple to use: extensive guidance notes.
  6. Checklist and guidance notes: user friendly. Minimal, easy-to-follow model. Draws upon existing vocabularies. Required and optional properties.
  7. The agent-entity-action model can be cumbersome for datasets; the agent is not always known beyond the data provider, i.e. not the individual. The extension requirement is by design.
  8. Provide two tools to help
  9. Dataset description creator. Generates an outline description through a web form. Allows you to see the generated content.
  10. Given a dataset description, does it conform to the OPS guidelines? Generates error (red) and warning (orange) reports. Error for MUST properties. Warning for SHOULD properties. Information for MAY properties.
  11. Large community buy-in, including EBI. Builds on the OPS document: checklist and guidance notes! Wide range of use cases. Should be finalised by end of May; not final URL.
  12. Three-tier model: more complex. More required properties (not shown). Richer metadata.
  13. Open PHACTS: 28 partners. 9 pharmaceuticals, 3 biotechs, 1 triplestore firm, 15 academic.
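
The validator behaviour described in note 10 (error for MUST, warning for SHOULD, information for MAY) can be illustrated with a minimal Python sketch. This is not the Open PHACTS validator itself: the `CHECKLIST` contents, the level assigned to each property, and the names `Finding` and `validate` are all assumptions made for illustration. Only the three-level severity mapping comes from the notes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical checklist. The properties are real vocabulary terms, but the
# level assigned to each one here is an assumption, not the OPS guideline.
CHECKLIST = {
    "dcterms:title": "MUST",
    "dcterms:license": "MUST",
    "pav:version": "SHOULD",
    "void:dataDump": "MAY",
}

# Severity mapping as stated in the notes.
SEVERITY = {"MUST": "error", "SHOULD": "warning", "MAY": "information"}


@dataclass(frozen=True)
class Finding:
    severity: str
    prop: str
    message: str


def validate(description: dict[str, str]) -> list[Finding]:
    """Return one finding per checklist property that is missing or empty."""
    findings = []
    for prop, level in CHECKLIST.items():
        if not description.get(prop, "").strip():
            findings.append(
                Finding(SEVERITY[level], prop, f"{level} property {prop} is missing")
            )
    return findings


# Illustrative description with a title and version but no licence or data dump.
example = {"dcterms:title": "ChEMBL-RDF", "pav:version": "13"}
report = validate(example)
for finding in report:
    print(f"[{finding.severity}] {finding.message}")
```

A real validator would read the description from an RDF file rather than a dictionary; the dictionary keeps the sketch self-contained.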