SlideShare une entreprise Scribd logo
1  sur  21
A Generic Scientific Data Model
and Ontology for Representation
of Chemical Data
Stuart J. Chalk, Department of Chemistry
University of North Florida
schalk@unf.edu
CINF Paper 171 – 251st ACS Meeting Spring 2016
#ACSCINFDataSummit
Scientific Data Should be Open
 Simple: Openness as the norm not the exception
 Data made available, without restriction, so its useful
 Mechanisms/tools to make data available
 Formats to allow others to get the data…
 …but also so its easy to use
 Annotate the data to make it easy to find
 Community driven promotion of and action on this issue
 Research Notebook
 Spectral Files (JCAMP-DX, propriety)
 Excel Spreadsheets
 Personal Databases
 Online Databases
 PDF Files No!
 RDF Yes!
Resource Description Framework
Options for Storing Data?
 W3C Recommendation 2015
Specification - https://www.w3.org/TR/ldp/
Primer - https://www.w3.org/TR/ldp-primer/
The Linked Data
Platform
From: http://www.dataversity.net/introduction-linked-data-platform/
 Use JavaScript
Object Notation
(JSON) as a text
format for
storing data and
metadata so it
can be converted
to RDF
JSON for Linked Data (JSON-LD)
{
"@context": {
"name": "http://schema.org/name",
"isAlive": "http://example.org/isAlive",
"age": "http://example.org/age",
"height": "http://schema.org/height",
"@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
},
"@id": "",
"name": "Stuart Chalk",
"isAlive": true,
"age": 49,
"height": 188.0
} http://json-ld.org/playground/
JSON for Linked Data (JSON-LD)
<http://www.unf.edu/chemistry/stuart_chalk.aspx>
<http://example.org/age>
"49"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx>
<http://example.org/isAlive>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx>
<http://schema.org/height>
"188"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx>
<http://schema.org/name>
"Stuart Chalk" .
 Nice idea but because anything can be
linked to anything else to form a graph of variable structure…
 ...difficult to search, hard to maintain
 OK, use regular relational database – Rigid Schema
Not good to try and make data fit the schema…
 Use a hybrid approach!
 Encode some structure in RDF using a framework...
 ...add data to the structured graph in an organized way
Store all Scientific Data in RDF?
 Consider FAIR Principals (http://www.datafairport.org)
 To be Findable:
 F1. (meta)data are assigned a globally unique and persistent identifier
 F2. data are described with rich metadata (defined by R1 below)
 F3. metadata clearly and explicitly include the identifier of the data it describes
 F4. (meta)data are registered or indexed in a searchable resource
 To be Accessible:
 A1. (meta)data are retrievable by their identifier using a standardized communications protocol
 A2. metadata are accessible, even when the data are no longer available
 To be Interoperable:
 I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
 I2. (meta)data use vocabularies that follow FAIR principles
 I3. (meta)data include qualified references to other (meta)data
 To be Reusable:
 R1. meta(data) are richly described with a plurality of accurate and relevant attributes
 R1.1. (meta)data are released with a clear and accessible data usage license
 R1.2. (meta)data are associated with detailed provenance
 R1.3. (meta)data meet domain-relevant community standards
What Metadata is Important for Data?
 Define scope as data obtained from an experiment,
a series of experiments, a project
 Who did the work and where are they?
 Metadata about the data “packet”
 The raw data…
 …its associated metadata (enough to properly contextualize the data)
 Access rights
 Published location
What Should a Data Model Represent?
General
Framework
 SciData – Scientific Data
Model (SDM)
 Overview –
http://stuchalk.github.io/scidata/
 GitHub Repo –
https://github.com/stuchalk/scidata
General Framework
- The Context
 “@context” contains the
context definition
 Refers to other context files
 Namespace abbreviations
 Default vocabulary “@vocab”
 “@id” links ontology term
 “@type” states data type
Methodology, System, and Dataset
Example Data - pH
Example Data -
Literature Value
 “scope” provides internal link
to “@id” value
 Each value of a name value pair
has a default data type that can
be override by expanding value
to a JSON object and adding
“@value” and “@type”
Example Data -
NMR Spectrum
 “dataseries” are JSON arrays of
data on one axis
 Bring them together with
“datagroup” and we can
represent at spectrum
 “parameter” is generic
container for data, or metadata
Example Data –
CC Calculation
 “datagroup”s are structures to
aggregate data at any level
 “datagroup”s can be infinitely
nested
 “uid” is optional and can be
used to unique define any piece
of data
The SDM
Ontology
 SciData Ontology –
Scientific Data Model
Ontology (SDMO)
 OWL File –
https://github.com/stuchalk/scidata/b
lob/master/ontology/scidata.owl
 Get community feedback, refine/extend/standardize
 Generate large corpus of disparate data in JSON-LD, ingest into triple store
and query (SPARQL)
 Evaluate inferencing on the triple store data
 Push adoption through collaboration
 Run hackathons to build developer implementations
 Develop Electronic Laboratory Notebook (ELN) to generate data in JSON-LD
 Get feedback from data community, RDA - https://rd-alliance.org/
 Test using the NDS - http://www.nationaldataservice.org/
Future Work
 Pain Points
 Challenges
 Opportunities
 Normalization
 Tools to generate
metadata automatically
 User Perspective
 Gaps in Data
 Gaps in Ontology Coverage
Pain Points?
 Gather stakeholders to work on standards
 Broad knowledge domain representation
 i-UPAC, RDA Chemistry Research Data IG
 Priorities?
 Data annotation and representation
 Data exchange (repo <-> repo, user <-> user)
 Structure representation (chiral centers)
 Curation infrastructures
 Domain vocabulary translations
 Units of measure
Reality Check
“to err is human; to forgive, divine”
Alexander Pope
“to err is human; to really screw things up requires a computer”
Paul Ehrlich
“to err is human; all hell will break loose if you
don’t provide accurate semantics to a computer”
Stuart Chalk
 schalk@unf.edu
 Phone: 904-620-1938
 Skype: stuartchalk
 LinkedIn/Slidehare: https://www.linkedin.com/in/stuchalk
 ORCID: http://orcid.org/0000-0002-0703-7776
 ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?

Contenu connexe

Tendances

Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables Sunita Sahu
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISrathnaarul
 
Visualization For Data Science
Visualization For Data ScienceVisualization For Data Science
Visualization For Data ScienceAngela Zoss
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Neo4j
 
Power bi introduction
Power bi introductionPower bi introduction
Power bi introductionBishwadeb Dey
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Simplilearn
 
Big data visualization
Big data visualizationBig data visualization
Big data visualizationAnurag Gupta
 
Real-World Data Governance: What is a Data Steward and What Do They Do?
Real-World Data Governance: What is a Data Steward and What Do They Do?Real-World Data Governance: What is a Data Steward and What Do They Do?
Real-World Data Governance: What is a Data Steward and What Do They Do?DATAVERSITY
 
Microsoft power bi
Microsoft power biMicrosoft power bi
Microsoft power bitechpro360
 
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital FactoryModeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital FactoryNeo4j
 
Graphs for Ai and ML
Graphs for Ai and MLGraphs for Ai and ML
Graphs for Ai and MLNeo4j
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopJSI
 

Tendances (20)

Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables
 
Power BI visuals
Power BI visualsPower BI visuals
Power BI visuals
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Visualization For Data Science
Visualization For Data ScienceVisualization For Data Science
Visualization For Data Science
 
Excel to Power BI
Excel to Power BIExcel to Power BI
Excel to Power BI
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
 
Power bi introduction
Power bi introductionPower bi introduction
Power bi introduction
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
 
Power BI Overview
Power BI Overview Power BI Overview
Power BI Overview
 
Big data visualization
Big data visualizationBig data visualization
Big data visualization
 
Data analysis
Data analysisData analysis
Data analysis
 
Final PPT Imdb
Final PPT ImdbFinal PPT Imdb
Final PPT Imdb
 
Real-World Data Governance: What is a Data Steward and What Do They Do?
Real-World Data Governance: What is a Data Steward and What Do They Do?Real-World Data Governance: What is a Data Steward and What Do They Do?
Real-World Data Governance: What is a Data Steward and What Do They Do?
 
Microsoft power bi
Microsoft power biMicrosoft power bi
Microsoft power bi
 
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital FactoryModeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Graphs for Ai and ML
Graphs for Ai and MLGraphs for Ai and ML
Graphs for Ai and ML
 
Power bi overview
Power bi overview Power bi overview
Power bi overview
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices Workshop
 

Similaire à A Generic Scientific Data Model and Ontology for Representation of Chemical Data

Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Stuart Chalk
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalWaqas Tariq
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Enayat Rajabi
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeTom Plasterer
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordJisc
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsCarole Goble
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Laurent Alquier
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesValeria Pesce
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseMicah Altman
 
FAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsOpenAIRE
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Carole Goble
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platformibemam
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsVivien Bonazzi
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 

Similaire à A Generic Scientific Data Model and Ontology for Representation of Chemical Data (20)

Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
Introduction of Linked Data for Science
Introduction of Linked Data for ScienceIntroduction of Linked Data for Science
Introduction of Linked Data for Science
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published record
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with Dataverse
 
When is a model FAIR – and why should we care?
When is a model FAIR – and why should we care?When is a model FAIR – and why should we care?
When is a model FAIR – and why should we care?
 
FAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basics
 
FAIR data
FAIR dataFAIR data
FAIR data
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 

Plus de Stuart Chalk

Semantic properties and units
Semantic properties and unitsSemantic properties and units
Semantic properties and unitsStuart Chalk
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structuresStuart Chalk
 
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...Stuart Chalk
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardStuart Chalk
 
Scientific Units in the Electronic Age
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic AgeStuart Chalk
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Stuart Chalk
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook OntologyStuart Chalk
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataStuart Chalk
 
Bringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebStuart Chalk
 
Reactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseStuart Chalk
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Stuart Chalk
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectStuart Chalk
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXStuart Chalk
 
Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Stuart Chalk
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaStuart Chalk
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationStuart Chalk
 
ACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataStuart Chalk
 
ACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectStuart Chalk
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationStuart Chalk
 

Plus de Stuart Chalk (20)

Semantic properties and units
Semantic properties and unitsSemantic properties and units
Semantic properties and units
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structures
 
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data Standard
 
Scientific Units in the Electronic Age
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic Age
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook Ontology
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
 
Bringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic Web
 
Reactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral Database
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSX
 
Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 
ACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility Data
 
ACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData Project
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 

Dernier

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to VirusesAreesha Ahmad
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 

Dernier (20)

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 

A Generic Scientific Data Model and Ontology for Representation of Chemical Data

  • 1. A Generic Scientific Data Model and Ontology for Representation of Chemical Data Stuart J. Chalk, Department of Chemistry University of North Florida schalk@unf.edu CINF Paper 171 – 251st ACS Meeting Spring 2016 #ACSCINFDataSummit
  • 2. Scientific Data Should be Open  Simple: Openness as the norm not the exception  Data made available, without restriction, so its useful  Mechanisms/tools to make data available  Formats to allow others to get the data…  …but also so its easy to use  Annotate the data to make it easy to find  Community driven promotion of and action on this issue
  • 3.  Research Notebook  Spectral Files (JCAMP-DX, propriety)  Excel Spreadsheets  Personal Databases  Online Databases  PDF Files No!  RDF Yes! Resource Description Framework Options for Storing Data?
  • 4.  W3C Recommendation 2015 Specification - https://www.w3.org/TR/ldp/ Primer - https://www.w3.org/TR/ldp-primer/ The Linked Data Platform From: http://www.dataversity.net/introduction-linked-data-platform/
  • 5.  Use JavaScript Object Notation (JSON) as a text format for storing data and metadata so it can be converted to RDF JSON for Linked Data (JSON-LD) { "@context": { "name": "http://schema.org/name", "isAlive": "http://example.org/isAlive", "age": "http://example.org/age", "height": "http://schema.org/height", "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx" }, "@id": "", "name": "Stuart Chalk", "isAlive": true, "age": 49, "height": 188.0 } http://json-ld.org/playground/
  • 6. JSON for Linked Data (JSON-LD) <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/age> "49"^^<http://www.w3.org/2001/XMLSchema#integer> . <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/isAlive> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> . <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/height> "188"^^<http://www.w3.org/2001/XMLSchema#integer> . <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/name> "Stuart Chalk" .
  • 7.  Nice idea but because anything can be linked to anything else to form a graph of variable structure…  ...difficult to search, hard to maintain  OK, use regular relational database – Rigid Schema Not good to try and make data fit the schema…  Use a hybrid approach!  Encode some structure in RDF using a framework...  ...add data to the structured graph in an organized way Store all Scientific Data in RDF?
  • 8.  Consider FAIR Principals (http://www.datafairport.org)  To be Findable:  F1. (meta)data are assigned a globally unique and persistent identifier  F2. data are described with rich metadata (defined by R1 below)  F3. metadata clearly and explicitly include the identifier of the data it describes  F4. (meta)data are registered or indexed in a searchable resource  To be Accessible:  A1. (meta)data are retrievable by their identifier using a standardized communications protocol  A2. metadata are accessible, even when the data are no longer available  To be Interoperable:  I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.  I2. (meta)data use vocabularies that follow FAIR principles  I3. (meta)data include qualified references to other (meta)data  To be Reusable:  R1. meta(data) are richly described with a plurality of accurate and relevant attributes  R1.1. (meta)data are released with a clear and accessible data usage license  R1.2. (meta)data are associated with detailed provenance  R1.3. (meta)data meet domain-relevant community standards What Metadata is Important for Data?
  • 9.  Define scope as data obtained from an experiment, a series of experiments, a project  Who did the work and where are they?  Metadata about the data “packet”  The raw data…  …its associated metadata (enough to properly contextualize the data)  Access rights  Published location What Should a Data Model Represent?
  • 10. General Framework  SciData – Scientific Data Model (SDM)  Overview – http://stuchalk.github.io/scidata/  GitHub Repo – https://github.com/stuchalk/scidata
  • 11. General Framework - The Context  “@context” contains the context definition  Refers to other context files  Namespace abbreviations  Default vocabulary “@vocab”  “@id” links ontology term  “@type” states data type
  • 14. Example Data - Literature Value  “scope” provides internal link to “@id” value  Each value of a name value pair has a default data type that can be override by expanding value to a JSON object and adding “@value” and “@type”
  • 15. Example Data - NMR Spectrum  “dataseries” are JSON arrays of data on one axis  Bring them together with “datagroup” and we can represent at spectrum  “parameter” is generic container for data, or metadata
  • 16. Example Data – CC Calculation  “datagroup”s are structures to aggregate data at any level  “datagroup”s can be infinitely nested  “uid” is optional and can be used to unique define any piece of data
  • 17. The SDM Ontology  SciData Ontology – Scientific Data Model Ontology (SDMO)  OWL File – https://github.com/stuchalk/scidata/b lob/master/ontology/scidata.owl
  • 18.  Get community feedback, refine/extend/standardize  Generate large corpus of disparate data in JSON-LD, ingest into triple store and query (SPARQL)  Evaluate inferencing on the triple store data  Push adoption through collaboration  Run hackathons to build developer implementations  Develop Electronic Laboratory Notebook (ELN) to generate data in JSON-LD  Get feedback from data community, RDA - https://rd-alliance.org/  Test using the NDS - http://www.nationaldataservice.org/ Future Work
  • 19.  Pain Points  Challenges  Opportunities  Normalization  Tools to generate metadata automatically  User Perspective  Gaps in Data  Gaps in Ontology Coverage Pain Points?  Gather stakeholders to work on standards  Broad knowledge domain representation  i-UPAC, RDA Chemistry Research Data IG  Priorities?  Data annotation and representation  Data exchange (repo <-> repo, user <-> user)  Structure representation (chiral centers)  Curation infrastructures  Domain vocabulary translations  Units of measure
  • 20. Reality Check “to err is human; to forgive, divine” Alexander Pope “to err is human; to really screw things up requires a computer” Paul Ehrlich “to err is human; all hell will break loose if you don’t provide accurate semantics to a computer” Stuart Chalk
  • 21.  schalk@unf.edu  Phone: 904-620-1938  Skype: stuartchalk  LinkedIn/Slidehare: https://www.linkedin.com/in/stuchalk  ORCID: http://orcid.org/0000-0002-0703-7776  ResearcherID: http://www.researcherid.com/rid/D-8577-2013 Questions?