Give open access to your data
and make them ready to be mined
Daniel Jacob
UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
May 2016
Open Data for Access and Mining
A data explorer as a bonus
EDTMS
ODAM
Daniel Jacob – INRA UMR 1332 – May 2016
The experimental context: needs / wishes
Workflow: seeding, harvesting, samples preparation, samples analysis
(sample identifiers; Experiment Design; Experiment Data Tables; Web API)
• Data sharing & data availability
• Identifiers centrally managed
• Facilitate the subsequent data mining
- Make both metadata and data available for data mining
- Develop, if needed, lightweight tools: R scripts (Galaxy), lightweight GUI (R Shiny)
ODAM, Open Data for Access and Mining: the core idea in one shot
Implementation of an Experiment Data Tables Management System (EDTMS)
Data capture, minimal effort (PUT): merely dropping data files in a data repository (e.g. a local NAS or distant storage space, mounted on myhost.org) should allow users to access them through a web API (GET http://myhost.org/).
Data can be downloaded, explored and mined.
No database schema, no programming code and no additional configuration on the server side.
Preparation and cleaning of the data sub-sets of files
Data subset files: plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv
• Whatever the kind of experiment, this assumes a design of experiment (DoE) involving individuals, samples or other things as the main objects of study (e.g. plants, tissues, bacteria, …).
• It also assumes the observation of dependent variables resulting from the effects of some controlled experimental factors.
• Moreover, each object of study usually has an identifier, and the variables can be quantitative or qualitative.
• There can be one type of object of study or several; in the latter case, there must be a relationship between the object types, which we assume to be of the "obtainedFrom" type.
Classification of each column within its right category
Data subset files: plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv
Categories: identifier, factor, quantitative, qualitative, link
Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values).
• You have to organize your data subsets so that links can be established between them.
• In practice, this means adding a column containing the identifiers of the entity to which you want to connect the subset, implying an 'obtainedFrom' relation.
• Note that this duplication of identifiers must be the only redundant information across all data subsets.
Connections between the dataset files based on identifiers
Data subset files and their entities (concepts): plants.tsv (Plants), harvests.tsv (Harvests), samples.tsv (Samples), compounds.tsv (Compounds), enzymes.tsv (Enzymes)
Categories: identifier, factor, quantitative, qualitative, link
• identifier: identifier of the central entity of the subset
• link: link between 2 subsets, established through identifiers (implies an 'obtainedFrom' relation)
Creation of the metadata files
Supplementary files
To allow data to be explored and mined, we have to add some minimal but relevant metadata. For that, 2 metadata files are required:
• s_subsets.tsv: a file that associates each data subset with a key concept corresponding to the main entity of the subset, and that records the "obtainedFrom" relations between these concepts
• a_attributes.tsv: a file that annotates each attribute (concept/variable) with some minimal but relevant metadata
Note: data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values). TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas.
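The note on TSV versus CSV can be illustrated with a minimal Python sketch. The column names and the sample row are made up for the illustration; only the standard `csv` module is used. It writes the same row once with a tab separator and once with a comma separator, to show that a free-text field containing a comma has to be quoted in CSV but not in TSV.

```python
import csv
import io

# Hypothetical rows: the second field contains a comma.
rows = [["SampleID", "Comment"], ["365", "pericarp, stage 2"]]

def dump(rows, delimiter):
    """Serialize rows with the given delimiter, quoting only when required."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

tsv_text = dump(rows, "\t")  # tab-separated: the comma needs no escaping
csv_text = dump(rows, ",")   # comma-separated: the field must be quoted

print(tsv_text)
print(csv_text)
```

With tab as the separator, a plain split on tabs is enough to recover the fields, which is what keeps spreadsheet exports and simple scripts interoperable.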
Creation of the metadata files
s_subsets.tsv: this metadata file associates a key concept with each data subset file. For each subset it gives:
• the unique rank number of the data subset (1 to 5 in the example)
• the key concept (i.e. the main entity) associated with the subset, in the form of a short name (Plants, Harvests, Samples, Compounds, Enzymes)
• the identifier of the central entity of the subset
• the link between 2 subsets (implies an 'obtainedFrom' relation)
• the data file name
Example entry: rank 1, key concept Plants, identifier PlanteID, data file plants.tsv
Identifier and link columns in the example: PlanteID (plants.tsv), Lot (harvests.tsv), SampleID (samples.tsv, compounds.tsv, enzymes.tsv)
Creation of the metadata files
a_attributes.tsv: this metadata file allows each attribute (variable) of each subset (Plants, Harvests, Samples, Compounds, …) to be annotated with some minimal but relevant metadata, including its category (identifier, factor, quantitative, qualitative).
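As a rough illustration of how an attribute file can drive later queries, here is a minimal Python sketch. It assumes an attribute table with three columns: `entry` and `category` are the column names mentioned on the web API slide, while the `subset` column and every value in the sample text are assumptions made for the example, not the actual layout of a_attributes.tsv.

```python
import csv
import io

# Hypothetical attribute table; only 'entry' and 'category' are column
# names taken from the slides, the rest is invented for illustration.
ATTRIBUTES_TSV = (
    "subset\tentry\tcategory\n"
    "plants\tplantid\tidentifier\n"
    "plants\ttreatment\tfactor\n"
    "samples\tsampleid\tidentifier\n"
    "compounds\tglucose\tquantitative\n"
)

def entries_by_category(tsv_text, category):
    """Return (subset, entry) pairs whose category matches the given one."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["subset"], row["entry"]) for row in reader
            if row["category"] == category]

identifiers = entries_by_category(ATTRIBUTES_TSV, "identifier")
factors = entries_by_category(ATTRIBUTES_TSV, "factor")
print(identifiers)
print(factors)
```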
Updating the metadata files
Additional subsets and attributes can be added to s_subsets.tsv and a_attributes.tsv step by step, as soon as data are produced.
Uploading your datasets in the data repository
Data capture, minimal effort (PUT): copy your data subset files into your dataset entry (named 'frim1' as an example) within the data repository, shown here as the storage drive 'Z: (Storage)', which is also mounted on myhost.org.
Merely dropping data files in the data repository (e.g. a NAS) should allow users to access them through the web API (GET).
No database schema, no programming code and no additional configuration on the server side.
Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values).
Checking online if your data subset files are consistent
http://myhost.org/check/frim1
(The data repository is a storage space, e.g. a NAS, mounted on myhost.org.)
Many checks can be done automatically for you.
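The slides do not list which checks the online service runs. As an illustration only, the following Python sketch shows two checks of the kind one could run locally before depositing files: identifier uniqueness within a subset, and link values that have no counterpart in the parent subset. The function names and the sample values are hypothetical; the column names PlanteID and Lot are taken from the s_subsets.tsv slide.

```python
import csv
import io

def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def duplicated_ids(rows, id_column):
    """Return identifier values that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[id_column]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)

def dangling_links(child_rows, parent_rows, link_column):
    """Return link values of the child subset that are absent from the parent."""
    parent_ids = {row[link_column] for row in parent_rows}
    return sorted({row[link_column] for row in child_rows} - parent_ids)

# Made-up sample content with one duplicated Lot and one unknown PlanteID.
plants = read_tsv("PlanteID\tTreatment\nP1\tControl\nP2\tWaterStress\n")
harvests = read_tsv("Lot\tPlanteID\nL1\tP1\nL2\tP3\nL2\tP2\n")

duplicates = duplicated_ids(harvests, "Lot")
missing = dangling_links(harvests, plants, "PlanteID")
print(duplicates, missing)
```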
Open Data, Access and Mining: web API
Minimal effort (PUT), maximal efficiency (GET)
Workflow: seeding, harvesting, samples preparation, samples analysis, then data storage (TSV format, data linking, check)
After depositing your complete dataset (FRIM1 (*) in the example) as described previously:
• Open access is given to your data through the web API
• They are ready to be mined
• No specific code or additional configuration is needed
(*) https://www.erasysbio.net/index.php?index=266
Open Data, Access and Mining: web API
REST services: hierarchical tree of resource naming (URL)
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
• <data format>: xml / tsv / json
• <dataset name>: e.g. frim1 for the FRIM1 (*) dataset
• then <subset> or (<subset>)
• then <entry>/<value> (retrieving data) or <category> (retrieving metadata)
Categories: identifier, factor, quantitative, qualitative, link
(*) https://doi.org/10.5281/zenodo.154041
Open Data, Access and Mining: web API
REST services: hierarchical tree of resource naming (URL)
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
Field | Description | Example
<data format> | Format of the retrieved data; possible values are 'xml', 'tsv' or 'json' | xml
<dataset name> | Short name (tag) of your dataset | frim1
<subset> | Short name of a data subset | samples
<entry> | Name of an attribute entry, defined by the user in the a_attributes file (column 'entry') | sampleid
<category> | Name of the attribute category, assigned by the user in the a_attributes file (column 'category'); possible values are 'identifier', 'factor', 'qualitative', 'quantitative' | quantitative
(<subset>) | Set of data subsets obtained by merging all the subsets with lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links | (samples) → plants + harvests + samples
<value> | Exact value of the desired entry or category | 1, factor
Open Data, Access and Mining: web API
REST services: hierarchical tree of resource naming (URL)
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
• Get the subset list of a dataset
http://myhost.org/getdata/<data format>/<dataset name>
• Get all values within a data subset
http://myhost.org/getdata/<data format>/<dataset name>/<subset>
• Get values within a data subset for a specific value of an entry
http://myhost.org/getdata/<data format>/<dataset name>/<subset>/<entry>/<value>
• Get all values within a set of data subsets
http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)
• Get values within a set of data subsets for a specific value of an entry
http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<entry>/<value>
• Get the attribute list within a set of data subsets for a specific category
http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<category>
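The URL patterns above can be assembled mechanically. The following Python sketch is only an illustration: `getdata_url` and its parameters are hypothetical names, and the base URL is the placeholder host used in the slides. It builds the six documented forms from their fields and does not contact any server.

```python
BASE_URL = "http://myhost.org/getdata"  # placeholder host from the slides

def getdata_url(data_format, dataset, subset=None, merged=False,
                entry=None, value=None, category=None, base_url=BASE_URL):
    """Assemble a getdata URL from its fields (hypothetical helper).

    merged=True wraps the subset in parentheses, i.e. requests the set of
    subsets merged along the "obtainedFrom" links.
    """
    if entry is not None and value is None:
        raise ValueError("an entry must be given together with a value")
    if (entry is not None or category is not None) and subset is None:
        raise ValueError("entry and category require a subset")
    parts = [base_url.rstrip("/"), data_format, dataset]
    if subset is not None:
        parts.append(f"({subset})" if merged else subset)
        if entry is not None:
            parts += [entry, str(value)]
        elif category is not None:
            parts.append(category)
    return "/".join(parts)

print(getdata_url("xml", "frim1"))
print(getdata_url("xml", "frim1", "harvests", entry="lot", value=1))
print(getdata_url("xml", "frim1", "compounds", merged=True,
                  category="quantitative"))
```

Keeping the merged form as a flag rather than asking callers to type the parentheses avoids malformed paths when the subset name comes from a variable.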
Open Data Access via web API: examples based on FRIM1
Metadata: http://myhost.org/getdata/xml/frim1
Data: http://myhost.org/getdata/xml/frim1/plants
Data: http://myhost.org/getdata/xml/frim1/harvests/lot/1
Metadata: http://myhost.org/getdata/xml/frim1/(compounds)/quantitative
Open Data Access via web API: examples based on FRIM1
http://myhost.org/getdata/xml/frim1/(samples)/treatment/Control
(samples) → plants + harvests + samples: the set of data subsets obtained by merging all the subsets with lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links.
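To make the merged-subset idea concrete, here is a minimal Python sketch of a chain merge. It is not the server's implementation: `merge_chain`, the in-memory tables and all their values are assumptions for illustration, and each link column is assumed to carry the same name as the parent's identifier column. Starting from the target subset, it follows each "obtainedFrom" link upwards and copies the parent row's columns into every child row.

```python
def merge_chain(subsets, links, target):
    """Merge a subset with all its ancestors along the links.

    subsets: name -> list of row dicts
    links:   child name -> (parent name, shared identifier column)
    """
    rows = [dict(row) for row in subsets[target]]
    current = target
    while current in links:
        parent, column = links[current]
        parent_by_id = {row[column]: row for row in subsets[parent]}
        # Child values win if a column name exists on both sides.
        rows = [{**parent_by_id[row[column]], **row} for row in rows]
        current = parent
    return rows

# Made-up tables: samples come from harvests, harvests come from plants.
subsets = {
    "plants": [{"PlanteID": "P1", "Treatment": "Control"},
               {"PlanteID": "P2", "Treatment": "WaterStress"}],
    "harvests": [{"Lot": "L1", "PlanteID": "P1"},
                 {"Lot": "L2", "PlanteID": "P2"}],
    "samples": [{"SampleID": "S1", "Lot": "L1"},
                {"SampleID": "S2", "Lot": "L2"},
                {"SampleID": "S3", "Lot": "L1"}],
}
links = {"samples": ("harvests", "Lot"), "harvests": ("plants", "PlanteID")}

merged = merge_chain(subsets, links, "samples")
control = [row["SampleID"] for row in merged if row["Treatment"] == "Control"]
print(control)
```

The final filter mirrors the example URL: once the plant-level factor has been carried down to each sample, selecting the Control samples is a simple row filter.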
Open Data Access via web API: application layer
Minimal effort, maximal efficiency (FRIM1 data in TSV format, with data linking)
Use existing tools:
- Spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
Open Data Access via web API: application layer
Retrieving data within R: the R package Rodam
Open Data Access via web API: Rodam package
GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365
• <data format>: tsv
• <dataset name>: frim1
• (<subset>): samples
• <entry>: sample
• <value>: 365
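The Rodam package is an R client; the same GET request can be issued from any language. As a hedged illustration, the Python sketch below uses only the standard library. `fetch_tsv` and `parse_tsv` are hypothetical helper names, UTF-8 encoding of the response is an assumption, and the column names in the sample text are invented. The network call is defined but not executed here, since the availability of the server cannot be guaranteed.

```python
import csv
import io
from urllib.request import urlopen

def parse_tsv(text):
    """Parse TSV text (header line first) into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def fetch_tsv(url, timeout=30):
    """GET a TSV resource and parse it; assumes a UTF-8 encoded response."""
    with urlopen(url, timeout=timeout) as response:
        return parse_tsv(response.read().decode("utf-8"))

# Offline demonstration on made-up text shaped like a TSV response.
sample_text = "SampleID\tTreatment\n365\tControl\n"
rows = parse_tsv(sample_text)
print(rows)

# Example call, left commented out because it needs network access:
# rows = fetch_tsv("http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365")
```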
Open Data Access via web API: Rodam package
Read metadata, i.e. category types within the data; get the data subset 'activome' along with its metadata
GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(activome)/factor
• <data format>: tsv
• <dataset name>: frim1
• (<subset>): activome
• <category>: factor
Open Data Access via web API: Rodam package
ODAM facilitates the subsequent data mining
Make both metadata and data available for data mining: Experimentation / Analysis → Data / Metadata → Data Mining (MFA, rCCA, pLDA, …)
[Figure: data subsets activome and qNMR_metabo (log10 transformed); panels: Water Stress vs Control; All Dev. Stages; All Treatments]
Open Data Access via web API: application layer
Minimal effort, maximal efficiency (FRIM1 data in TSV format, with data linking)
Use existing tools:
- Spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
Develop, if needed, lightweight tools:
- R scripts (Galaxy), lightweight GUI (R Shiny)
FRIM - Fruit Integrative Modelling
http://www.bordeaux.inra.fr/pmb/dataexplorer/?ds=frim1
• To remove an item from the selection: i) click on it, then ii) press the 'Suppr' (Delete) key.
• Explore several possibilities by interacting with the graph.
To summarize
1. Preparation and cleaning of the data sub-sets of files
2. Classification of each column within its right category
3. Connections between the dataset files based on identifiers
4. Creation of the definition files, namely s_subsets.tsv and a_attributes.tsv
5. Deposit of the dataset files in the data repository
6. Checking online if your data subset files are consistent
7. Testing the web services online on your dataset
8. Use of the web API through an application layer (R scripts, data explorer, ...)
Data subset files and their associated metadata files must be compliant with the TSV standard (Tab-Separated Values).
Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas (see https://en.wikipedia.org/wiki/Tab-separated_values).
Advantages of this approach
Workflow: seeding, harvesting, samples preparation, samples analysis (sample identifiers)
Data sharing & data availability
- The "plants" table may be created even before planting the seeds.
- Similarly, the "harvests" table can be created as soon as the harvests are done, before any analysis.
- Thus, these tables are generated only once in the project, and sharing can be set up as soon as the seeds are planted. Each analysis then complements the dataset as soon as it produces its own data subset.
- Data are accessible to everyone as soon as they are produced.
Identifiers centrally managed
- Data are archived and compiled, so a laborious investigation to find out who holds the right identifiers is no longer necessary.
Advantages of this approach
Facilitate the subsequent publication of data
- Data are already readily available online through the web API.
- Nothing prevents this data from being used to fill in existing databases, by adding more elaborate annotations.
- Neither administrator privileges nor any programming skills are required.
Data capture: minimal effort (PUT). Data analysis/mining: maximum efficiency (GET). Data in TSV format, with data linking.
Advantages of this approach
Minimal effort, maximum efficiency
Format the data
- Based on TSV: the choice is to keep the good old way scientists work, with worksheets, thus i) using the same tool for both data files and metadata definition files, and ii) requiring no programming skills
Give access through a web services layer
- Based on current standards (REST)
Use existing tools
- Spreadsheets, RStudio, BioStatFlow (biostatflow.org), Galaxy, Cytoscape, …
Develop, if needed, lightweight tools
- R scripts, lightweight GUI (R Shiny)
Have fun!
Daniel Jacob
UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
May 2016
Open Data for Access and Mining
https://hub.docker.com/r/odam/getdata/
http://www.bordeaux.inra.fr/pmb/dataexplorer/
https://github.com/INRA/ODAM
https://cran.r-project.org/package=Rodam
https://zenodo.org/record/154041
An online example

 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

Odam: Open Data, Access and Mining

  • 1. EDTMS / ODAM: Open Data for Access and Mining. Give open access to your data and make it ready to be mined, with a data explorer as a bonus. Daniel Jacob, UMR 1332 BFP – Metabolism Group, Bordeaux Metabolomics Facility, May 2016.
  • 2. Daniel Jacob – INRA UMR 1332 – May 2016. The experimental context: needs / wishes. The experiment goes through seeding, harvesting, samples preparation and samples analysis, with sample identifiers carried along these steps. The needs are: identifiers centrally managed; data sharing and data availability; facilitating the subsequent data mining. The experiment design leads to experiment data tables, which are exposed through a web API so that both metadata and data are available for data mining. Lightweight tools are developed if needed: R scripts (Galaxy), lightweight GUI (R Shiny).
  • 3. Open Data for Access and Mining: the core idea in one shot. Implementation of an Experiment Data Tables Management System (EDTMS). Data capture takes minimal effort (PUT): merely dropping data files in a data repository (e.g. a local NAS or a remote storage space, mounted on the server myhost.org) should allow users to access them through a web API (GET http://myhost.org/). Data can then be downloaded, explored and mined. No database schema, no programming code and no additional configuration are needed on the server side.
  • 4. Preparation and cleaning of the data subset files (plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv). Whatever the kind of experiment, the approach assumes a design of experiment (DoE) involving individuals, samples or other things as the main objects of study (e.g. plants, tissues, bacteria, …). It also assumes that dependent variables are observed, resulting from the effects of some controlled experimental factors. Moreover, each object of study usually has an identifier, and the variables can be quantitative or qualitative. There can be a single type of object of study or several; in the latter case, a relationship must exist between the object types, which we assume to be of the "obtainedFrom" type.
  • 5. Classification of each column within its right category. In the data subset files (plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv), each column falls into one of the categories identifier, factor, quantitative or qualitative, or serves as a link. Data subset files and their associated metadata files must comply with the TSV (tab-separated values) format. You have to organize your data subsets so that links can be established between them. In practice, this means adding a column containing the identifiers of the entity to which you want to connect the subset, which implies an 'obtainedFrom' relation. Note that this duplication of identifiers must be the only redundant information across all data subsets.
  • 6. Connections between the dataset files based on identifiers. Each data subset file (plants.tsv, harvests.tsv, samples.tsv, compounds.tsv, enzymes.tsv) corresponds to an entity, or concept (Plants, Harvests, Samples, Compounds, Enzymes). Each subset carries the identifier of its central entity, and a link between two subsets is established through identifiers, which implies an 'obtainedFrom' relation. Column categories: identifier, factor, quantitative, qualitative, link.
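The identifier-based link between two subsets can be illustrated with a minimal sketch. It is not taken from the slides: the column names (PlantID, SampleID, Treatment), the values and the helper functions are all hypothetical, and the intermediate harvests subset is left out for brevity.

```python
import csv
import io

# Hypothetical miniature subsets; column names and values are invented
# for illustration and do not come from the FRIM1 dataset.
PLANTS_TSV = "PlantID\tTreatment\nP1\tControl\nP2\tWaterStress\n"
SAMPLES_TSV = "SampleID\tPlantID\nS1\tP1\nS2\tP2\nS3\tP1\n"


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_identifier(children, parents, key):
    """Attach to each child row the columns of the parent row sharing `key`."""
    parents_by_id = {row[key]: row for row in parents}
    merged = []
    for child in children:
        parent = parents_by_id[child[key]]  # KeyError if the link is broken
        merged.append({**parent, **child})
    return merged


samples_with_plants = join_on_identifier(
    read_tsv(SAMPLES_TSV), read_tsv(PLANTS_TSV), key="PlantID"
)
for row in samples_with_plants:
    print(row["SampleID"], row["PlantID"], row["Treatment"])
```

The only column repeated across the two tables is the link column, which matches the rule that identifiers should be the only redundant information.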
  • 7. Creation of the metadata files. To allow data to be explored and mined, some minimal but relevant metadata must be added. Two supplementary metadata files are required: s_subsets.tsv, which associates each data subset with a key concept corresponding to the main entity of the subset and declares the "obtainedFrom" relations between these concepts; and a_attributes.tsv, which annotates each attribute (concept/variable) with some minimal but relevant metadata. Data subset files and their associated metadata files must comply with the TSV (tab-separated values) format. Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas.
  • 8. Creation of the metadata files: s_subsets.tsv. This metadata file associates a key concept with each data subset file. For each subset it gives: a unique rank number (1 to 5 in the example); the key concept, i.e. the main entity associated with the subset, as a short name (Plants, Harvests, Samples, Compounds, Enzymes); the identifier of the central entity of the subset; the data file name; and the link between two subsets (implying an 'obtainedFrom' relation). For example, Plants has rank 1, identifier PlanteID and file plants.tsv. The other identifiers shown are Lot for harvests.tsv and SampleID for samples.tsv, compounds.tsv and enzymes.tsv.
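As a rough illustration of how such a file could be read, here is a minimal sketch. The column names (rank, parent_rank, subset, identifier, file) are assumptions made for this example and may differ from the real s_subsets.tsv layout. The file names and identifiers come from the slide; the parent ranks are illustrative.

```python
import csv
import io

# Hypothetical s_subsets.tsv content: the column names and the parent
# ranks are assumptions, not the documented file layout.
S_SUBSETS_TSV = (
    "rank\tparent_rank\tsubset\tidentifier\tfile\n"
    "1\t0\tplants\tPlanteID\tplants.tsv\n"
    "2\t1\tharvests\tLot\tharvests.tsv\n"
    "3\t2\tsamples\tSampleID\tsamples.tsv\n"
)


def read_subsets(text):
    """Return the subset rows keyed by their unique rank number."""
    by_rank = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rank = int(row["rank"])
        if rank in by_rank:
            raise ValueError(f"duplicate rank {rank}")
        by_rank[rank] = row
    return by_rank


def obtained_from_pairs(by_rank):
    """List (child, parent) subset names for every declared link."""
    pairs = []
    for row in by_rank.values():
        parent = by_rank.get(int(row["parent_rank"]))
        if parent is not None:  # rank 0 means "no parent" in this sketch
            pairs.append((row["subset"], parent["subset"]))
    return pairs


subsets = read_subsets(S_SUBSETS_TSV)
for child, parent in obtained_from_pairs(subsets):
    print(f"{child} obtainedFrom {parent}")
```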
  • 9. Creation of the metadata files: a_attributes.tsv. This metadata file annotates each attribute (variable) of each subset (Plants, Harvests, Samples, Compounds, …) with some minimal but relevant metadata, including its category: identifier, factor, quantitative or qualitative.
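The role of the category annotation can be sketched as follows. The 'entry' and 'category' columns are named later in the deck; the 'subset' and 'attribute' columns, the sample rows and the helper function are assumptions made for this illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical a_attributes.tsv content; 'subset' and 'attribute' are
# assumed column names and every row is invented for illustration.
A_ATTRIBUTES_TSV = (
    "subset\tattribute\tentry\tcategory\n"
    "samples\tSampleID\tsampleid\tidentifier\n"
    "samples\tTreatment\ttreatment\tfactor\n"
    "compounds\tGlucose\tglucose\tquantitative\n"
    "compounds\tComment\tcomment\tqualitative\n"
)

ALLOWED_CATEGORIES = {"identifier", "factor", "quantitative", "qualitative"}


def attributes_by_category(text):
    """Group attribute names by category, rejecting unknown categories."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        category = row["category"]
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        grouped[category].append(row["attribute"])
    return dict(grouped)


grouped = attributes_by_category(A_ATTRIBUTES_TSV)
for category, attributes in grouped.items():
    print(category, attributes)
```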
  • 10. Updating the metadata files. Additional subsets and attributes can be added to s_subsets.tsv and a_attributes.tsv step by step, as soon as data are produced.
  • 11. Uploading your datasets to the data repository. Data capture takes minimal effort (PUT): copy your data subset files into your dataset entry (named 'frim1' as an example) within the data repository (storage mounted as Z:). Merely dropping data files in the data repository (e.g. a NAS) should allow users to access them through the web API (GET on myhost.org). No database schema, no programming code and no additional configuration are needed on the server side. Data subset files and their associated metadata files must comply with the TSV (tab-separated values) format.
  • 12. Checking online whether your data subset files are consistent: http://myhost.org/check/frim1. Many checks can be run automatically for you on the data repository (NAS storage behind myhost.org).
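The slide does not list the checks performed by the service. As a hedged illustration only, two plausible consistency checks (unique identifiers and links that point to an existing parent) could look like this; the function names and sample rows are hypothetical.

```python
from collections import Counter


def duplicated_identifiers(rows, id_column):
    """Return the identifier values that appear more than once."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def dangling_links(child_rows, parent_rows, link_column):
    """Return link values used by children but absent from the parent subset."""
    known = {row[link_column] for row in parent_rows}
    return sorted({row[link_column] for row in child_rows} - known)


# Invented rows: S1 is duplicated and lot L9 does not exist in harvests.
harvests = [{"Lot": "L1"}, {"Lot": "L2"}]
samples = [
    {"SampleID": "S1", "Lot": "L1"},
    {"SampleID": "S1", "Lot": "L2"},
    {"SampleID": "S3", "Lot": "L9"},
]

print(duplicated_identifiers(samples, "SampleID"))
print(dangling_links(samples, harvests, "Lot"))
```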
  • 13. Open Data, Access and Mining: web API. Minimal effort (PUT), maximal efficiency (GET). The preceding steps (preparation and cleaning of the data subset files, TSV format, data linking, check) lead to data storage, covering seeding, harvesting, samples preparation and samples analysis. After depositing your complete dataset as described previously: open access is given to your data through the web API; the data are ready to be mined; no specific code or additional configuration is needed. Example dataset: FRIM1 (*). (*) https://www.erasysbio.net/index.php?index=266
  • 14. Open Data, Access and Mining: web API. REST services: hierarchical tree of resource naming (URL), used for retrieving both data and metadata. GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >. The tree starts with <data format> (xml/tsv/json), then <dataset name> (e.g. frim1), then either <subset> or (<subset>), followed by <entry> and <value>, or by <category>. Example dataset: FRIM1 (*). (*) https://doi.org/10.5281/zenodo.154041
  • 15. Open Data, Access and Mining: web API. REST services: hierarchical tree of resource naming (URL). GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >. Fields, descriptions and examples:
      - <data format>: format of the retrieved data; possible values are 'xml' or 'csv'. Example: xml
      - <dataset name>: short name (tag) of your dataset. Example: frim1
      - <subset>: short name of a data subset. Example: samples
      - <entry>: name of an attribute entry, defined by the user in the a_attribute file (column 'entry'). Example: sampleid
      - <category>: name of the attribute category, assigned by the user in the a_attribute file (column 'category'); possible values are 'identifier', 'factor', 'qualitative', 'quantitative'. Example: quantitative
      - (<subset>): set of data subsets obtained by merging all the subsets with a lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links. Example: (samples) → plants + harvests + samples
      - <value>: exact value of the desired entry or category. Examples: 1, factor
  • 16. Open Data, Access and Mining: web API. REST services: hierarchical tree of resource naming (URL). GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
      - Get the subset list of a dataset: http://myhost.org/getdata/<data format>/<dataset name>
      - Get all values within a data subset: http://myhost.org/getdata/<data format>/<dataset name>/<subset>
      - Get values within a data subset for a specific value of an entry: http://myhost.org/getdata/<data format>/<dataset name>/<subset>/<entry>/<value>
      - Get all values within a set of data subsets: http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)
      - Get values within a set of data subsets for a specific value of an entry: http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<entry>/<value>
      - Get the attribute list within a set of data subsets for a specific category: http://myhost.org/getdata/<data format>/<dataset name>/(<subset>)/<category>
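The URL patterns above can be assembled with a small helper. This is only a sketch: the function name getdata_url and its parameters are hypothetical, while the host myhost.org and the path layout follow the slides.

```python
from urllib.parse import quote


def getdata_url(base, data_format, dataset, subset=None, merged=False, selector=()):
    """Build a getdata URL following the hierarchical resource naming.

    merged=True wraps the subset in parentheses, i.e. requests the set of
    subsets merged up to `subset`. `selector` holds the trailing path
    elements: (entry, value) or (category,).
    """
    parts = [base.rstrip("/"), "getdata", data_format, dataset]
    if subset is not None:
        parts.append(f"({subset})" if merged else subset)
        # Percent-encode each element but keep parentheses readable.
        parts.extend(quote(str(item), safe="()") for item in selector)
    return "/".join(parts)


BASE = "http://myhost.org"
print(getdata_url(BASE, "xml", "frim1"))
print(getdata_url(BASE, "xml", "frim1", "plants"))
print(getdata_url(BASE, "xml", "frim1", "harvests", selector=("lot", 1)))
print(getdata_url(BASE, "xml", "frim1", "compounds", merged=True,
                  selector=("quantitative",)))
```

Keeping the parenthesised form behind a boolean flag avoids hand-writing the brackets in calling code.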
  • 17. Open Data Access via web API: examples based on FRIM1.
      - http://myhost.org/getdata/xml/frim1 (metadata)
      - http://myhost.org/getdata/xml/frim1/plants (data)
      - http://myhost.org/getdata/xml/frim1/harvests/lot/1 (data)
      - http://myhost.org/getdata/xml/frim1/(compounds)/quantitative (metadata)
  • 18. Open Data Access via web API: examples based on FRIM1. http://myhost.org/getdata/xml/frim1/(samples)/treatment/Control. The parenthesised form denotes the set of data subsets obtained by merging all the subsets with a lower rank than the specified subset, following the pathway defined by the "obtainedFrom" links: (samples) → plants + harvests + samples.
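How a parenthesised subset expands into a chain of subsets can be sketched by walking the 'obtainedFrom' links upward. The link table below is an assumption inferred from the example on the slide, and the function name is hypothetical; the server may resolve the pathway differently.

```python
# Assumed 'obtainedFrom' links (child -> parent); None marks the root subset.
OBTAINED_FROM = {
    "plants": None,
    "harvests": "plants",
    "samples": "harvests",
    "compounds": "samples",
    "enzymes": "samples",
}


def merged_subsets(subset, obtained_from):
    """Return the subsets merged for (subset), from the root down to it."""
    chain = []
    current = subset
    while current is not None:
        chain.append(current)
        current = obtained_from[current]
    chain.reverse()
    return chain


print(" + ".join(merged_subsets("samples", OBTAINED_FROM)))
```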
  • 19. Open Data Access via web API: application layer. Minimal effort, maximal efficiency: once the data are in TSV format and linked (e.g. FRIM1), use existing tools: spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …
  • 20. Open Data Access via web API: application layer. Retrieving data within R: the R package Rodam.
  • 21. Open Data Access via web API: Rodam package. Example with <data format> = tsv, <dataset name> = frim1, (<subset>) = (samples), <entry> = sample, <value> = 365: GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365
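The slides use the R package Rodam for this step. As an alternative illustration only, the same kind of GET request could be issued from Python with the standard library. The helper names and the sample response below are hypothetical, and the sketch assumes the service answers with plain tab-separated text including a header row.

```python
import csv
import io
import urllib.request


def parse_tsv(text):
    """Parse a tab-separated response body into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_tsv(url, timeout=30):
    """GET a getdata URL and parse the body as TSV (assumes a header row)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_tsv(response.read().decode(charset))


# Invented response body standing in for a real answer from the service.
SAMPLE_RESPONSE = "SampleID\tTreatment\n365\tControl\n"
rows = parse_tsv(SAMPLE_RESPONSE)
print(rows[0]["SampleID"], rows[0]["Treatment"])

# Against a live server one would call, for example:
# rows = fetch_tsv("http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(samples)/sample/365")
```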
  • 22. Open Data Access via web API: Rodam package. Read metadata, i.e. the category types within the data, and get the data subset 'activome' along with its metadata. Example with <data format> = tsv, <dataset name> = frim1, (<subset>) = (activome), <category> = factor: GET http://www.bordeaux.inra.fr/pmb/getdata/tsv/frim1/(activome)/factor
  • 23. Open Data Access via web API: Rodam package.
  • 24. Open Data Access via web API: Rodam package. Make both metadata and data available for data mining: ODAM facilitates the subsequent data mining, from experimentation and analysis to methods such as MFA, rCCA, pLDA, … [Figures: 'activome' and 'qNMR_metabo' subsets (log10 transformed); Control and Water Stress treatments; all development stages, all treatments.]
  • 25. Open Data Access via web API: application layer. Minimal effort, maximal efficiency: once the data are in TSV format and linked (e.g. FRIM1), use existing tools (spreadsheets, RStudio, BioStatFlow, Galaxy, Cytoscape, …) and develop lightweight tools if needed: R scripts (Galaxy), lightweight GUI (R Shiny).
  • 26. FRIM - Fruit Integrative Modelling. http://www.bordeaux.inra.fr/pmb/dataexplorer/?ds=frim1
  • 27. FRIM - Fruit Integrative Modelling. http://www.bordeaux.inra.fr/pmb/dataexplorer/?ds=frim1
  • 28. FRIM - Fruit Integrative Modelling.
  • 29. FRIM - Fruit Integrative Modelling. To remove an item from the selection: i) click on it, then ii) press the Delete ('Suppr') key.
  • 30. FRIM - Fruit Integrative Modelling.
  • 31. FRIM - Fruit Integrative Modelling. Explore several possibilities by interacting with the graph.
  • 32. To summarize:
      1. Preparation and cleaning of the data subset files
      2. Classification of each column within its right category
      3. Connections between the dataset files based on identifiers
      4. Creation of the definition files, namely s_subsets.tsv and a_attributes.tsv
      5. Deposit of the dataset files in the data repository
      6. Checking online whether your data subset files are consistent
      7. Testing the web services online on your dataset
      8. Use of the web API through an application layer (R scripts, data explorer, ...)
    Data subset files and their associated metadata files must comply with the TSV (tab-separated values) format. Note: TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because of the need to escape commas (see https://en.wikipedia.org/wiki/Tab-separated_values).
  • 33. Advantages of this approach (from seeding and harvesting to samples preparation and samples analysis, with sample identifiers throughout).
      - Data sharing and data availability: the "plants" table can be created even before the seeds are planted. Similarly, the "harvests" table can be created as soon as the harvests are done, before any analysis. These tables are thus generated only once in the project, and sharing can be set up as soon as the seeds are planted. Each analysis then complements the dataset as soon as it produces its own data subset. Data are accessible to everyone as soon as they are produced.
      - Identifiers centrally managed: data are archived and compiled, so there is no longer any need for a laborious investigation to find out who holds the right identifiers, etc.
  • 34. Advantages of this approach: it facilitates the subsequent publication of data. Data are already readily available online through the web API, but nothing prevents you from using them to fill in existing databases, adding more elaborate annotations. Neither administrator privileges nor programming skills are required. Data capture (TSV format, data linking, PUT) takes minimal effort; data analysis and mining (GET) get maximum efficiency.
  • 35. Advantages of this approach: minimal effort, maximum efficiency.
      - Format the data, based on TSV: the choice is to keep scientists' familiar habit of using worksheets, so that i) the same tool serves for both data files and metadata definition files, and ii) no programming skills are required.
      - Give access through a web services layer, based on current standards (REST).
      - Use existing tools: spreadsheets, RStudio, BioStatFlow (biostatflow.org), Galaxy, Cytoscape, …
      - Develop lightweight tools if needed: R scripts, lightweight GUI (R Shiny).
  • 36. Have fun! Daniel Jacob, UMR 1332 BFP – Metabolism Group, Bordeaux Metabolomics Facility, May 2016. Open Data for Access and Mining. Links, including an online example: https://hub.docker.com/r/odam/getdata/ ; http://www.bordeaux.inra.fr/pmb/dataexplorer/ ; https://github.com/INRA/ODAM ; https://cran.r-project.org/package=Rodam ; https://zenodo.org/record/154041