Better data through better curation
Susanna-Assunta Sansone, PhD
@biosharing
@isatools
Publishing better science through better data, Open Research, Nature Publishing Group, 14 November 2014
Data Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
Data Descriptor: two complementary components
•  Article or narrative component (PDF and HTML)
•  Experimental metadata or structured component (in-house curated, machine-readable format)
Structured component enhances Methods & Data
“The Methods section should include detailed text describing
the methods and procedures used in the study and assay(s),
and the processing steps leading to the production of the
data files, including any computational analyses…..
….. The Data Records section should be used to explain
each data record associated with this work, including the
repository where this information is stored, and an overview of
the data files and their formats.”
Focus on the description of the experimental workflow
•  We need to report sufficient
information to reuse the dataset
•  We must strike a balance between
depth and breadth of information
Focus on the description of the experimental workflow
•  Not too much
•  Not too little
•  But just right
Structured component: key information from narrative
Seven week old C57BL/6N mice were treated
with low-fat diet.
Liver was dissected out, hepatocytes prepared…
Age value
Unit
Strain name
Subject of the experiment
Type of diet and
experimental condition
Anatomy part
Seven week old C57BL/6N mice were treated
with low-fat diet.
Liver was dissected out, hepatocytes prepared …
From natural language to ‘computable’ concepts
Type of protocol – cell preparation
Type of protocol - sample treatment
Type of protocol – liver preparation
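The labels above can be read as fields of a small structured record. The following is a minimal sketch in Python, not the journal's actual schema: the class name `SubjectRecord` and its field names are assumptions chosen to mirror the slide's labels, and the values come from the example sentence.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectRecord:
    """Hypothetical container mirroring the labels on the slide."""
    subject: str        # subject of the experiment
    strain: str         # strain name
    age_value: int      # age value
    age_unit: str       # unit
    diet: str           # type of diet and experimental condition
    anatomy_part: str   # anatomy part
    protocols: list = field(default_factory=list)  # types of protocol


# Values taken from: "Seven week old C57BL/6N mice were treated with
# low-fat diet. Liver was dissected out, hepatocytes prepared ..."
record = SubjectRecord(
    subject="mouse",
    strain="C57BL/6N",
    age_value=7,
    age_unit="week",
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=["sample treatment", "liver preparation", "cell preparation"],
)

print(record)
```

Each value now sits in a named field, so a program can filter or compare records without parsing the sentence.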
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta
Sansone www.ebi.ac.uk/net-project
Example of richly annotated, computable description
Credit to:
OBI consortium
And conversely… how not to report the experimental information
Sample name (?!)      Data file
LS1_C2_LD_TP2_P1      file1-fastq.gz
•  LS1: liver sample 1
•  C2: compound 2
•  LD: low dose
•  TP2: time point 2
•  P1: protocol 1
•  file1-fastq.gz: compressed data file for sequence information corresponding to this sample
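To make the problem concrete, here is a minimal Python sketch of what a reader has to do to recover the information packed into such a sample name. The legend dictionary reproduces the key from the slide; the field names (`sample`, `compound`, and so on) and the function `expand_sample_name` are assumptions made for illustration.

```python
# Legend copied from the slide; the field names are hypothetical.
LEGEND = {
    "LS1": ("sample", "liver sample 1"),
    "C2": ("compound", "compound 2"),
    "LD": ("dose", "low dose"),
    "TP2": ("time_point", "time point 2"),
    "P1": ("protocol", "protocol 1"),
}


def expand_sample_name(name: str) -> dict:
    """Turn an underscore-delimited code into explicit field/value pairs."""
    fields = {}
    for token in name.split("_"):
        if token not in LEGEND:
            # Without the legend the code cannot be interpreted at all.
            raise KeyError(f"no legend entry for {token!r}")
        key, meaning = LEGEND[token]
        fields[key] = meaning
    return fields


row = expand_sample_name("LS1_C2_LD_TP2_P1")
row["data_file"] = "file1-fastq.gz"
for key, value in row.items():
    print(f"{key}: {value}")
```

The decoding only works while the legend travels with the data, which is why each value is better reported in its own explicit field.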
Helping authors to report the structural information
In-house editorial curator assists authors via
•  Excel spreadsheet templates
•  internal authoring tool
and performs value-added semantic annotation
[Figure labels: analysis method; script; data file or record in a database]
At initial submission
•  Authors provide basic input, at minimum, information on
o  samples and subjects
o  experimental, computational and/or observational
information, or creation of aggregations
o  data outputs
•  Example for an experimental study:
Subjects   Protocol 1       Protocol 2         Protocol 3       Protocol 4   Data
Mouse 1    Drug treatment   Liver dissection   RNA extraction   RNA-Seq      [accession number]
Mouse 2    Drug treatment   Liver dissection   RNA extraction   RNA-Seq      [accession number]
Mouse n    Drug treatment   Liver dissection   RNA extraction   RNA-Seq      [accession number]
Upon acceptance
•  The curator, with the help of the authors, completes the
structured description, drawing information from the
narrative component, and adds
o  information about the samples and subjects
o  details of the experimental, computational and/or
observational information, or creation of aggregations
o  details on data manipulations
•  Also performs value-added semantic tagging
o  replacing free text with terms from community-defined
terminologies (controlled vocabularies or ontologies)
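The tagging step can be sketched as a lookup that swaps a free-text value for an ontology term. This is a minimal illustration in Python, not the curators' actual tooling: the `TERMS` table, the `tag` function and the output keys are assumptions, and the two identifiers are given for illustration only and should be verified against the ontologies before reuse.

```python
# Illustrative term table: free text -> (term source, accession, label).
# A real curator would query the ontology itself rather than a fixed dict.
TERMS = {
    "mouse": ("NCBITaxon", "NCBITaxon:10090", "Mus musculus"),
    "liver": ("UBERON", "UBERON:0002107", "liver"),
}


def tag(free_text: str) -> dict:
    """Replace a free-text value with a controlled term when one is known."""
    key = free_text.strip().lower()
    if key in TERMS:
        source, accession, label = TERMS[key]
        return {"value": label, "term_source": source, "term_accession": accession}
    # Unknown values are kept as free text so nothing is lost.
    return {"value": free_text, "term_source": None, "term_accession": None}


for text in ["Mouse", "Liver", "low-fat diet"]:
    print(text, "->", tag(text))
```

Values without a match stay as free text, so a curator can see what remains to be annotated.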
Semantic tagging key information
General-purpose, machine readable format
Designed to support:
•  description of the workflow
•  use of community-defined
terminologies and minimal
reporting guidelines
o  depth of description will
vary contingent on the
particular context
Investigation file – overview and link to narrative
Includes fields describing:
•  authors’ details, including ORCID
•  publications
•  funding sources and funders’ name, via FundRef
•  study design
•  type of assays
•  type of protocols
•  links to relevant sections of the narrative component
Study file – samples / subjects description
Assays file - from samples to data files
•  Pointing to the
o  location of the data files in
the external repository(s)
o  name or ID of the files
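As a rough illustration of how an assay table ties samples to data files, the Python sketch below writes and re-reads a small tab-delimited table. The column headers follow ISA-Tab naming as far as I know it, while the `Comment[Data Repository]` column, the sample names, the file names and the repository name are all invented for this example.

```python
import csv
import io

# Headers modelled on ISA-Tab conventions; every row value is hypothetical.
HEADER = ["Sample Name", "Protocol REF", "Raw Data File", "Comment[Data Repository]"]

rows = [
    {"Sample Name": "mouse1.liver", "Protocol REF": "RNA-Seq",
     "Raw Data File": "mouse1_liver.fastq.gz",
     "Comment[Data Repository]": "example-repository"},
    {"Sample Name": "mouse2.liver", "Protocol REF": "RNA-Seq",
     "Raw Data File": "mouse2_liver.fastq.gz",
     "Comment[Data Repository]": "example-repository"},
]

# Write the table as tab-delimited text, one row per sample-to-file link.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=HEADER, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
assay_table = buffer.getvalue()
print(assay_table)

# Read it back and index the data files by sample name.
reader = csv.DictReader(io.StringIO(assay_table), delimiter="\t")
files_by_sample = {r["Sample Name"]: r["Raw Data File"] for r in reader}
print(files_by_sample)
```

Because each row names both the sample and the file, the link between them survives without any naming convention in the file name itself.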
Progressively refine guidance to authors and reviewers
In the life sciences
[Figure: standards landscape; counts shown: ~156, ~70, ~334 (Source: BioPortal)]
•  Reporting guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
•  Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
•  Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
•  Databases implementing standards
Mapping the landscape of standards and databases
What does a structured component add?
•  Supplements the scientific discourse
o  natural language has a degree of ambiguity
•  Brings clarity in reporting research methods and procedures
o  no trimming, no cooking
o  clear samples to data files links and relation to methods
•  Provides the basis for search and discovery features
[Figure labels: SciData DD structured content (shown for ten Data Descriptors); same tissue; same organism; same assay; community data repositories]
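The "same tissue, same organism, same assay" idea can be sketched as grouping Data Descriptors by a shared structured field. This Python example is only an illustration under stated assumptions: the descriptor identifiers, the field names and the values are invented, and `group_by` is a hypothetical helper, not a feature of any existing search service.

```python
from collections import defaultdict

# Invented structured content for three hypothetical Data Descriptors.
descriptors = [
    {"id": "DD-1", "organism": "Mus musculus", "tissue": "liver", "assay": "RNA-Seq"},
    {"id": "DD-2", "organism": "Mus musculus", "tissue": "brain", "assay": "RNA-Seq"},
    {"id": "DD-3", "organism": "Homo sapiens", "tissue": "liver", "assay": "mass spectrometry"},
]


def group_by(records, field):
    """Collect descriptor ids that share the same value for one field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec["id"])
    return dict(groups)


same_tissue = group_by(descriptors, "tissue")
same_organism = group_by(descriptors, "organism")
same_assay = group_by(descriptors, "assay")

print(same_tissue)
print(same_organism)
print(same_assay)
```

Grouping of this kind is only reliable when the values are controlled terms, since free-text variants of the same tissue or organism would fall into different groups.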
Acknowledgements
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData
Honorary Academic Editor: Susanna-Assunta Sansone, PhD
Managing Editor: Andrew L Hufton, PhD
Editorial Curator: Varsha Khodiyar
Publisher: Iain Hrynaszkiewicz
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
Philippe Rocca-Serra, PhD
Alejandra Gonzalez-Beltran, PhD
Eamonn Maguire
Milo Thurston, PhD
and Funders, Advisory Boards and Collaborators
