How to find and provide FAANG Data
The FAANG Data Coordination Centre
Laura Clarke
Vertebrate Data Coordination
www.ebi.ac.uk
@laurastephen
Value of Metadata
Data Access
Metadata Standards
Validation tools
FAANG Data availability
Support
Tara Oceans
•2 ½ year expedition
•210 sampling stations
•Standardized measurements
•Genetic
•Morphological
•Physico-Chemical
Good metadata enables great science
HipSci
•750 iPSC lines
•Healthy and rare disease donors
•Extensive genomic and epigenomic characterization
•All lines and data available to community
H Kilpinen et al. Nature 546, 370–375 (2017) doi:10.1038/nature22403
The FAANG Data Coordination Centre
•Supporting Submission
•Ensuring high quality data description
•Making the data accessible
•Providing consistent analysis products
Findable
• Global persistent identifier
• Rich metadata
• Store metadata in registries
Accessible
• Resolvable identifiers
• Metadata persists
• Machine and human access
Interoperable
• Open data format
• Modelled with FAIR-compliant vocabularies
• Reference external data
Reusable
• Rich metadata
• Clear license
• Provenance
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). DOI: 10.1038/sdata.2016.18
Alasdair J G Gray
Heriot-Watt University
Ensuring the data is FAIR
• Needs
  • Well structured
  • Consistent naming
  • Specific descriptions
• Enables
  • Aggregation
  • Integration
  • Tracking
Good data is well described data
• Representation of important things in a specific domain
  • Describes types of entities (e.g. cells) and relations between them
• An active, formal computational artifact
  • A mathematical model based on a subset of first-order logic
  • Tools can automatically process ontologies for analysis, e.g. gene expression enrichment analysis
• A communication tool
  • Provides a dictionary for collaborators, a shared understanding
  • Allows data sharing
Use Ontologies
Example term hierarchy (broad to specific): Myeloid Leucocyte > Monocyte > CD14+ Monocyte
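The benefit of annotating with the most specific term can be sketched in a few lines of Python. This is a toy illustration, not how any EBI tool is implemented: it uses the three labels from the slide as a hand-written is-a chain, and the sample annotations are invented. The point is that a sample labelled "CD14+ Monocyte" can still be found when someone asks for all "Monocyte" samples.

```python
# Toy is-a hierarchy built from the three cell-type labels on the slide.
# Real ontologies use term identifiers; plain labels keep the sketch short.
PARENT = {
    "CD14+ Monocyte": "Monocyte",
    "Monocyte": "Myeloid Leucocyte",
    "Myeloid Leucocyte": None,
}


def ancestors(term):
    """Return the term followed by all broader terms, most specific first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def count_under(annotations, broad_term):
    """Count samples annotated with the broad term or any narrower term."""
    return sum(1 for term in annotations if broad_term in ancestors(term))


# Hypothetical sample annotations, invented for illustration only.
samples = ["CD14+ Monocyte", "Monocyte", "CD14+ Monocyte", "Myeloid Leucocyte"]

print(count_under(samples, "Monocyte"))           # prints 3
print(count_under(samples, "Myeloid Leucocyte"))  # prints 4
```

Free-text labels such as "monocytes (CD14 pos.)" would not support this kind of roll-up without manual cleaning, which is the practical argument for ontology terms.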
• OLS - The Ontology Lookup Service
  • http://www.ebi.ac.uk/ols/index
  • Indexes 150 biomedical ontologies (4.5 million terms, 11 million relations)
• Zooma
  • http://www.ebi.ac.uk/spot/zooma/
  • Uses past knowledge to inform new annotation
  • Curated mappings from the Expression Atlas, Open Targets and others
• Webulous
  • http://www.ebi.ac.uk/spot/webulous/
  • Google Sheets template system
  • Create new ontology terms
• OxO (in beta)
  • http://www.ebi.ac.uk/spot/oxo/
  • Cross references between ontologies
• All services have API and UI access
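Because the services offer API access, a term lookup can be scripted. The sketch below only builds a search URL and does not contact any server. The endpoint path and the parameter names `q`, `ontology` and `rows` are assumptions about the OLS REST search interface and are not stated on the slides, so check them against the current OLS documentation before relying on them. The ontology short name `cl` is likewise assumed.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names for the OLS search API; these are
# not taken from the slides and may differ in the current OLS release.
OLS_SEARCH = "https://www.ebi.ac.uk/ols/api/search"


def ols_search_url(query, ontology=None, rows=10):
    """Build a search URL for a free-text term, optionally limited to one ontology."""
    params = {"q": query, "rows": rows}
    if ontology:
        params["ontology"] = ontology
    # urlencode escapes spaces and reserved characters such as "+".
    return OLS_SEARCH + "?" + urlencode(params)


print(ols_search_url("monocyte", ontology="cl"))
```

The returned URL can be passed to any HTTP client; parsing the response is left out here because the response layout is not described in the slides.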
Supporting deposition of well described data
FAANG Validation Service
Validates completed metadata Excel templates and prepares metadata for archive submission
http://www.ebi.ac.uk/vg/faang
•Checks ontologies (scope, accuracy, terms).
•Relationships (familial, breeds).
•Minimum standards and validity.
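The kinds of checks listed above can be sketched as a small function. This is a hypothetical illustration and not the FAANG Validation Service code: the field names, the list of required fields and the identifier pattern are all invented for the example and are not taken from the FAANG metadata standard.

```python
import re

# Hypothetical rule set; none of these names come from the FAANG standard.
REQUIRED_FIELDS = ["sample_name", "organism", "material"]
ONTOLOGY_FIELDS = ["organism_term", "material_term"]
# Assumed identifier shape: a prefix, a colon or underscore, then digits.
ONTOLOGY_ID = re.compile(r"^[A-Za-z]+[:_]\d+$")


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Minimum standards: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    # Ontology check: only the identifier format is tested in this sketch.
    for field in ONTOLOGY_FIELDS:
        value = record.get(field)
        if value is not None and not ONTOLOGY_ID.match(value):
            problems.append(f"{field} is not a well-formed ontology id: {value!r}")
    return problems


# Invented records; "EX:0000001" is a placeholder, not a real term.
good = {"sample_name": "S1", "organism": "Ovis aries",
        "material": "specimen", "organism_term": "EX:0000001"}
bad = {"sample_name": " ", "organism": "Ovis aries", "material_term": "liver"}

print(validate_record(good))
print(validate_record(bad))
```

Returning a list of messages, instead of stopping at the first failure, lets a submitter fix a whole spreadsheet in one pass. A real service would also confirm that each identifier exists in the expected ontology and that relationships between records are consistent.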
•On conversion, validates again and checks project information.
•If validation passes, returns correctly formatted SampleTab for BioSamples and XML for ENA.
• The Validation service code and website
  • http://www.ebi.ac.uk/vg/faang
  • https://github.com/faang/faang-metadata
  • https://github.com/FAANG/validate-metadata
How much data?
[Chart: proportion of donor animals and specimens per species. Legend: Gallus gallus, Ovis aries, Sus scrofa, Bos taurus, Bubalus bubalis, Capra hircus]
285 Donor Animals (chart segments: 132, 62, 56, 14, 13, 8)
8428 Specimens (chart segments: 1240, 678, 2479, 1423, 1667, 941)
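The chart plots each segment as a percentage of its row total, and the listed counts can be turned into those percentages directly. The sketch below uses the numbers exactly as they appear on the slide. The extracted text does not say which count belongs to which species, so no species labels are attached. The two sums do match the stated totals of 285 donor animals and 8428 specimens.

```python
# Segment counts as listed on the slide, in the order they appear.
# The mapping from segment to species is not recoverable from the text.
donor_segments = [132, 62, 56, 14, 13, 8]
specimen_segments = [1240, 678, 2479, 1423, 1667, 941]


def shares(segments):
    """Return each segment as a percentage of the total, to one decimal place."""
    total = sum(segments)
    return [round(100 * n / total, 1) for n in segments]


# The totals printed on the slide agree with the segment sums.
assert sum(donor_segments) == 285
assert sum(specimen_segments) == 8428

print(shares(donor_segments))
print(shares(specimen_segments))
```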
8 European Nucleotide Archive studies submitted
• 4891 sequencing runs
Largest submission
• RNA sequencing of tissues and cell types from Scottish Blackface x Texel sheep for transcriptome annotation and expression analysis, The Roslin Institute
• 3994 sequencing runs
Finding the FAANG Data
• http://data.faang.org/home
• http://data.faang.org/organism
• http://data.faang.org/organism/SAMEA103886117
• http://data.faang.org/specimen/SAMEA103886170
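The example links suggest that a record page can be reached from its BioSamples accession. The helper below is a minimal sketch that only reproduces the URL layout shown on the slides. Whether the portal still serves these paths is an assumption. The accession pattern is inferred from the two examples alone, and the function name is hypothetical.

```python
import re

# Base URL and path layout copied from the example links on the slides.
PORTAL = "http://data.faang.org"
# Assumed accession shape, inferred only from the two examples shown.
ACCESSION = re.compile(r"^SAMEA\d+$")


def record_url(kind, accession):
    """Build the portal URL for an organism or specimen record."""
    if kind not in ("organism", "specimen"):
        raise ValueError(f"unknown record type: {kind!r}")
    if not ACCESSION.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f"{PORTAL}/{kind}/{accession}"


print(record_url("organism", "SAMEA103886117"))
print(record_url("specimen", "SAMEA103886170"))
```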
Finding the FAANG Data
•More Data
• Additional FAANG data
• Other livestock data using legacy standards
•Standard Analysis products
•Trackhub links
•Better search
•Sortable tables
Who is helping you?
Peter Harrison, Jun Fan
faang-dcc@ebi.ac.uk
Questions?
Find out how to submit data
http://bit.ly/FAANGArchiveGuide
Ask for help
faang-dcc@ebi.ac.uk
@faangomics on twitter
Let us know about your project
http://bit.ly/FAANGProjectRegistry

  • 18. How much data? [Bar chart: percentage of donors and specimens by species (Gallus gallus, Ovis aries, Sus scrofa, Bos taurus, Bubalus bubalis, Capra hircus)] 285 Donor Animals (segment labels as extracted: 132, 62, 56, 14, 13, 8) 8428 Specimens (segment labels as extracted: 1240, 678, 2479, 1423, 1667, 941)
  • 19. How much data? 8 European Nucleotide Archive studies submitted • 4891 sequencing runs Largest submission • RNA sequencing of tissues and cell types from Scottish Blackface x Texel sheep for transcriptome annotation and expression analysis, The Roslin Institute • 3994 sequencing runs
  • 20. Finding the FAANG Data http://data.faang.org/home
  • 21. Finding the FAANG Data http://data.faang.org/organism
  • 22. Finding the FAANG Data http://data.faang.org/organism/SAMEA103886117
  • 23. Finding the FAANG Data http://data.faang.org/specimen/SAMEA103886170
  • 24. Finding the FAANG Data •More Data • Additional FAANG data • Other livestock data using legacy standards •Standard Analysis products •Trackhub links •Better search •Sortable tables
  • 25. Who is helping you? Peter Harrison Jun Fan faang-dcc@ebi.ac.uk
  • 26. Overview [Bar chart: percentage of donors and specimens, axis 0% to 100%]
  • 27. Questions? Find out how to submit data http://bit.ly/FAANGArchiveGuide Ask for help faang-dcc@ebi.ac.uk @faangomics on twitter Let us know about your project http://bit.ly/FAANGProjectRegistry
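
The checks listed on slide 13 (ontology terms, relationships, minimum standards) can be sketched in Python. This is a minimal illustration under stated assumptions, not the FAANG validation service's own code: the field names (`sample_name`, `material`, `species_term`, `derived_from`), the set of required fields and the term ID pattern are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical minimum standard: fields every record must fill in.
REQUIRED_FIELDS = ("sample_name", "material", "species_term")

# Assumed shape of an ontology term ID: a prefix, ':' or '_', then digits.
TERM_ID = re.compile(r"^[A-Za-z][A-Za-z0-9]*[:_]\d+$")


def validate_records(records):
    """Return a list of (sample_name, message) problems found in the records."""
    problems = []
    # Names of all records in this submission, used for the relationship check.
    names = {r.get("sample_name") for r in records if r.get("sample_name")}
    for record in records:
        name = record.get("sample_name") or "<unnamed>"
        # Minimum standards: every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if not record.get(field):
                problems.append((name, f"missing required field '{field}'"))
        # Ontology check: the species must be a term ID, not free text.
        term = record.get("species_term")
        if term and not TERM_ID.match(term):
            problems.append((name, f"'{term}' is not a well-formed term ID"))
        # Relationship check: a parent must be a record in the same submission.
        parent = record.get("derived_from")
        if parent and parent not in names:
            problems.append((name, f"derived_from '{parent}' not found"))
    return problems
```

Calling `validate_records` on a list of dictionaries, one per spreadsheet row, returns an empty list when every record passes. A real validator would also look each term up in the relevant ontology rather than only checking its shape.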

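Slide 11 notes that the ontology services offer API as well as UI access. The sketch below shows how a term lookup against the Ontology Lookup Service might be scripted. The endpoint path (`/api/search` under the address on the slide), the `q` and `ontology` parameters and the Solr-style response layout are assumptions to check against the current OLS documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the search endpoint lives under the OLS address shown on slide 11.
OLS_SEARCH = "http://www.ebi.ac.uk/ols/api/search"


def build_search_url(label, ontology=None):
    """Build a search URL for a term label, optionally limited to one ontology."""
    params = {"q": label}
    if ontology:
        params["ontology"] = ontology
    return OLS_SEARCH + "?" + urlencode(params)


def extract_terms(payload):
    """Pull (term ID, label) pairs out of a decoded JSON search response.

    Assumes a body shaped like:
    {"response": {"docs": [{"obo_id": ..., "label": ...}]}}
    """
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("obo_id"), d.get("label")) for d in docs]


def search_terms(label, ontology=None):
    """Query the service over the network and return the matching terms."""
    with urlopen(build_search_url(label, ontology)) as reply:
        return extract_terms(json.load(reply))
```

Keeping URL construction and response parsing separate from the network call means both can be checked offline, for example against a saved response.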
Editor's notes

  1. I have a good example from Tara Oceans of where metadata relating to samples allows image and sequence samples to be aligned, and a close ecological relationship to be discovered between an alga and a diatom. Essentially, high-throughput sequence data showed more-than-expected co-location of two species. This led to paring down to a number of bodies of water in given locations (metadata); high-throughput image samples from the same bodies of water could then be selectively inspected to reveal how close the ecological relationship was.
  2. Why? Improve your analysis: easier to find batch effects and confounding factors. Make your data usable: reduce ambiguity; facilitate reproduction of results; improve integration across labs, projects and data modalities. Make your data discoverable: by other researchers and by integration services (Ensembl, Gene Expression Atlas).