1. How to find and provide FAANG Data
The FAANG Data Coordination Centre
Laura Clarke
Vertebrate Data Coordination
www.ebi.ac.uk
@laurastephen
2. Value of Metadata
Data Access
Metadata Standards
Validation tools
FAANG Data availability
Support
3. Tara Oceans
•2 ½ year expedition
•210 sampling stations
•Standardized measurements
•Genetic
•Morphological
•Physico-Chemical
Good metadata enables great science
5. HipSci
•750 iPSC lines
•Healthy and rare disease donors
•Extensive genomic and epigenomic characterization
•All lines and data available to community
Good metadata enables great science
6. H Kilpinen et al. Nature 546, 370–375 (2017) doi:10.1038/nature22403
Good metadata enables great science
7. The FAANG Data Coordination Centre
•Supporting Submission
•Ensuring high quality data description
•Making the data accessible
•Providing consistent analysis products
8. Findable
• Global persistent identifier
• Rich metadata
• Store metadata in registries
Accessible
• Resolvable identifiers
• Metadata persists
• Machine and human access
Interoperable
• Open data format
• Modelled with FAIR-compliant vocabularies
• Reference external data
Reusable
• Rich metadata
• Clear license
• Provenance
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 1–15 (2016). DOI: 10.1038/sdata.2016.18
Alasdair J G Gray
Heriot-Watt University
Ensuring the data is FAIR
9. • Needs
• Well structured
• Consistent naming
• Specific descriptions
• Enables
• Aggregation
• Integration
• Tracking
Good data is well described data
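As a rough illustration of why consistent naming enables aggregation, the sketch below counts samples per tissue before and after free-text labels are mapped to one agreed name. The sample records and the synonym table are invented for the example.

```python
from collections import Counter

# Hypothetical sample records with inconsistent free-text tissue labels.
samples = [
    {"id": "s1", "tissue": "Liver"},
    {"id": "s2", "tissue": "liver "},
    {"id": "s3", "tissue": "hepatic tissue"},
    {"id": "s4", "tissue": "spleen"},
]

# Hypothetical mapping from observed labels to one agreed name.
SYNONYMS = {"hepatic tissue": "liver"}


def normalise(label):
    """Trim, lower-case, and map known synonyms to the agreed name."""
    cleaned = label.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


# Without consistent naming every spelling becomes its own group.
raw_counts = Counter(s["tissue"] for s in samples)
# With consistent naming the three liver samples aggregate together.
clean_counts = Counter(normalise(s["tissue"]) for s in samples)
```

In practice the agreed names would come from an ontology rather than a hand-written synonym table.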
10. • Representation of important things in a specific domain
• Describes types of entities (e.g. cells) and relations between them
• An active, formal computational artifact
• A mathematical model based on a subset of first order logic
• Tools can automatically process ontologies for analysis - e.g. gene expression enrichment analysis
• A communication tool
• Provides a dictionary for collaborators, a shared understanding
• Allows data sharing
Use Ontologies
Myeloid Leucocyte
Monocyte
CD14+ Monocyte
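The three terms above form an is-a chain, from broad to specific. A minimal sketch of how a tool might use that chain to select samples at any level of detail follows; the three-term hierarchy is the one on the slide, while the sample annotations are invented for the example.

```python
# Toy is-a hierarchy taken from the slide; real ontologies are far larger.
PARENT = {
    "CD14+ Monocyte": "Monocyte",
    "Monocyte": "Myeloid Leucocyte",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, broader):
    """True if `term` equals `broader` or sits beneath it in the hierarchy."""
    return term == broader or broader in ancestors(term)


# Hypothetical annotations: sample id -> cell type term.
annotations = {
    "s1": "CD14+ Monocyte",
    "s2": "Monocyte",
    "s3": "Myeloid Leucocyte",
}

# Asking for "Monocyte" also returns the more specific CD14+ sample.
monocyte_samples = sorted(s for s, t in annotations.items() if is_a(t, "Monocyte"))
```

A free-text label cannot support this kind of query, because nothing records that a CD14+ monocyte is a monocyte.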
11. • OLS - The Ontology Lookup Service
• http://www.ebi.ac.uk/ols/index
• Indexes 150 biomedical ontologies
• (4.5 million terms, 11 million relations)
• Zooma
• http://www.ebi.ac.uk/spot/zooma/
• Using past knowledge to inform new annotation
• Curated mappings from the Expression Atlas, Open Targets and others
• Webulous
• http://www.ebi.ac.uk/spot/webulous/
• GoogleSheets template system
• Create new ontology terms
• OxO (in beta)
• http://www.ebi.ac.uk/spot/oxo/
• Cross references between ontologies
• All services have API and UI access
Webulous
Use Ontologies
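Since all services offer API access, a term lookup can be scripted. The sketch below only builds a search URL and makes no network call. The `/api/search` path, the `q` and `ontology` parameter names, and the `cl` ontology short name are assumptions here and should be checked against the current OLS documentation.

```python
from urllib.parse import urlencode

# Assumed search endpoint under the OLS address given on the slide;
# verify against the current OLS documentation before relying on it.
OLS_SEARCH = "http://www.ebi.ac.uk/ols/api/search"


def ols_search_url(query, ontology=None):
    """Build a search URL for a free-text query, optionally limited to one ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    # urlencode escapes characters such as "+" and spaces in the query.
    return OLS_SEARCH + "?" + urlencode(params)


url = ols_search_url("CD14+ monocyte", ontology="cl")
```

The resulting URL can be passed to any HTTP client; the response format is not shown because it is not described in the slides.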
12. Supporting deposition of well described data
FAANG Validation Service
Validates completed metadata Excel templates and
prepares metadata for archive submission
http://www.ebi.ac.uk/vg/faang
13. Supporting deposition of well described data
•Checks ontologies (scope, accuracy, terms).
•Relationships (familial, breeds).
•Minimum standards and validity.
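The three kinds of check listed above can be sketched as follows. This is not the FAANG validation code or rule set: the field names, the required-field list, and the term pattern are all hypothetical, chosen only to show the shape of each check.

```python
import re

# Hypothetical rules for illustration; not the actual FAANG rule set.
REQUIRED = ("sample_name", "species_term", "sex")
TERM_PATTERN = re.compile(r"^[A-Za-z]+[:_]\d+$")  # e.g. NCBITaxon:9940


def validate(records):
    """Return a list of (sample_name, problem) tuples; empty means all passed."""
    names = {r.get("sample_name") for r in records}
    problems = []
    for r in records:
        name = r.get("sample_name", "<unnamed>")
        # Minimum standards: every required field must be filled in.
        for field in REQUIRED:
            if not r.get(field):
                problems.append((name, f"missing {field}"))
        # Ontology check: the term must look like an ontology identifier.
        term = r.get("species_term")
        if term and not TERM_PATTERN.match(term):
            problems.append((name, f"malformed ontology term {term!r}"))
        # Relationship check: a named parent must exist in the same submission.
        parent = r.get("child_of")
        if parent and parent not in names:
            problems.append((name, f"unknown parent {parent!r}"))
    return problems


records = [
    {"sample_name": "ewe1", "species_term": "NCBITaxon:9940", "sex": "female"},
    {"sample_name": "lamb1", "species_term": "sheep", "sex": "male", "child_of": "ewe2"},
]
problems = validate(records)
```

Here `ewe1` passes, while `lamb1` is reported twice: once for the free-text species and once for a parent that is not in the submission.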
16. Supporting deposition of well described data
•On conversion, validates again and checks project information.
•If passes, returns correctly formatted SampleTab for BioSamples and XML for ENA.
17. Supporting deposition of well described data
• The Validation service code and website
• http://www.ebi.ac.uk/vg/faang
• https://github.com/faang/faang-metadata
• https://github.com/FAANG/validate-metadata
19. How much data?
[Chart values; category labels lost in extraction: 132, 62, 56, 14, 13, 8 and 678, 2479, 1423, 1667, 941]
8 European Nucleotide Archive studies submitted
• 4891 sequencing runs
Largest submission
• RNA sequencing of tissues and cell types from Scottish Blackface x Texel sheep for transcriptome annotation and expression analysis, The Roslin Institute
• 3994 sequencing runs
22. Finding the FAANG Data
http://data.faang.org/organism/SAMEA103886117
23. Finding the FAANG Data
http://data.faang.org/specimen/SAMEA103886170
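The two record addresses on these slides follow the same pattern: the portal address, then the record type, then a BioSamples accession. A small helper that builds such addresses is sketched below. The accession pattern is inferred from the two examples only, and the helper does not check that a record exists.

```python
import re

PORTAL = "http://data.faang.org"
# Accessions on the slides are "SAMEA" followed by digits; this pattern is
# an assumption based on those two examples.
ACCESSION = re.compile(r"^SAMEA\d+$")
RECORD_TYPES = ("organism", "specimen")


def record_url(kind, accession):
    """Build the portal address for an organism or specimen record."""
    if kind not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {kind!r}")
    if not ACCESSION.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f"{PORTAL}/{kind}/{accession}"


organism_url = record_url("organism", "SAMEA103886117")
specimen_url = record_url("specimen", "SAMEA103886170")
```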
24. Finding the FAANG Data
•More Data
• Additional FAANG data
• Other livestock data using legacy standards
•Standard Analysis products
•Trackhub links
•Better search
•Sortable tables
25. Who is helping you?
Peter Harrison, Jun Fan
faang-dcc@ebi.ac.uk
27. Questions?
Find out how to submit data
http://bit.ly/FAANGArchiveGuide
Ask for help
faang-dcc@ebi.ac.uk
@faangomics on twitter
Let us know about your project
http://bit.ly/FAANGProjectRegistry
Editor's notes
I have a good example from Tara Oceans where sample metadata allowed image and sequence samples to be aligned, revealing a close ecological relationship between an alga and a diatom. High-throughput sequence data showed more co-location of the two species than expected. The metadata narrowed this down to a number of bodies of water in given locations, and high-throughput image samples from those same bodies of water could then be selectively inspected to reveal how close the ecological relationship was.
Why
• Improve your analysis
  • Easier to find batch effects and confounding factors
• Make your data usable
  • Reduce ambiguity
  • Facilitate reproduction of results
  • Improve integration across labs, projects and data modalities
• Make your data discoverable
  • Other researchers
  • Integration services (Ensembl, Gene Expression Atlas)