Genome sharing projects
around the world
– and how you find data for
your research
Fiona Nielsen
Lunteren, April 18 2016
Slides will be made available online 
Follow us on twitter:
@repositiveio
Find me on twitter: @glyn_dk
1. What data are you looking for? And Why?
2. Data resources from around the world
3. Tips on how to find and access data
4. Hands-on using Repositive
5. Summary and feedback
Workshop outline
1. What data are you looking for?
This workshop will focus on finding
and accessing human genomic data.
… And why would you be looking for
genomic data for your research?
Are you researching cancer or
genetic diseases?
How much data do you need to publish a paper?
2001: 1 human genome
2012: 1000 Genomes (1092 genomes, since increased to ~2500)
2015:
UK10K, Icelandic population (2,636 + 100k imputed),
The Cancer Genome Atlas (TCGA) ~11,000 genomes
ExAC consortium 65,000 exomes
?
Statistically speaking, you still need tens of thousands of samples for
validation
The more severe the phenotype and the more complete penetrance, the
easier it will be for you to find your variant, but
“As the genetic complexity of the disease increases (for example,
reduced penetrance and increased locus heterogeneity), issues of
statistical power quickly become paramount.”
http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html
But I am just looking at this one disease…
What can I do?
PRO TIP: involve a statistician early on in your study design!
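To make the point about statistical power concrete, here is a rough sketch of the textbook normal-approximation sample-size formula for comparing a variant's carrier frequency between cases and controls. The function name, the default significance threshold (the conventional genome-wide 5e-8) and the example frequencies are illustrative assumptions, not figures from the cited review. A statistician would refine this for your actual study design.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_per_group(p_cases, p_controls, alpha=5e-8, power=0.8):
    """Approximate number of cases (and equally many controls) needed to
    detect a difference in carrier frequency with a two-sided
    two-proportion z-test. Hypothetical helper for illustration only."""
    if p_cases == p_controls:
        raise ValueError("frequencies must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    p_bar = (p_cases + p_controls) / 2             # pooled frequency
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_cases * (1 - p_cases) + p_controls * (1 - p_controls))
    ) ** 2
    return ceil(numerator / (p_cases - p_controls) ** 2)


# Assumed example: a variant carried by 1.2% of cases and 1.0% of controls.
n_needed = samples_per_group(0.012, 0.010)
print(n_needed)
```

The smaller the frequency difference and the stricter the threshold, the faster the required sample size grows, which is why rare, low-penetrance variants need very large cohorts.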
How can I determine significance?
“One potentially powerful approach is to assess conservation across and within
multiple species as whole-genome sequence data become more abundant.”
Look at extreme phenotypes: “Sampling cases or controls from the extremes of an
appropriate quantitative distribution can often increase power”
Look at non-SNP variants: they are more likely to have functional effects
- “how to account for the technical features of sequencing, such as incomplete
sequencing and biased coverage over the genome?”
Think of how you can provide evidence that your result is not just a local
technical variation or sampling bias
e.g. data from same cell type, same seq technology, same alignment…
How to account for bias?
PRO TIP: include more reference data in your analysis
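One way to act on this tip is to shortlist reference datasets whose technical metadata match your own samples. The sketch below assumes hypothetical metadata records with `cell_type`, `platform` and `aligner` fields. Real repositories use their own field names and vocabularies, so treat this as an illustration of the idea only.

```python
def matching_references(own, candidates,
                        keys=("cell_type", "platform", "aligner")):
    """Keep candidate datasets whose technical metadata equal ours on every key.
    A candidate that lacks one of the keys is treated as a non-match."""
    return [
        c for c in candidates
        if all(k in c and c[k] == own[k] for k in keys)
    ]


# Hypothetical records for illustration; not taken from any real repository.
own = {"cell_type": "fibroblast", "platform": "Illumina", "aligner": "BWA"}
candidates = [
    {"id": "ref-A", "cell_type": "fibroblast", "platform": "Illumina", "aligner": "BWA"},
    {"id": "ref-B", "cell_type": "lymphocyte", "platform": "Illumina", "aligner": "BWA"},
    {"id": "ref-C", "cell_type": "fibroblast", "platform": "Illumina"},  # aligner unknown
]

shortlist = [c["id"] for c in matching_references(own, candidates)]
print(shortlist)
```

Treating missing metadata as a non-match is a deliberately conservative choice: a dataset with unknown processing cannot help rule out technical bias.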
• Know what data is available in your lab,
your dept, your org
• Survey from Qiagen showed that one of
the main reasons researchers collaborate
is to get access to data!
How can I access more data for my research?
How can I find collaborators?
PRO TIP: Search for collaborators who have the data you need
PRO TIP: Tell your colleagues and peers what type of data you
have in your lab
2. Data resources from around the world
public repositories
• for some you must apply for access,
especially if the data contains
clinical info or whole-genome
PID
• some are open access: GEO,
SRA, PGP, OpenSNP, GigaDB, …
• some are consented for
general research use, some
have specific consent
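Open-access repositories hosted at NCBI, such as GEO and SRA, can also be searched programmatically through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the query URL and parses a response in the JSON shape that `esearch` returns. The helper names and the search term are illustrative assumptions, and no network request is made here.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities search URL, e.g. db='gds' for GEO DataSets
    or db='sra' for the Sequence Read Archive."""
    query = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(query)


def parse_ids(response_text):
    """Extract the total hit count and the returned record IDs
    from an esearch JSON response."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"]), result["idlist"]


# Assumed example query; adjust the term to your own research question.
url = esearch_url("gds", "breast cancer")
print(url)
```

To run the search, the URL could be fetched with `urllib.request.urlopen(url).read().decode()` and the text passed to `parse_ids`. NCBI asks users to keep request rates low, so check their usage guidelines before scripting many queries.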
Large amounts of data, but not accessible
[Figure: sequence data volume by year, 2006 to 2015, showing an exponential growth rate. About 0.5 PB of WGS data is available in public repositories, while 80+ PB is sequenced every year.]
Under-utilised data has huge potential for medical research
DATA is fragmented
It may be confusing
Hundreds of data sources
…but they aren’t easy to find!
[Figure: number of data sources by date: Jan-15: 10, Mar-15: 25, Jun-15: 33, Sep-15: 35, Dec-15: 102, Mar-16: 163]
First 30 data sources listed here: http://dx.doi.org/10.1371/journal.pbio.1002418
Data source content
Assay Types
Dedicated to…
Number of samples in data sources
[Figure: number of samples per data source, log10 scale from 1 to 1,000,000]
Top 5:
GEO (1.8M)
PMI Cohort Program (1M)
Auria Biopankki (1M)
EGA (~0.6M)
SRA (~0.5M)
Data accessibility
• Open: can download the data straight away or after logging in.
• Restricted: need to apply for access to the data.
• Open and Restricted: has both kinds of access within one repository.
Online data source ‘types’
• University – Affiliated to a university. Often only members of that university can upload/download to/from it.
• Catalogue – Doesn’t have raw data but lists studies/datasets.
• Initiative/Consortium – Has a specific purpose/aim. Often focussed on a question or disease.
• Repository – Can download from, has data from multiple institutions. Often can also upload your own data there.
• Company – For-profit organisation. Listing data is not their main purpose.
• Biobank – Many have sequence data of their biological samples.
Sequenced ethnicities
Aboriginals
African Americans
Africans
Australians
Chinese
Malays
Indians
Danish
Dutch
Estonian
Russian
European Ancestry
Finnish
Icelandic
Japanese
Korean
Latin Americans
Saudi
Swedish
Machines & Data sources
[Figure: world map of sequencing machines and data sources by region]
Interesting site to look at:
http://omicsmaps.com/stats
Main Repository funders
BGI = 4
EBI = 9
NIH = 10
NCBI = 9
The Broad = 8
Wellcome = 4
EBI total 104 services, 19 repositories http://www.ebi.ac.uk/services/all
NCBI total 67 databases http://www.ncbi.nlm.nih.gov/guide/all/#databases_
• Case study: DNA data on Cancer
3. Tips to find and access data
Case Study – DNA data on Cancer
Repositories you have heard of:

Repository     Data Type           Access
ArrayExpress   Expression          Open
GEO            Expression          Open
EGA            Mixed               Restricted
dbGaP          Mixed               Restricted
ENCODE         Healthy Reference   Open
1000 Genomes   Healthy Reference   Open

Ask around (word of mouth):

Repository   Data Type                       Access
COSMIC       Somatic mutations & WGS         Open
ClinVar      Variant information             Open
ExAC         Allele Freq. but not raw data   Open
SRA          Individual sequences            Open
TCGA         Clinical & high level data      Open
CGHub        Low level data (DNA data)       Restricted
Case Study – DNA data on Cancer
We have identified the first 27 cancer specific data sources 
And many more that contain cancer data alongside other data
types.
Abcodia
AmbryShare
BRCA Exchange
Breast Cancer Now Tissue Bank
Broad Cancer programme datasets
Cancer Moonshot 2020
CanGEM
CGCI
CGHub
Chinese cancer genome consortium
Chinese national human genome centre
Follicular Lymphoma Genome Data
G-DOC
GenoMel
ICGC
National Mesothelioma Virtual Bank
NCIP Hub
Project GENIE
TARGET
TCGA
Texas Cancer Research Biobank
NCI-60
CCLE
COSMIC
FANTOM
Cancer Methylome System
Cancer Therapeutics Response Portal
1. Register for eRA account
2. Request access to specific dataset of interest
3. Download data
Registering for CGHub
https://cghub.ucsc.edu/keyfile/newuser.html
• ‘Principal signing official’ registers
• Email to verify
• Email to confirm/deny access to website
• Email with temporary password
• Change password
• Electronic signature
• Login
• Fill in contact info, complete ‘424’ form (research application form)
• Request reviewed by DAC
• Email to confirm/deny access to data
• Login
• Retrieve personal access token
• Download!
Often a long process
Bottlenecks:
• Finding relevant and usable
data
• Getting authorisation to
access data
• Formatting data
• Storing and moving data
We studied the problem by
qualitative interviews followed
by a survey of researchers in
human genetics
Often a long process
T. A. van Schaik et al., “The need to redefine genomic data sharing: a focus on data
accessibility”, Applied & Translational Genomics, 2014,
doi:10.1016/j.atg.2014.09.013
Researchers spend months to
find and access genomic data,
and often choose to not access
data at all
Why the barrier?
• Benefits: strict governance, review of consent, applicant signs for full
responsibility for governance
• Disadvantages: No control of data once access is given, high barrier for
access – too high?
• Start planning your data needs early in your project
• When you find the data you need, start application
• Use Open Access data
How can I save time?
PRO TIP: If you use human genomic data, apply for the GRU (General Research Use)
datasets in dbGaP: one application gives access to all the GRU
datasets
• Some data is Open Access: this requires specific consent
• OpenSNP.org (Bastian)
• Personal Genomes Projects
• Individuals who put their genomes online, e.g. Manuel Corpas
and his family “the Corpasome”
• http://manuelcorpas.com/about/
Not all data is restricted
Personal Genome Project
Projects: PGP Harvard, PGP Canada, PGP UK, Genom Austria

Host institution
• PGP Harvard: Harvard Medical School, Boston
• PGP Canada: SickKids, Toronto
• PGP UK: University College London
• Genom Austria: CeMM Research Center for Molecular Medicine

Principal Investigator
• PGP Harvard: George Church
• PGP Canada: Stephen Scherer
• PGP UK: Stephan Beck
• Genom Austria: Christoph Bock & Giulio Superti-Furga

Launch year
• PGP Harvard: 2005
• PGP Canada: 2012
• PGP UK: 2013
• Genom Austria: 2014

Geographic scope
• PGP Harvard: USA, mainly Boston
• PGP Canada: Canada
• PGP UK: United Kingdom
• Genom Austria: Mainly Austria

Enrollment eligibility (all projects)
• At least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam, certain vulnerable groups excluded

Data generated
• PGP Harvard: Whole genome sequencing, upload of additional data possible
• PGP Canada: Mainly whole genome sequencing
• PGP UK: Whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing
• Genom Austria: Mainly whole genome sequencing

Number of genomes
• PGP Harvard: 100s
• PGP Canada: 10s
• PGP UK: 10s
• Genom Austria: 10s

Data access
• PGP Harvard: http://personalgenomes.org/harvard/data
• Genom Austria: http://genomaustria.at/unser-genom/#genome-der-pionierinnen

Project funding
• PGP Harvard: Discretionary funds and corporate sponsoring
• PGP Canada: Institutional startup funds
• PGP UK: Discretionary funds and corporate sponsoring
• Genom Austria: Institutional startup funds

Areas of emphasis
• Integration with phenotypic data, collaboration with other personal omics initiatives
• Genome donations, synergy with massive-scale clinical genome sequencing projects
• Genomes and society, genetic literacy, school projects, education

Website
• PGP Harvard: http://personalgenomes.org/harvard/
• PGP Canada: http://personalgenomes.org/canada/
• PGP UK: http://personalgenomes.org/uk/
• Genom Austria: http://genomaustria.at/
Summary of data access barriers
Data is uploaded
to repository
Data is discovered
by potential user
Data is accessed
by potential user
• “even when researchers are authorised to share data they
report reluctance to do so because of the amount of effort
required” http://www.sciencedirect.com/science/article/pii/S2212066114000386
• “Clinical geneticists cited a lack of time because their main priority is
diagnosing patients. Industrial researchers cited a lack of time because of
the pressure to meet the deadlines in their job. Researchers in academia
cited both a concern about the potential loss of future publications once
unpublished data is shared, and the lack of time and incentive to share
data as this does not contribute to their publication record. Researchers
from all categories felt that they lacked sufficient resources to make their
data available.”
The barrier of making data available
But I do not want to share my data
• If you expect data to be available to you
– you have to make your data available too!
• Encourage collaborations: power by numbers
1. Get credit – publish and make your data available
2. Give credit – cite data sources
3. Understand consent – for all uses of clinical data
Best practices
• Use all available tools to make your life easier:
• Data publications give visibility and citations for your data, e.g.
GigaScience and Scientific Data
• Figshare, Zenodo, Dryad for sharing open access data
• PhenomeCentral, Matchmaker Exchange for rare disease research
• Repositive for finding data across repositories and making your own
data discoverable
Best practices: use the tools
Does data sharing
matter at
grant proposal evaluation?
Based on: Winning Horizon 2020 with Open Science,
http://dx.doi.org/10.5281/zenodo.12247
Best practices: Plan into your grant proposals
“Weakness: Involvement of non-
academic beneficiaries is limited”
“Weakness: highly focused on academic activities, and
lacks an advanced communication strategy”
“Weakness: limited exposure to
non-academic partners & infrastructures”
Excellence
Impact
Implementation
“data accessibility is unclear!”
“data storage & access not considered”
Best practices: Plan into your grant proposals
“Strengths: extensive dissemination of data to the
scientific community (open access, databases)”
“outreach activities to a broad audience”
“research software is freely available”
Impact:
Best practices: Plan into your grant proposals
Make the (research) world a better place by sharing in return 
Best practices: Share in return!
• Digital consent: towards automatic processing of applications
• Dynamic consent and power to the patient, e.g.
PatientsKnowBest
• Privacy-preserving access to datasets: preserving control and
governance with data custodian, lower barrier for access
What the future holds
4. Hands-on session using Repositive
What if finding data was as easy as finding a book on
Amazon or booking a hotel on Expedia?
Repositive promotes best practices
Discover new data sources
EASY
SEARCH
Repositive promotes best practices
Make your data visible
SHARE
KNOWLEDGE
Repositive promotes best practices
Build a data community
BUILD
TRUST
Benefit for both sides of data collaboration
Data consumers
• Find relevant data faster
• Feedback from other users through ratings and comments to evaluate data quality
• Find collaborators with data
Data producers
• Make your data visible
• Build credibility as a trusted provider of quality data
• Find collaborators to analyse your data
Live demo
http://discover.repositive.io
Use activation code: BioBS16
5. Summary and feedback
• Get credit – publish data
• Give credit – cite data
• Understand consent
Tell us your thoughts:
@repositiveio
@glyn_dk
And read more on http://repositive.io
Thank you!

Contenu connexe

Tendances

Increasing transparency in Medical Education through Open Data
Increasing transparency in Medical Education through Open Data Increasing transparency in Medical Education through Open Data
Increasing transparency in Medical Education through Open Data Rebecca Grant
 
Managing Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsManaging Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsRebecca Grant
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsCarole Goble
 
SPARC 2013 Data Management Presentation
SPARC 2013 Data Management Presentation SPARC 2013 Data Management Presentation
SPARC 2013 Data Management Presentation Jackie Wirz, PhD
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the futurePistoia Alliance
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Susanna-Assunta Sansone
 
Research in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career ResearchersResearch in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career ResearchersRebecca Grant
 
Do Open data badges influence author behaviour? A case study at Springer Nature
Do Open data badges influence author behaviour? A case study at Springer NatureDo Open data badges influence author behaviour? A case study at Springer Nature
Do Open data badges influence author behaviour? A case study at Springer NatureRebecca Grant
 
ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?
ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?
ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?Keita Bando
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECAProject
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCarly Strasser
 
Empowering Data in Scholarly Publishing
Empowering Data in Scholarly PublishingEmpowering Data in Scholarly Publishing
Empowering Data in Scholarly PublishingCharleston Conference
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesRothamsted Research, UK
 
CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...CINECAProject
 
The MIAPA ontology: An annotation ontology for validating minimum metadata re...
The MIAPA ontology: An annotation ontology for validating minimum metadata re...The MIAPA ontology: An annotation ontology for validating minimum metadata re...
The MIAPA ontology: An annotation ontology for validating minimum metadata re...Hilmar Lapp
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 

Tendances (20)

Increasing transparency in Medical Education through Open Data
Increasing transparency in Medical Education through Open Data Increasing transparency in Medical Education through Open Data
Increasing transparency in Medical Education through Open Data
 
Working with Quertle
Working with QuertleWorking with Quertle
Working with Quertle
 
Managing Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsManaging Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research Methods
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research Objects
 
SPARC 2013 Data Management Presentation
SPARC 2013 Data Management Presentation SPARC 2013 Data Management Presentation
SPARC 2013 Data Management Presentation
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
 
Clinical Anatomy 9566
Clinical Anatomy 9566Clinical Anatomy 9566
Clinical Anatomy 9566
 
Research in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career ResearchersResearch in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career Researchers
 
Do Open data badges influence author behaviour? A case study at Springer Nature
Do Open data badges influence author behaviour? A case study at Springer NatureDo Open data badges influence author behaviour? A case study at Springer Nature
Do Open data badges influence author behaviour? A case study at Springer Nature
 
ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?
ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?
ResearchGate - How do 'Social Networks for Scientists' Affect Libraries?
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
 
Empowering Data in Scholarly Publishing
Empowering Data in Scholarly PublishingEmpowering Data in Scholarly Publishing
Empowering Data in Scholarly Publishing
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
 
CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...
 
The MIAPA ontology: An annotation ontology for validating minimum metadata re...
The MIAPA ontology: An annotation ontology for validating minimum metadata re...The MIAPA ontology: An annotation ontology for validating minimum metadata re...
The MIAPA ontology: An annotation ontology for validating minimum metadata re...
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 

Similaire à Genome sharing projects around the world and how to find data for your research

Data dialogue - Human Genomic Data Discovery
Data dialogue - Human Genomic Data DiscoveryData dialogue - Human Genomic Data Discovery
Data dialogue - Human Genomic Data DiscoveryFiona Nielsen
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...Fiona Nielsen
 
International perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research dataInternational perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research dataARDC
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Philip Bourne
 
Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...
Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...
Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...Pistoia Alliance
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith
 
Public Databases for Radiomics Research: Current Status and Future Directions
Public Databases for Radiomics Research: Current Status and Future DirectionsPublic Databases for Radiomics Research: Current Status and Future Directions
Public Databases for Radiomics Research: Current Status and Future DirectionsCancerImagingInforma
 
EuroBioForum2014_sepaker_Palotie
EuroBioForum2014_sepaker_PalotieEuroBioForum2014_sepaker_Palotie
EuroBioForum2014_sepaker_PalotieEuroBioForum
 
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...William Hsiao
 
Life sciences big data use cases
Life sciences big data use casesLife sciences big data use cases
Life sciences big data use casesGuy Coates
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forumChris Dwan
 
Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Ian Fore
 
The Human Variome Database in Australia in 2014 - Graham Taylor
The Human Variome Database in Australia in 2014 - Graham TaylorThe Human Variome Database in Australia in 2014 - Graham Taylor
The Human Variome Database in Australia in 2014 - Graham TaylorHuman Variome Project
 
Grand round whsiao_may2015
Grand round whsiao_may2015Grand round whsiao_may2015
Grand round whsiao_may2015IRIDA_community
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William HsiaoWilliam Hsiao
 
Addgene - Canton Nucleic Acids Forum 2015
Addgene - Canton Nucleic Acids Forum 2015Addgene - Canton Nucleic Acids Forum 2015
Addgene - Canton Nucleic Acids Forum 2015Joanne Kamens, PhD
 

Similaire à Genome sharing projects around the world and how to find data for your research (20)

Data dialogue - Human Genomic Data Discovery
Data dialogue - Human Genomic Data DiscoveryData dialogue - Human Genomic Data Discovery
Data dialogue - Human Genomic Data Discovery
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...
 
Nov 2014 ouellette_windsor_icgc_final
Nov 2014 ouellette_windsor_icgc_finalNov 2014 ouellette_windsor_icgc_final
Nov 2014 ouellette_windsor_icgc_final
 
International perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research dataInternational perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research data
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?
 
Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...
Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...
Pistoia Alliance European Conference 2015 - Julia Wilson / Global Alliance fo...
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
 
Public Databases for Radiomics Research: Current Status and Future Directions
Public Databases for Radiomics Research: Current Status and Future DirectionsPublic Databases for Radiomics Research: Current Status and Future Directions
Public Databases for Radiomics Research: Current Status and Future Directions
 
EuroBioForum2014_sepaker_Palotie
EuroBioForum2014_sepaker_PalotieEuroBioForum2014_sepaker_Palotie
EuroBioForum2014_sepaker_Palotie
 
Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03
 
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo...
 
Life sciences big data use cases
Life sciences big data use casesLife sciences big data use cases
Life sciences big data use cases
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019
 
The Human Variome Database in Australia in 2014 - Graham Taylor
The Human Variome Database in Australia in 2014 - Graham TaylorThe Human Variome Database in Australia in 2014 - Graham Taylor
The Human Variome Database in Australia in 2014 - Graham Taylor
 
Grand round whsiao_may2015
Grand round whsiao_may2015Grand round whsiao_may2015
Grand round whsiao_may2015
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
 
Big data sharing
Big data sharingBig data sharing
Big data sharing
 
Addgene - Canton Nucleic Acids Forum 2015
Addgene - Canton Nucleic Acids Forum 2015Addgene - Canton Nucleic Acids Forum 2015
Addgene - Canton Nucleic Acids Forum 2015
 

Genome sharing projects around the world and how to find data for your research

  • 1. Genome sharing projects around the world – and how you find data for your research Fiona Nielsen Lunteren, April 18 2016 Slides will be made available online 
  • 2. Follow us on twitter: @repositiveio Fiona Nielsen, April 18 2016 Find me on twitter: @glyn_dk
  • 3. 1. What data are you looking for? And Why? 2. Data resources from around the world 3. Tips on how to find and access data 4. Hands-on using Repositive 5. Summary and feedback Workshop outline
  • 4. 1. What data are you looking for? This workshop will focus on finding and accessing human genomic data. … And why would you be looking for genomic data for your research? Are you researching cancer or genetic diseases?
  • 5. How much data do you need to publish a paper? 2001: 1 human genome; 2012: 1000 Genomes (1092 genomes, since increased to ~2500); 2015: UK10K, Icelandic population (2,636 + 100k imputed), Cancer Genome Atlas (~11,000 genomes), ExAC consortium (65,000 exomes); ?
  • 6. Statistically speaking, you still need 10s of thousands of samples for validation The more severe the phenotype and the more complete penetrance, the easier it will be for you to find your variant, but “As the genetic complexity of the disease increases (for example, reduced penetrance and increased locus heterogeneity), issues of statistical power quickly become paramount.” http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html But I am just looking at this one disease…
  • 7. What can I do? PRO TIP: involve a statistician early on in your study design!
  • 8. How can I determine significance? “One potentially powerful approach is to assess conservation across and within multiple species as whole-genome sequence data become more abundant.” Look at extreme phenotypes “Sampling cases or controls from the extremes of an appropriate quantitative distribution can often increase power” Look at non-SNP variants, they are more likely to have functional effects - “how to account for the technical features of sequencing, such as incomplete sequencing and biased coverage over the genome?”
  • 9. Think of how you can provide evidence that your result is not just a local technical variation or sampling bias e.g. data from same cell type, same seq technology, same alignment… How to account for bias? PRO TIP: include more reference data in your analysis
  • 10. • Know what data is available in your lab, your dept, your org • Survey from Qiagen showed that one of the main reasons researchers collaborate is to get access to data! How can I access more data for my research?
  • 11. How can I find collaborators? PRO TIP: Search for collaborators who have the data you need PRO TIP: Tell your colleagues and peers what type of data you have in your lab
  • 12. 2. Data resources from around the world public repositories • some you apply for access, especially if data contains clinical info or whole genome PID • some are open access: GEO, SRA, PGP, OpenSNP, GigaDB, … • some are consented for general research use, some have specific consent
  • 13. Large amounts of data, but not accessible. [Chart, 2006 to 2015: WGS data available in public repos, exponential growth rate.] ≈0.5 PB sequence available; 80+ PB sequenced every year. Under-utilised data has huge potential for medical research
  • 15. It may be confusing
  • 16. Hundreds of data sources …but they aren’t easy to find! Number of data sources charted over time: 10 (Jan-15), 25 (Mar-15), 33 (Jun-15), 35 (Sep-15), 102 (Dec-15), 163 (Mar-16). First 30 data sources listed here: http://dx.doi.org/10.1371/journal.pbio.1002418
  • 17. Data source content Assay Types Dedicated to…
  • 18. Number of samples in data sources (chart: sample number, log10 scale). Top 5: GEO (1.8M), PMI Cohort Program (1M), Auria Biopankki (1M), EGA (~0.6M), SRA (~0.5M)
  • 19. Data accessibility Can download the data straight away or after logging in. Need to apply for access to the data. Has both Open and Restricted access data within one repository.
  • 20. Online Data source ’types’ University – Affiliated to a university. Often only members of that university can upload/download to/from it. Catalogue – doesn’t have raw data but lists studies/datasets. Initiative/Consortium – Has a specific purpose/aim. Often focussed on a question or disease. Repository – Can download from, has data from multiple institutions. Often can also upload your own data there. Company – For profit organisation. Listing data is not their main purpose. Biobank – many have sequence data of their biological samples.
  • 21. Sequenced ethnicities Aboriginals African Americans Africans Australians Chinese Malays Indians Danish Dutch Estonian Russian European Ancestry Finnish Icelandic Japanese Korean Latin Americans Saudi Swedish
  • 22. Machines & Data sources [Figure: machines and data sources by region.] Interesting site to look at: http://omicsmaps.com/stats
  • 23. Main Repository funders: BGI = 4, EBI = 9, NIH = 10, NCBI = 9, The Broad = 8, Wellcome = 4. EBI total: 104 services, 19 repositories (http://www.ebi.ac.uk/services/all). NCBI total: 67 databases (http://www.ncbi.nlm.nih.gov/guide/all/#databases_)
  • 24. • Case study: DNA data on Cancer 3. Tips to find and access data
  • 25. Case Study – DNA data on Cancer. Each entry reads repository: data type, access. Repositories you have heard of: ArrayExpress: Expression, Open; GEO: Expression, Open; EGA: Mixed, Restricted; dbGaP: Mixed, Restricted; ENCODE: Healthy Reference, Open; 1000 Genomes: Healthy Reference, Open. Ask around (word of mouth): COSMIC: Somatic mutations & WGS, Open; ClinVar: Variant information, Open; ExAC: Allele Freq. but not raw data, Open; SRA: Individual sequences, Open; TCGA: Clinical & high level data, Open; CGHub: Low level data (DNA data), Restricted
  • 26. Case Study – DNA data on Cancer. We have identified the first 27 cancer-specific data sources, and many more that contain cancer data alongside other data types: Abcodia, AmbryShare, BRCA Exchange, Breast Cancer Now Tissue Bank, Broad Cancer programme datasets, Cancer Moonshot 2020, CanGEM, CGCI, CGHub, Chinese cancer genome consortium, Chinese national human genome centre, Follicular Lymphoma Genome Data, G-DOC, GenoMel, ICGC, National Mesothelioma Virtual Bank, NCIP Hub, Project GENIE, Target, TCGA, Texas cancer research biobank, NCI-60, CCLE, COSMIC, Fantom, Cancer methylome system, Cancer therapeutics response portal
  • 27. Registering for CGHub (https://cghub.ucsc.edu/keyfile/newuser.html). Three stages: 1. Register for eRA account; 2. Request access to specific dataset of interest; 3. Download data. Steps in order: ‘Principal signing official’ registers; email to verify; email to confirm/deny access to website; email with temporary password; change password; electronic signature; login; fill in contact info and complete ‘424’ form (research application form); request reviewed by DAC; email to confirm/deny access to data; login; retrieve personal access token; download!
  • 28. Often a long process Bottlenecks: • Finding relevant and usable data • Getting authorisation to access data • Formatting data • Storing and moving data We studied the problem by qualitative interviews followed by a survey of researchers in human genetics
  • 29. Often a long process T. A. van Schaik et al The need to redefine genomic data sharing: a focus on data accessibility, Applied & Translational Genomics, 2014 10.1016/j.atg.2014.09.013 Researchers spend months to find and access genomic data, and often choose to not access data at all
  • 31. Why the barrier? • Benefits: strict governance, review of consent, applicant signs for full responsibility for governance • Disadvantages: No control of data once access is given, high barrier for access – too high?
  • 32. • Start planning your data needs early in your project • When you find the data you need, start application • Use Open Access data How can I save time? PRO Tip: If you use human genomic data, apply for the GRU datasets in dbGaP, one application – access to all the GRU datasets
  • 33. • Some data is Open Access → requires specific consent • OpenSNP.org (Bastian) • Personal Genomes Projects • Individuals who put their genomes online, e.g. Manuel Corpas and his family “the Corpasome” • http://manuelcorpas.com/about/ Not all data is restricted
  • 35. Personal Genome Project. Projects (in column order): PGP Harvard | PGP Canada | PGP UK | Genom Austria. Host institution: Harvard Medical School Boston | SickKids Toronto | University College London | CeMM Research Center for Molecular Medicine. Principal Investigator: George Church | Steven Scherer | Stephan Beck | Christoph Bock & Giulio Superti-Furga. Launch year: 2005 | 2012 | 2013 | 2014. Geographic scope: USA, mainly Boston | Canada | United Kingdom | Mainly Austria. Enrollment eligibility: at least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam, certain vulnerable groups excluded. Data generated: Whole genome sequencing, upload of additional data possible | Mainly whole genome sequencing | Whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing | Mainly whole genome sequencing. Number of genomes: 100s | 10s | 10s | 10s. Data access: http://personalgenomes.org/harvard/data and http://genomaustria.at/unser-genom/#genome-der-pionierinnen. Project funding: Discretional funds and corporate sponsoring | Institutional startup funds | Discretional funds and corporate sponsoring | Institutional startup funds. Areas of emphasis: Integration with phenotypic data, collaboration with other personal omics initiatives; Genome donations, synergy with massive-scale clinical genome sequencing projects; Genomes and society, genetic literacy, school projects, education. Website: http://personalgenomes.org/harvard/ | http://personalgenomes.org/canada/ | http://personalgenomes.org/uk/ | http://genomaustria.at/
  • 36. Summary of data access barriers Data is uploaded to repository Data is discovered by potential user Data is accessed by potential user
  • 37. • “even when researchers are authorised to share data they report reluctance to do so because of the amount of effort required” http://www.sciencedirect.com/science/article/pii/S2212066114000386 • “Clinical geneticists cited a lack of time because their main priority is diagnosing patients. Industrial researchers cited a lack of time because of the pressure to meet the deadlines in their job. Researchers in academia cited both a concern about the potential loss of future publications once unpublished data is shared, and the lack of time and incentive to share data as this does not contribute to their publication record. Researchers from all categories felt that they lacked sufficient resources to make their data available.” The barrier of making data available But I do not want to share my data
  • 38. • If you expect data to be available to you – you have to make your data available too! • Encourage collaborations: power by numbers 1. Get credit – publish and make your data available 2. Give credit – cite data sources 3. Understand consent – for all uses of clinical data Best practices
  • 39. • Use all available tools to make your life easier: • Data publications → visibility and citations for your data, e.g. GigaScience and Scientific Data • Figshare, Zenodo, Dryad for sharing open access data • PhenomeCentral, Matchmaker Exchange for rare disease research • Repositive for finding data across repositories and making your own data discoverable Best practices: use the tools
  • 40. Does data sharing matter at grant proposal evaluation? Based on: Winning Horizon 2020 with Open Science, http://dx.doi.org/10.5281/zenodo.12247 Best practices: Plan into your grant proposals
  • 41. “Weakness: Involvement of non- academic beneficiaries is limited” “Weakness: highly focused on academic activities, and lacks an advanced communication strategy” “Weakness: limited exposure to non-academic partners & infrastructures” Excellence Impact Implementation “data accessibility is unclear!” “data storage & access not considered” Best practices: Plan into your grant proposals
  • 42. “Strengths: extensive dissemination of data to the scientific community (open access, databases)” “outreach activities to a broad audience” “research software is freely available” Impact: Best practices: Plan into your grant proposals
  • 43. Best practices: Plan into your grant proposals
  • 44. Make the (research) world a better place by sharing in return. Best practices: Share in return!
  • 45. • Digital consent: towards automatic processing of applications • Dynamic consent and power to the patient, e.g. PatientsKnowBest • Privacy-preserving access to datasets: preserving control and governance with data custodian, lower barrier for access What the future holds
  • 46. 4. Hands-on session using Repositive What if finding data was as easy as finding a book on Amazon or booking a hotel on Expedia?
  • 47. Repositive promotes best practices Discover new data sources EASY SEARCH
  • 48. Repositive promotes best practices Make your data visible SHARE KNOWLEDGE
  • 49. Repositive promotes best practices Build a data community BUILD TRUST
  • 50. Benefit for both sides of data collaboration Data consumers Data producers Find relevant data faster Feedback from other users through ratings and comments to evaluate data quality Find collaborators with data Make your data visible Build credibility as a trusted provider of quality data Find collaborators to analyse your data
  • 52. 5. Summary and feedback • Get credit – publish data • Give credit – cite data • Understand consent
  • 53. Tell us your thoughts: @repositiveio @glyn_dk And read more on http://repositive.io
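One way to act on the case study in slide 25 is to keep the repository table as a small catalogue and split it by access type, so you know which sources you can download from today and which need an application started early. The sketch below is only an illustration: the rows are copied from the slide, while the `Repository` class and the `by_access` helper are hypothetical names introduced here, not part of Repositive or any repository's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """One row of the slide 25 table (hypothetical structure)."""
    name: str
    data_type: str
    access: str  # "Open" or "Restricted", as listed on the slide


# Rows copied from the slide 25 case study on cancer DNA data.
CATALOGUE = [
    Repository("ArrayExpress", "Expression", "Open"),
    Repository("GEO", "Expression", "Open"),
    Repository("EGA", "Mixed", "Restricted"),
    Repository("dbGaP", "Mixed", "Restricted"),
    Repository("ENCODE", "Healthy reference", "Open"),
    Repository("1000 Genomes", "Healthy reference", "Open"),
    Repository("COSMIC", "Somatic mutations & WGS", "Open"),
    Repository("ClinVar", "Variant information", "Open"),
    Repository("ExAC", "Allele frequencies, no raw data", "Open"),
    Repository("SRA", "Individual sequences", "Open"),
    Repository("TCGA", "Clinical & high level data", "Open"),
    Repository("CGHub", "Low level data (DNA data)", "Restricted"),
]


def by_access(catalogue, access):
    """Return the names of repositories with the given access type."""
    return [repo.name for repo in catalogue if repo.access == access]


open_repos = by_access(CATALOGUE, "Open")
restricted_repos = by_access(CATALOGUE, "Restricted")

print("Download straight away:", ", ".join(open_repos))
print("Start an access application early:", ", ".join(restricted_repos))
```

Keeping the list as data rather than prose makes it easy to add the columns the other slides mention, such as sample counts or consent type, as your own survey of sources grows.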

Editor's notes

  1. Because interpretation requires LOTS of data. And although data exists around the world, it is siloed, and even if available, it is not accessible. This is Jenn, a genetic researcher (our target customer), seeking to interpret data from genetic diseases and cancer. She needs data from other patients to compare and interpret Mabel's DNA. She also has data available in her own lab, but she cannot share it because of concerns about how to deal with secure access to sensitive data and data governance, e.g. vetting of users.
  2. It has been shown that the combination of summary single-variant statistics from multiple data sets, rather than the joint analysis of a combined data set, does not result in an appreciable loss of information [85], and that taking into account heterogeneity in effect size across studies can improve statistical power.
  3. “Although they are harder to call and annotate, insertion or deletions, multinucleotide variants and structural variants (including copy-number variants, translocations and inversions) constitute a smaller set of variation (in terms of the number of discrete events an individual is expected to carry) relative to all SNVs and are more likely to have functional effects.”
  4. Population-scale genome sequencing projects have been launched all over the world. More than 80 PB of human genomic data is being sequenced every year, BUT to date only around 0.5 PB of data is available in public repositories.
  5. Further confounded by the data being highly fragmented. Siloed in repositories and institutions around the world.
  6. There are many public repositories, but it can be hugely confusing to know where to look for the right kind of data.
  7. Public repositories: default is apply for access -> full access. Benefits: strict governance, review of consent, applicant signs for full responsibility for governance. Disadvantages: no control of data once access is given, high barrier for access, perhaps too high? (Researchers giving up, even patients can't get access to their own data.)
  8. ODP trained, EURO-BASIN manager: a boring title for a diverse job in an exciting research domain. DIP into EACH step of the research cycle, from proposal formulation to providing the best return-on-investment to the funders. So I'd like to share with you some experiences from the last few years of OS advocacy in the Marine Science Community.
  9. Excellence at your Research Subject is … excellent, but is it ENOUGH? To be successful, a candidate will be judged on being complete. MESSAGE: FOCUS only on IF could expose you to risk.
  10. Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data.
  11. This is not a biobank, but ToMMo biobank deposited some of their data there, so I thought it is worth mentioning here.
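The note on combining summary single-variant statistics from multiple data sets can be made concrete with a short worked example. The sketch below assumes a fixed-effect, inverse-variance-weighted meta-analysis, which is one common way to pool per-study summary statistics; the notes do not say which method the cited work used, and the three cohorts and their numbers are hypothetical. A fixed-effect model also ignores the heterogeneity in effect size that the note mentions, so treat this as the simplest case only.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates using inverse-variance weights.

    effects    -- per-study effect sizes for one variant (e.g. log odds ratios)
    std_errors -- the matching per-study standard errors
    Returns (pooled_effect, pooled_standard_error, z_score).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical summary statistics for one variant in three cohorts.
effects = [0.20, 0.40, 0.30]
std_errors = [0.10, 0.10, 0.20]

pooled, pooled_se, z = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, z = {z:.2f}")
```

The pooled standard error is smaller than that of any single cohort, which is the practical reason for finding collaborators and reference data: only summary statistics need to be exchanged, not the restricted individual-level data.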