Assembling diverse & rich
metagenomes: the secrets of
the ancients.
C. Titus Brown
ctb@msu.edu
Introducing myself --
ged.msu.edu/
 “Data-intensive biology” – tools, etc.
 Not a marine microbiologist at all!
Note: these slides are all on slideshare.
(Google “titus brown slide share”)
My goals
 Enable hypothesis-driven biology
through better hypothesis generation
& refinement.
 Devalue “interest level” of sequence
analysis and put myself out of a job.
 Be a good mutualist!
Part I: Soil Assembly & the
Great Prairie Grand
Challenge
2008
Soil microbial ecology -
questions
 What ecosystem level functions are present,
and how do microbes do them?
 How does agricultural soil differ from native
soil?
 How does soil respond to climate
perturbation?
 Questions that are not easy to answer
without shotgun sequencing:
◦ What kind of strain-level heterogeneity is present
in the population?
◦ What does the phage and viral population look
like?
◦ What species are where?
A “Grand Challenge” dataset
(DOE/JGI)
[Figure: bar chart of basepairs of sequencing (Gbp, axis 0 to 600) by platform (GAII, HiSeq) for Iowa continuous corn, Iowa native prairie, Kansas cultivated corn, Kansas native prairie, Wisconsin continuous corn, Wisconsin native prairie, Wisconsin restored prairie, and Wisconsin switchgrass.]
Comparison datasets: MetaHIT (Qin et al., 2011), 578 Gbp; Rumen (Hess et al., 2011), 268 Gbp; Rumen K-mer Filtered, 111 Gbp; NCBI nr database, 37 Gbp
Total: 1,846 Gbp soil metagenome
Adina Howe
Approach – assemble into
contigs.
 We found that short reads from
phylogenetically distant and
microbially diverse environments
could not be reliably annotated.
=> Build into longer contigs first.
…5 year odyssey…
(Friends don’t let friends BLAST short
reads.**)
** Applicable to most environmental samples. Howe et al., 2014
Developed two new methods
--
I. Computational “cell sorting”
II. Computational “library
normalization.”
See:
• Pell et al., Tiedje, Brown (2012);
• Howe et al., Tiedje, Brown (2014);
• Goffredi et al. (2014)
Digital normalization
Putting it in perspective:
Total equivalent of ~1200 bacterial genomes
Human genome ~3 billion bp
Result: we (easily, casually) assembled
two of the biggest metagenomes ever.
Total Assembly   Total Contigs (> 300 bp)   % Reads Assembled   Predicted protein coding
2.5 bill         4.5 mill                   19%                 5.3 mill
3.5 bill         5.9 mill                   22%                 6.8 mill
Howe et al, 2014; pmid 24632729
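One way to arrive at the "~1200 bacterial genomes" figure is to divide the combined assembly size by a typical bacterial genome size. The slides do not state the genome size used, so the 5 Mbp value below is an assumption, and the sketch only checks that the numbers are mutually consistent.

```python
# Assembly sizes from the table above, in base pairs.
assembly_sizes_bp = [2.5e9, 3.5e9]

# ASSUMPTION: ~5 Mbp per "typical" bacterial genome (not stated on the slide).
assumed_bacterial_genome_bp = 5e6

# Human genome size as given on the slide.
human_genome_bp = 3e9

total_bp = sum(assembly_sizes_bp)
genome_equivalents = total_bp / assumed_bacterial_genome_bp
human_equivalents = total_bp / human_genome_bp

print(f"{total_bp / 1e9:.1f} Gbp assembled in total")
print(f"~{genome_equivalents:.0f} bacterial genome equivalents")
print(f"~{human_equivalents:.1f}x the human genome")
```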
(I’ll come back to this)
So…
We can now achieve an assembly of
pretty much anything (soil was really
hard, virtually everything else is easier!)
Lots of people are interested in
collaborating with us on this!
…but we regard it as a
largely solved problem.
I: assembly “protocols”
 khmer-protocols: open, versioned, citable,
forkable set of instructions to assemble euk
mRNAseq and metagenomes on widely
accessible compute resources.
 Explicit command-line instructions to go from
raw reads to annotated “final product”.
 For mRNAseq: ~$150 of compute for $2000 of
data.
(Still in beta, note.)
khmer-protocols: Read cleaning; Preprocessing; Assembly; Annotation
Example - Deep Carbon data
set
 Masimong Gold Mine; microbial cells
filtered from fracture water from within
a 1.9km borehole. (32,000 year old
water)
 5.6m reads, 601.3 Mbp;
◦ computational protocol took 4 hours;
◦ Assembled to 56 Mbp > 300 bp
◦ longest contig is 73kb
◦ 70% of paired-end reads mapped.
w/M.C.Y. Lau, Tullis Onstott
Our (open) approach:
 If the protocols work for you, great! Cite
us.
 If the protocols don’t work for you, please
let us know so we can fix them.
 If it’s a challenging problem, we’d love
to collaborate.
 We are also happy to help train people.
Things we no longer worry about
(much) – let’s chat:
 Inter-species assembly chimerae
…apart from w/in strain variants, chimerae
are hard to form with contig assembly.
 Finding homology matches in metagenomes
…contigs give as good a
match as possible.
 Assembling contigs when we have sufficient
coverage
…not enough coverage is
usually the problem.
II: Shotgun sequencing and
coverage
“Coverage” is simply the average number of reads that overlap
each true base in the genome.
Here, the coverage is ~10 – just draw a line straight down from the
top through all of the reads.
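The definition above can be written as a one-line calculation: total sequenced bases divided by genome length. This is a minimal sketch; the function name and the toy numbers are illustrative and not taken from the slides.

```python
def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of reads overlapping each base of the genome."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical toy numbers: 500 reads of 100 bp drawn from a 5,000 bp genome.
toy_coverage = average_coverage(num_reads=500, read_length=100, genome_length=5_000)
print(toy_coverage)
```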
Random sampling => deep sampling
needed
Typically 10-100x needed for robust recovery (300 Gbp for human)
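To see why random sampling demands deep sampling, a common simplification is to treat per-base depth as Poisson-distributed around the mean coverage. That model is an assumption of this sketch rather than something stated on the slide, and the helper names are illustrative. The sketch also reproduces the 300 Gbp figure as a 3 Gbp genome sequenced at 100x.

```python
import math


def fraction_covered(mean_coverage: float, min_depth: int = 1) -> float:
    """P(depth >= min_depth), ASSUMING per-base depth ~ Poisson(mean_coverage)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


def bases_needed(genome_length: float, target_coverage: float) -> float:
    """Total bases to sequence for a given average coverage."""
    return genome_length * target_coverage


human_genome_bp = 3e9
print(f"{bases_needed(human_genome_bp, 100) / 1e9:.0f} Gbp for 100x human")

for c in (1, 5, 10, 30):
    print(f"coverage {c:>2}x: "
          f"{fraction_covered(c):.4f} of bases seen at least once, "
          f"{fraction_covered(c, min_depth=5):.4f} seen at least 5 times")
```

Under this model, low average coverage leaves a noticeable fraction of bases with few or no reads, which is one reason assembly needs coverage well above 1x.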
Assembly depends on high
coverage
HMP mock community
Downstream goals of
assembly:
(Even assuming ribotyping works perfectly)
 Annotate genes with higher confidence.
 Reconstruct operons & ultimately even
full genomes.
 Analyze strain variation.
 Study organisms that ribotyping can’t
(phage & virus)
Main questions --
I. How do we know if we’ve sequenced
enough?
II. Can we predict how much more we
need to sequence to see <insert
some feature here>?
Note: necessary sequencing depth cannot
accurately be predicted from SSU/amplicon
data
Method 1: looking for WGS
saturation
We can track how many sequences we
keep of the sequences we’ve seen, to
detect saturation.
Data from Shakya et al., 2013 (pmid: 23387867)
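The kept-versus-seen idea can be sketched with a toy version of median-based digital normalization: a read is kept only if the median count of its k-mers, among reads kept so far, is below a cutoff C. This sketch assumes exact k-mer counting in a Python `Counter` and a simulated random genome; it is not the khmer implementation, and all names and parameters here are illustrative.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_and_track(reads, k=20, cutoff=10, report_every=100):
    """Keep a read only if the median count of its k-mers is below `cutoff`.

    Returns (kept_reads, curve), where curve is a list of
    (reads_seen, reads_kept) points for plotting saturation.
    """
    counts = Counter()
    kept = []
    curve = []
    for seen, read in enumerate(reads, start=1):
        read_kmers = kmers(read, k)
        # Counter returns 0 for unseen k-mers without storing them.
        if read_kmers and median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)
            kept.append(read)
        if seen % report_every == 0:
            curve.append((seen, len(kept)))
    return kept, curve


# Hypothetical test data: 2,000 error-free 100 bp reads from a random 2 kbp genome.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
reads = []
for _ in range(2000):
    start = rng.randrange(len(genome) - 100 + 1)
    reads.append(genome[start:start + 100])

kept, curve = normalize_and_track(reads)
for seen, n_kept in curve[::4]:
    print(f"seen={seen:5d}  kept={n_kept:5d}")
```

Early on almost every read is kept; once most of the genome has reached the cutoff, the kept count flattens while the seen count keeps growing. That flattening is the saturation signal.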
We can detect saturation of
shotgun sequencing
Data from Shakya et al., 2013 (pmid: 23387867)
C=10, for assembly
Estimating metagenome nt
richness:
# bp at saturation / coverage
 MM5 deep carbon: 60 Mbp
 Iowa prairie soil: 12 Gbp
 Amazon Rain Forest Microbial
Observatory soil: 26 Gbp
Assumes: few entirely erroneous reads (upper
bound); at saturation (lower bound).
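The estimate above is a single division: the number of bases retained at saturation, divided by the coverage cutoff used (C=10 on the earlier slide). The sketch below uses a made-up input value purely for illustration; it is not one of the datasets listed.

```python
def estimate_richness_bp(bp_kept_at_saturation: float, coverage_cutoff: float) -> float:
    """Estimated nucleotide richness = bases kept at saturation / coverage cutoff."""
    if coverage_cutoff <= 0:
        raise ValueError("coverage_cutoff must be positive")
    return bp_kept_at_saturation / coverage_cutoff


# HYPOTHETICAL input: 250 Mbp retained at a cutoff of C = 10.
estimate = estimate_richness_bp(bp_kept_at_saturation=250e6, coverage_cutoff=10)
print(f"estimated richness: {estimate / 1e6:.0f} Mbp")
```

As the slide notes, erroneous reads inflate the retained bases, so the result is an upper bound in that respect; if the sample has not reached saturation, it is a lower bound.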
WGS saturation approach:
 Tells us when we have enough
sequence.
 Can’t be predictive… if you haven’t
sampled something, you can’t say
anything about it.
Can we correlate deep amplicon
sequencing with shallower WGS?
Correlating 16s and shotgun
seq
Errors do not strongly affect saturation
How much of 16s do you see… with how much shotgun sequencing
Data from Shakya et al., 2013 (pmid: 23387867)
WGS saturation ~matches 16s saturation
< rRNA copy number >
16s region choice is not significant (?!)
Data from Shakya et al., 2013 (pmid: 23387867)
Method is robust to organisms
unsampled by amplicons.
Insensitive to
amplicon primer
bias.
Robust to genome
size differences,
eukaryotes, phage.
Data from Shakya et al., 2013 (pmid: 23387867)
Can examine specific OTUs
Data from Shakya et al., 2013 (pmid: 23387867)
OTU abundance is ~correct.
Data from Shakya et al., 2013 (pmid: 23387867)
Running on real communities
--
Thoughts on 16s/WGS
comparison:
 Robust to some real problems (primer
bias; organisms unsampled by
amplicon seq) & insensitive to 16s seq
error.
 Hopefully can be used to build a
predictive framework to answer “how
much more sequencing should I do?”
◦ Sensitivity: “What have I missed?”
◦ Planning: “How much $$ should I ask for?”
Other things that y’all might be
interested in:
 Comparing 16s from amplicon and
shotgun sequencing.
 Metatranscriptome assembly protocol
 Biogeography of genomic sequence
Metatranscriptome assembly
(soil)
                    Total Length (bp)   Total rRNA (bp)          Total annotated by MG-RAST m5nr SEED
Unassembled MetaT   20,525,296,600      16,987,863,800 (82.8%)   48,080,200 (0.23%)
Assembled MetaT     32,471,548          7,061,913 (21.8%)        2,075,701 (6.4%)
Aaron Garoutte (w/Tiedje & Howe)
Using shotgun sequence to
cross-validate amplicon
predictions
[Figure: bar chart, y-axis 0% to 40%; bars for AMP/RDP, AMP/SILVA, WGS/RDP, WGS/SILVA, WGS/SILVA (LSU)]
Amplicon seq missing Verrucomicrobia
Jaron Guo
Primer bias against
Verrucomicrobia
Check taxonomy of reads causing
mismatch (A)
Verrucomicrobia cause
70% (117/168) of
mismatch
Current primer is not effective at amplifying
Verrucomicrobia
Jaron Guo
Biogeography of genomic
DNA
How much genomic DNA is shared between
different sites?
Qingpeng Zhang
Biogeography of genomic DNA
(2)
How much genomic richness is shared
between different sites?
Qingpeng Zhang
Concluding thoughts
 Tools and protocols for data analysis are
fast becoming intrinsic to practice of
biology.
◦ Most tools are wrong, but some are useful.
◦ All of our tools are openly, freely available in
every way possible.
 We are trying to make assembly fast,
cheap, easy, and good.
 We are building on our assembly-based
approaches & intuition to tackle other
questions.
Big Data is neither the real
problem nor the solution.
 Dealing with Big Data requires a new
mentality, so training/experience is
probably most effective way forward.
 With sequencing, few if any of your
biology problems go away, although
some aspects may become more
tractable.
 Think future: any -ome you want from
any sample you can get. …So now
Putting it in perspective:
Total equivalent of ~1200 bacterial genomes
Human genome ~3 billion bp
We don’t know what most genes do.
Total Assembly   Total Contigs (> 300 bp)   % Reads Assembled   Predicted protein coding
2.5 bill         4.5 mill                   19%                 5.3 mill
3.5 bill         5.9 mill                   22%                 6.8 mill
Howe et al, 2014; pmid 24632729
Potential discussion topics
A. Funding and collaboration models.
B. Leveraging data & computation to
help understand gene function.
C. Computational/data infrastructure
…but planning for poverty, not wealth:
sustainability and “bus factor”.
D. Capacity building
 Standardized data sets; data availability.
 Workshops and training.
Training in data analysis et al.
 Software Carpentry.
 Data Carpentry.
 STAMPS, EDAMAME, MSU NGS
course.
 <other courses go here>

Davis plaque method.pptx recombinant DNA technology
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 

2014 marine-microbes-grc

  • 1. Assembling diverse & rich metagenomes: the secrets of the ancients. C. Titus Brown ctb@msu.edu
  • 2. Introducing myself -- ged.msu.edu/ • “Data-intensive biology” – tools, etc. • Not a marine microbiologist at all! Note: these slides are all on slideshare. (Google “titus brown slide share”)
  • 3. My goals • Enable hypothesis-driven biology through better hypothesis generation & refinement. • Devalue “interest level” of sequence analysis and put myself out of a job. • Be a good mutualist!
  • 4. Part I: Soil Assembly & the Great Prairie Grand Challenge 2008
  • 5. Soil microbial ecology - questions • What ecosystem level functions are present, and how do microbes do them? • How does agricultural soil differ from native soil? • How does soil respond to climate perturbation? • Questions that are not easy to answer without shotgun sequencing: ◦ What kind of strain-level heterogeneity is present in the population? ◦ What does the phage and viral population look like? ◦ What species are where?
  • 6. A “Grand Challenge” dataset (DOE/JGI). [Bar chart: basepairs of sequencing (Gbp, axis 0–600) per site, split by GAII and HiSeq. Sites: Iowa, Continuous corn; Iowa, Native Prairie; Kansas, Cultivated corn; Kansas, Native Prairie; Wisconsin, Continuous corn; Wisconsin, Native Prairie; Wisconsin, Restored Prairie; Wisconsin, Switchgrass. Reference values: Rumen (Hess et al., 2011), 268 Gbp; MetaHIT (Qin et al., 2011), 578 Gbp; NCBI nr database, 37 Gbp; Rumen K-mer Filtered, 111 Gbp.] Total: 1,846 Gbp soil metagenome. Adina Howe
  • 7. Approach – assemble into contigs. • We found that short reads from phylogenetically distant and microbially diverse environments could not be reliably annotated. => Build into longer contigs first. …5 year odyssey…
  • 8. (Friends don’t let friends BLAST short reads.**) ** Applicable to most environmental samples. Howe et al., 2014
  • 9. Developed two new methods -- I. Computational “cell sorting” II. Computational “library normalization.” See: • Pell et al., Tiedje, Brown (2012); • Howe et al., Tiedje, Brown (2014); • Goffredi et al. (2014)
  • 16. Putting it in perspective: Total equivalent of ~1200 bacterial genomes. Human genome ~3 billion bp. Result: we (easily, casually) assembled two of the biggest metagenomes ever. Howe et al, 2014; pmid 24632729 (I’ll come back to this)
      Total Assembly | Total Contigs (> 300 bp) | % Reads Assembled | Predicted protein coding
      2.5 bill       | 4.5 mill                 | 19%               | 5.3 mill
      3.5 bill       | 5.9 mill                 | 22%               | 6.8 mill
  • 17. So… We can now achieve an assembly of pretty much anything (soil was really hard, virtually everything else is easier!) Lots of people are interested in collaborating with us on this! …but we regard it as a largely solved problem.
  • 18. I: assembly “protocols” • khmer-protocols: open, versioned, citable, forkable set of instructions to assemble euk mRNAseq and metagenomes on widely accessible compute resources. • Explicit command-line instructions to go from raw reads to annotated “final product”. • For mRNAseq: ~$150/compute for $2000 of data. (Still in beta, note.)
  • 20. Example - Deep Carbon data set • Masimong Gold Mine; microbial cells filtered from fracture water from within a 1.9km borehole. (32,000 year old water) • 5.6m reads, 601.3 Mbp; ◦ computational protocol took 4 hours; ◦ Assembled to 56 Mbp > 300 bp ◦ longest contig is 73kb ◦ 70% of paired-end reads mapped. w/M.C.Y. Lau, Tullis Onstott
  • 21. Our (open) approach: • If the protocols work for you, great! Cite us. • If the protocols don’t work for you, please let us know so we can fix them. • If it’s a challenging problem, we’d love to collaborate. • We are also happy to help train people.
  • 22. Things we no longer worry about (much) – let’s chat: • Inter-species assembly chimerae …apart from w/in strain variants, chimerae are hard to form with contig assembly. • Finding homology matches in metagenomes …contigs give as good a match as possible. • Assembling contigs when we have sufficient coverage …not enough coverage is usually the problem.
  • 23. II: Shotgun sequencing and coverage. “Coverage” is simply the average number of reads that overlap each true base in the genome. Here, the coverage is ~10 – just draw a line straight down from the top through all of the reads.
  • 24. Random sampling => deep sampling needed. Typically 10-100x needed for robust recovery (300 Gbp for human)
  • 25. Assembly depends on high coverage. (HMP mock community)
  • 26. Downstream goals of assembly: (Even assuming ribotyping works perfectly) • Annotate genes with higher confidence. • Reconstruct operons & ultimately even full genomes. • Analyze strain variation. • Study organisms that ribotyping can’t (phage & virus)
  • 27. Main questions -- I. How do we know if we’ve sequenced enough? II. Can we predict how much more we need to sequence to see <insert some feature here>? Note: necessary sequencing depth cannot accurately be predicted from SSU/amplicon data
  • 28. Method 1: looking for WGS saturation We can track how many sequences we keep of the sequences we’ve seen, to detect saturation.
  • 29. Data from Shakya et al., 2013 (pmid: 23387867). We can detect saturation of shotgun sequencing
  • 30. Data from Shakya et al., 2013 (pmid: 23387867). We can detect saturation of shotgun sequencing. C=10, for assembly
  • 31. Estimating metagenome nt richness: # bp at saturation / coverage • MM5 deep carbon: 60 Mbp • Iowa prairie soil: 12 Gbp • Amazon Rain Forest Microbial Observatory soil: 26 Gbp. Assumes: few entirely erroneous reads (upper bound); at saturation (lower bound).
  • 32. WGS saturation approach: • Tells us when we have enough sequence. • Can’t be predictive… if you haven’t sampled something, you can’t say anything about it. Can we correlate deep amplicon sequencing with shallower WGS?
  • 33. Correlating 16s and shotgun seq. Errors do not strongly affect saturation. How much of 16s do you see… with how much shotgun sequencing
  • 34. Data from Shakya et al., 2013 (pmid: 23387867). WGS saturation ~matches 16s saturation < rRNA copy number >
  • 35. 16s region choice is not significant (?!) Data from Shakya et al., 2013 (pmid: 23387867)
  • 36. Method is robust to organisms unsampled by amplicons. Insensitive to amplicon primer bias. Robust to genome size differences, eukaryotes, phage. Data from Shakya et al., 2013 (pmid: 23387867)
  • 37. Can examine specific OTUs. Data from Shakya et al., 2013 (pmid: 23387867)
  • 38. OTU abundance is ~correct. Data from Shakya et al., 2013 (pmid: 23387867)
  • 39. Running on real communities --
  • 40. Running on real communities --
  • 41. Thoughts on 16s/WGS comparison: • Robust to some real problems (primer bias; organisms unsampled by amplicon seq) & insensitive to 16s seq error. • Hopefully can be used to build a predictive framework to answer “how much more sequencing should I do?” ◦ Sensitivity: “What have I missed?” ◦ Planning: “How much $$ should I ask
  • 42. Other things that y’all might be interested in: • Comparing 16s from amplicon and shotgun sequencing. • Metatranscriptome assembly protocol • Biogeography of genomic sequence
  • 43. Metatranscriptome assembly (soil). Aaron Garoutte (w/Tiedje & Howe)
      Sample            | Total Length (bp) | Total rRNA (bp)        | Total annotated by MG-RAST m5nr SEED
      Unassembled MetaT | 20,525,296,600    | 16,987,863,800 (82.8%) | 48,080,200 (0.23%)
      Assembled MetaT   | 32,471,548        | 7,061,913 (21.8%)      | 2,075,701 (6.4%)
  • 44. Using shotgun sequence to cross-validate amplicon predictions. [Bar chart: percentage (axis 0–40%) for AMP/RDP, AMP/SILVA, WGS/RDP, WGS/SILVA, WGS/SILVA (LSU).] Amplicon seq missing Verrucomicrobia. Jaron Guo
  • 45. Primer bias against Verrucomicrobia Check taxonomy of reads causing mismatch (A) Verrucomicrobia cause 70% (117/168) of mismatch Current primer is not effective at amplifying Verrucomicrobia Jaron Guo
  • 46. Biogeography of genomic DNA How much genomic DNA is shared between different sites? Qingpeng Zhang
  • 47. Biogeography of genomic DNA (2) How much genomic richness is shared between different sites? Qingpeng Zhang
  • 48. Concluding thoughts • Tools and protocols for data analysis are fast becoming intrinsic to practice of biology. ◦ Most tools are wrong, but some are useful. ◦ All of our tools are openly, freely available in every way possible. • We are trying to make assembly fast, cheap, easy, and good. • We are building on our assembly-based approaches & intuition to tackle other questions.
  • 49. Big Data is neither the real problem nor the solution. • Dealing with Big Data requires a new mentality, so training/experience is probably most effective way forward. • With sequencing, few if any of your biology problems go away, although some aspects may become more tractable. • Think future: any -ome you want from any sample you can get. …So now
  • 50. Putting it in perspective: Total equivalent of ~1200 bacterial genomes. Human genome ~3 billion bp. We don’t know what most genes do. Howe et al, 2014; pmid 24632729
      Total Assembly | Total Contigs (> 300 bp) | % Reads Assembled | Predicted protein coding
      2.5 bill       | 4.5 mill                 | 19%               | 5.3 mill
      3.5 bill       | 5.9 mill                 | 22%               | 6.8 mill
  • 51. Potential discussion topics A. Funding and collaboration models. B. Leveraging data & computation to help understand gene function. C. Computational/data infrastructure …but planning for poverty, not wealth: sustainability and “bus factor”. D. Capacity building • Standardized data sets; data availability. • Workshops and training.
  • 52. Training in data analysis et al. • Software Carpentry. • Data Carpentry. • STAMPS, EDAMAME, MSU NGS course. • <other courses go here>
  • 53. Potential discussion topics A. Funding and collaboration models. B. Leveraging data & computation to help understand gene function. C. Computational/data infrastructure …but planning for poverty, not wealth: sustainability and “bus factor”. D. Capacity building • Standardized data sets; data availability. • Workshops and training.
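
Slides 28 to 31 describe detecting saturation by tracking how many of the reads seen so far are kept, and estimating richness as bp at saturation divided by coverage. The sketch below is one way that idea might look in Python. It assumes a median-k-mer-count rule (keep a read only if the median count of its k-mers among already-kept reads is below a cutoff C); the function names, the k-mer size, and the in-memory counter are illustrative assumptions and are not taken from the slides or from the khmer software.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a read."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def saturation_curve(reads, k=20, cutoff=10):
    """Stream through reads and record (reads seen, reads kept).

    Assumed rule: a read is kept only if the median count of its
    k-mers, counted over reads kept so far, is below `cutoff`.
    Returns the curve and the total bp of kept reads.
    """
    counts = Counter()
    kept = 0
    kept_bp = 0
    curve = []
    for seen, read in enumerate(reads, start=1):
        read_kmers = kmers(read, k)
        # Reads shorter than k carry no k-mers and are skipped.
        if read_kmers and median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)
            kept += 1
            kept_bp += len(read)
        curve.append((seen, kept))
    return curve, kept_bp


def estimate_richness(kept_bp, cutoff):
    """Slide 31 estimate: bp retained at saturation divided by coverage."""
    return kept_bp / cutoff
```

When the kept count stops growing while the seen count keeps rising, the sample is saturated at that cutoff. A real data set would need a memory-efficient k-mer counter rather than a plain `Counter`.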

Editor's notes

  1. Fly-over country (that I live in)
  2. Nothing more frustrating to biologists than having data that you can’t analyze 
  3. Est 200 hrs of my effort
  4. ~Easy to say how much you need for a single genome.
  5. Note: 16s is higher copy number, more sensitive than WGS.
  6. otu5 is acidobacterium; one species, Acidobacterium capsulatum, with one rRNA; 4.6% of BA community, 4.7% of Illumina reads; # otu2 is chlorobium; five species, total of 10 rRNA; 9.1% of Illumina. Correction factor of 5.
  7. JGI v6, 454 amplicon sequencing
  8. Original motivation was, should we combine samples?
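
Note 4 and slides 23 to 24 rest on simple coverage arithmetic: average coverage is total sequenced bases divided by genome size, so the sequencing needed for a single genome is genome size times target coverage. A minimal Python sketch follows; the function names are hypothetical, and the inputs in the usage note are the figures quoted on the slides.

```python
def mean_coverage(total_bp, genome_size_bp):
    """Average coverage: total sequenced bases divided by genome size."""
    return total_bp / genome_size_bp


def bases_needed(genome_size_bp, target_coverage):
    """Total bases to sequence to reach a target average coverage."""
    return genome_size_bp * target_coverage
```

For a ~3 billion bp human genome at 100x, `bases_needed(3_000_000_000, 100)` gives 300 Gbp, the figure on slide 24. Applying `mean_coverage` to the Deep Carbon numbers (601.3 Mbp sequenced, 60 Mbp estimated richness) gives roughly 10x, which assumes the richness estimate can stand in for genome size.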