Reconstructing metagenomes from shotgun data
C. Titus Brown
UC Davis / School of Veterinary Medicine
ctbrown@ucdavis.edu
Shotgun metagenomics
• Collect samples;
• Extract DNA;
• Feed into sequencer;
• Computationally analyze.
Wikipedia: Environmental shotgun sequencing.png
To assemble, or not to
assemble?
Goals: reconstruct phylogenetic content and predict
functional potential of ensemble.
• Should we analyze short reads directly?
OR
• Do we assemble short reads into longer contigs first,
and then analyze the contigs?
Assembly: good.
Howe et al., 2014
Assemblies yield much
more significant
homology matches.
But! Assembly is…
• Morally frightening: don’t you mis-assemble
sequences?
• Computationally challenging: don’t you need big
computers?
• Technically tricky: don’t you need to be an expert?
Or… is it?
• Most assembly papers analyze novel data sets and
then have to argue that their result is ok (guilty!)
• Very few assembly benchmarks have been done.
• Even fewer (trustworthy) computational
time/memory comparisons have been done.
• And even fewer “assembly recipes” have been
written down clearly.
A neat paper:
Shakya et al., 2013; pmid 23387867
A mock community!
• ~60 genomes, all sequenced;
• Lab mixed with 10:1 ratio of most abundant to least
abundant;
• 2x101 bp reads, 107 million reads total (Illumina);
• 10.5 Gbp of sequence in toto.
• The paper also compared 16S primer sets & 454
shotgun metagenome data => reconstruction.
Shakya et al., 2013; pmid 23387867
Paper conclusions
• “Metagenomic sequencing outperformed most SSU
rRNA gene primer sets used in this study.”
• “The Illumina short reads provided very good estimates
of taxonomic distribution above the species level, with
only a two- to threefold overestimation of the actual
number of genera and orders.”
• “For the 454 data … the use of the default parameters
severely overestimated higher level diversity (~20-fold
for bacterial genera and identified >100 spurious
eukaryotes).”
Shakya et al., 2013; pmid 23387867
How about assembly??
• Shakya et al. did not do assembly; no standard for
analysis at the time, not experts.
• But we work on assembly!
• And we’ve been working on a tutorial/process for
doing it!
[Flowchart]
• Adapter trim & quality filter
• Diginorm to C=10
• Trim high-coverage reads at low-abundance k-mers
• Diginorm to C=5
• Too big to assemble? Partition graph; split into "groups"; reinflate groups (optional)
• Small enough to assemble? Skip partitioning
• Assemble!!!
• Map reads to assembly
• Annotate contigs with abundances
• MG-RAST, etc.
The Kalamazoo Metagenomics Protocol
Derived from approach used in Howe et al., 2014
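The "Diginorm to C=10" step can be illustrated with a minimal sketch of digital normalization: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below the cutoff C. This is a simplified illustration, not the khmer implementation. It uses an exact in-memory dictionary rather than a memory-efficient probabilistic counter, and the function names and the default k=20 are assumptions made for this example.

```python
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=10):
    """Keep a read only while its estimated coverage is below cutoff.

    Coverage is estimated as the median count of the read's k-mers
    among the reads kept so far (hypothetical, exact-counting sketch).
    """
    counts = {}
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k; nothing to count
        coverage = median(counts.get(km, 0) for km in read_kmers)
        if coverage < cutoff:
            kept.append(read)
            # Only kept reads contribute to the k-mer counts.
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept


# Toy input: one sequence seen 50 times plus one rare sequence.
toy_reads = ["ACGTTGCAAG"] * 50 + ["TTTTTTTTTT"]
normalized = digital_normalization(toy_reads, k=4, cutoff=5)
print(len(toy_reads), "reads in,", len(normalized), "reads kept")
```

Redundant high-coverage reads are discarded while the rare read survives, which is why the later assembly steps see far less data.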
Computational protocol for
assembly
The Kalamazoo Metagenomics Protocol => benchmarking!
Assemble with Velvet, IDBA, SPAdes
Benchmarking process
• Apply various filtering treatments to the data
(x3)
o Basic quality trimming and filtering
o + digital normalization
o + partitioning
• Apply different assemblers to the data for each
treatment (x3)
o IDBA
o SPAdes
o Velvet
• Measure compute time/memory req’d.
• Compare assembly results to “known” answer
with Quast.
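The benchmarking design above is a 3 x 3 grid: every filtering treatment is paired with every assembler. A minimal sketch of enumerating that grid is shown below. The treatment labels and the dictionary layout are hypothetical names chosen for this example, and the sketch only lists the runs; it does not invoke any assembler or QUAST.

```python
from itertools import product

# Cumulative treatments and assemblers, as listed on the slide.
TREATMENTS = ["quality", "quality+diginorm", "quality+diginorm+partition"]
ASSEMBLERS = ["IDBA", "SPAdes", "Velvet"]


def benchmark_plan(treatments=TREATMENTS, assemblers=ASSEMBLERS):
    """Return one run description per (treatment, assembler) pair."""
    return [
        {
            "treatment": treatment,
            "assembler": assembler,
            "measure": ("time", "memory"),
            "evaluate_with": "QUAST",
        }
        for treatment, assembler in product(treatments, assemblers)
    ]


plan = benchmark_plan()
for run in plan:
    print(f"{run['assembler']:<7} on {run['treatment']}")
```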
Recovery, by assembler
Treatment: Quality (all three columns)
                             Velvet       IDBA         SPAdes
Total length (>= 0 bp)       1.6E+08      2.0E+08      2.0E+08
Total length (>= 1000 bp)    1.6E+08      1.9E+08      1.9E+08
Largest contig               561,449      979,948      1,387,918
# misassembled contigs       631          1032         752
Genome fraction (%)          72.949       90.969       90.424
Duplication ratio            1.004        1.007        1.004
Conclusion: SPAdes and IDBA achieve similar results.
Dr. Sherine Awad
Treatments: some effect
Assembler: IDBA
                             Quality       Diginorm      Partition
Total length (>= 0 bp)       2.0E+08       2.0E+08       2.0E+08
Total length (>= 1000 bp)    1.9E+08       2.0E+08       1.9E+08
Largest contig               979,948       1,469,321     551,171
# misassembled contigs       1032          916           828
Unaligned length             10,709,716    10,637,811    10,644,357
Genome fraction (%)          90.969        91.003        90.082
Duplication ratio            1.007         1.008         1.007
Conclusion: Treatments do not alter results much.
Dr. Sherine Awad
Computational cost
             Velvet                    IDBA                      SPAdes
Treatment    Time (h:m:s)  RAM (GB)    Time (h:m:s)  RAM (GB)    Time (h:m:s)  RAM (GB)
Quality      60:42:52      1,594       33:53:46      129         67:02:16      400
Diginorm     6:48:46       827         6:34:24       104         15:53:10      127
Partition    4:30:36       1,156       8:30:29       93          7:54:26       129
(Run on Michigan State HPC)
Dr. Sherine Awad
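To make the cost table easier to compare, the sketch below converts the h:m:s times to hours and computes how much faster each treatment runs relative to quality-only filtering. The numbers are copied from the table; the `COST` layout and the helper names are assumptions made for this example.

```python
def to_hours(hms):
    """Convert an 'h:m:s' string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


# (time h:m:s, RAM in GB) per assembler and treatment, from the table.
COST = {
    "Velvet": {"Quality": ("60:42:52", 1594),
               "Diginorm": ("6:48:46", 827),
               "Partition": ("4:30:36", 1156)},
    "IDBA": {"Quality": ("33:53:46", 129),
             "Diginorm": ("6:34:24", 104),
             "Partition": ("8:30:29", 93)},
    "SPAdes": {"Quality": ("67:02:16", 400),
               "Diginorm": ("15:53:10", 127),
               "Partition": ("7:54:26", 129)},
}


def speedup(assembler, treatment, baseline="Quality"):
    """Ratio of baseline run time to the treatment's run time."""
    base_hours = to_hours(COST[assembler][baseline][0])
    return base_hours / to_hours(COST[assembler][treatment][0])


for assembler in COST:
    for treatment in ("Diginorm", "Partition"):
        print(f"{assembler:<7} {treatment:<9} {speedup(assembler, treatment):.1f}x faster")
```

On these numbers, digital normalization cuts run time by roughly 9x for Velvet, 5x for IDBA and 4x for SPAdes.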
Need to understand:
• What is not being assembled and why?
o Low coverage?
o Strain variation?
o Something else?
• Effects of strain variation
• Additional contigs being assembled –
contamination? Spurious assembly?
• Performance of MEGAHIT assembler (a new assembler
that is very fast but still young).
Other observations
• 90% recovery is not bad; relatively few
misassemblies, too.
• This was not a highly polymorphic community BUT it
did have several closely related strains; more
generally, we see that strains do generate
chimeras, but not different species gen’ly.
• Challenging to execute even with a
tutorial/protocol :(
But! Assembly is…
• Morally frightening: don’t you mis-assemble
sequences? NO. (Or at least, not systematically.)
• Computationally challenging: don’t you need big
computers? YES. (But that’s changing.)
• Technically tricky: don’t you need to be an expert?
UNFORTUNATELY STILL YES BUT THERE’S HOPE.
Benchmarking &
protocols
• Our work is completely reproducible and open.
• You can re-run our benchmarks yourself if you want!
• We will be adding new assemblers in as time
permits.
• Protocol is open, versioned, citable… but also still a
work in progress :)
Using shotgun sequence to cross-validate amplicon predictions
[Bar chart: y-axis from 0% to 40%; categories AMP/RDP, AMP/SILVA, WGS/RDP, WGS/SILVA, WGS/SILVA(LSU)]
Amplicon seq missing Verrucomicrobia
Jaron Guo
Primer bias against Verrucomicrobia
Check taxonomy of reads causing mismatch (A)
Verrucomicrobia cause 70% (117/168) of mismatch
Current primers are not effective at amplifying Verrucomicrobia
Jaron Guo
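The 70% figure follows directly from the counts on the slide. A quick check, assuming 168 is the total number of mismatching reads and 117 of them are assigned to Verrucomicrobia (the variable names are illustrative):

```python
# Counts taken from the slide; the interpretation is an assumption.
mismatch_reads_total = 168
mismatch_reads_verrucomicrobia = 117

fraction = mismatch_reads_verrucomicrobia / mismatch_reads_total
print(f"{fraction:.1%} of mismatching reads are Verrucomicrobia")
```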
Thanks!
Please contact me at ctbrown@ucdavis.edu!
Everything I talked about is freely available.
Search for ‘khmer protocols’.

Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 

Dernier (20)

Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 

2015 pag-metagenome

  • 1. Reconstructing metagenomes from shotgun data C. Titus Brown UC Davis / School of Veterinary Medicine ctbrown@ucdavis.edu
  • 2. Shotgun metagenomics • Collect samples; • Extract DNA; • Feed into sequencer; • Computationally analyze. Wikipedia: Environmental shotgun sequencing.png
  • 3. To assemble, or not to assemble? Goals: reconstruct phylogenetic content and predict functional potential of ensemble. • Should we analyze short reads directly? OR • Do we assemble short reads into longer contigs first, and then analyze the contigs?
  • 4. Assembly: good. Howe et al., 2014 Assemblies yield much more significant homology matches.
  • 5. But! Assembly is… • Morally frightening: don’t you mis-assemble sequences? • Computationally challenging: don’t you need big computers? • Technically tricky: don’t you need to be an expert?
  • 6. Or… is it? • Most assembly papers analyze novel data sets and then have to argue that their result is ok (guilty!) • Very few assembly benchmarks have been done. • Even fewer (trustworthy) computational time/memory comparisons have been done. • And even fewer “assembly recipes” have been written down clearly.
  • 7. A neat paper: Shakya et al., 2013; pmid 23387867
  • 8. A mock community! • ~60 genomes, all sequenced; • Lab mixed with 10:1 ratio of most abundant to least abundant; • 2x101 reads, 107 million reads total (Illumina); • 10.5 Gbp of sequence in toto. • The paper also compared 16S primer sets & 454 shotgun metagenome data => reconstruction. Shakya et al., 2013; pmid 23387867
  • 9. Paper conclusions • “Metagenomic sequencing outperformed most SSU rRNA gene primer sets used in this study.” • “The Illumina short reads provided a very good estimates of taxonomic distribution above the species level, with only a two- to threefold overestimation of the actual number of genera and orders.” • “For the 454 data … the use of the default parameters severely overestimated higher level diversity (~20-fold for bacterial genera and identified >100 spurious eukaryotes).” Shakya et al., 2013; pmid 23387867
  • 10. How about assembly?? • Shakya et al. did not do assembly; no standard for analysis at the time, not experts. • But we work on assembly! • And we’ve been working on a tutorial/process for doing it!
  • 11. The Kalamazoo Metagenomics Protocol (derived from the approach used in Howe et al., 2014). Workflow: adapter trim & quality filter → diginorm to C=10 → trim high-coverage reads at low-abundance k-mers → diginorm to C=5. Small enough to assemble? Assemble! Too big to assemble? Partition graph → split into "groups" → reinflate groups (optional) → assemble! Then: map reads to assembly → annotate contigs with abundances → MG-RAST, etc.
  • 13. The Kalamazoo Metagenomics Protocol => benchmarking! Same workflow: adapter trim & quality filter → diginorm to C=10 → trim high-coverage reads at low-abundance k-mers → diginorm to C=5 → (if too big to assemble: partition graph → split into "groups" → reinflate groups (optional)) → assemble with Velvet, IDBA, SPAdes → map reads to assembly → annotate contigs with abundances → MG-RAST, etc.
  • 14. Benchmarking process • Apply various filtering treatments to the data (x3): basic quality trimming and filtering; + digital normalization; + partitioning. • Apply different assemblers to the data for each treatment (x3): IDBA; SPAdes; Velvet. • Measure compute time/memory required. • Compare assembly results to the “known” answer with QUAST.
  • 15. Recovery, by assembler (all on the quality-filtered treatment). Dr. Sherine Awad
                                   Velvet       IDBA         SPAdes
      Total length (>= 0 bp)       1.6E+08      2.0E+08      2.0E+08
      Total length (>= 1000 bp)    1.6E+08      1.9E+08      1.9E+08
      Largest contig               561,449      979,948      1,387,918
      # misassembled contigs       631          1032         752
      Genome fraction (%)          72.949       90.969       90.424
      Duplication ratio            1.004        1.007        1.004
    Conclusion: SPAdes and IDBA achieve similar results.
  • 16. Treatments: some effect (IDBA assemblies). Dr. Sherine Awad
                                   Quality      Diginorm     Partition
      Total length (>= 0 bp)       2.0E+08      2.0E+08      2.0E+08
      Total length (>= 1000 bp)    1.9E+08      2.0E+08      1.9E+08
      Largest contig               979,948      1,469,321    551,171
      # misassembled contigs       1032         916          828
      Unaligned length             10,709,716   10,637,811   10,644,357
      Genome fraction (%)          90.969       91.003       90.082
      Duplication ratio            1.007        1.008        1.007
    Conclusion: Treatments do not alter results much.
  • 17. Computational cost (run on Michigan State HPC). Dr. Sherine Awad
                   Velvet                    IDBA                      SPAdes
                   Time (h:m:s)  RAM (GB)    Time (h:m:s)  RAM (GB)    Time (h:m:s)  RAM (GB)
      Quality      60:42:52      1,594       33:53:46      129         67:02:16      400
      Diginorm     6:48:46       827         6:34:24       104         15:53:10      127
      Partition    4:30:36       1,156       8:30:29       93          7:54:26       129
  • 18. Need to understand: • What is not being assembled and why? (Low coverage? Strain variation? Something else?) • Effects of strain variation. • Additional contigs being assembled: contamination? Spurious assembly? • Performance of MEGAHIT assembler (a new assembler that is very fast but still young).
  • 19. Other observations • 90% recovery is not bad; relatively few misassemblies, too. • This was not a highly polymorphic community BUT it did have several closely related strains; more generally, we see that strains do generate chimeras, but different species generally do not. • Challenging to execute even with a tutorial/protocol :(
  • 20. But! Assembly is… • Morally frightening: don’t you mis-assemble sequences? NO. (Or at least, not systematically.) • Computationally challenging: don’t you need big computers? YES. (But that’s changing.) • Technically tricky: don’t you need to be an expert? UNFORTUNATELY STILL YES BUT THERE’S HOPE.
  • 21. Benchmarking & protocols • Our work is completely reproducible and open. • You can re-run our benchmarks yourself if you want! • We will be adding new assemblers in as time permits. • Protocol is open, versioned, citable… but also still a work in progress :)
  • 22. Using shotgun sequence to cross-validate amplicon predictions. [Bar chart; axis from 0% to 40%; categories: AMP/RDP, AMP/SILVA, WGS/RDP, WGS/SILVA, WGS/SILVA (LSU).] Amplicon sequencing is missing Verrucomicrobia. Jaron Guo
  • 23. Primer bias against Verrucomicrobia • Check taxonomy of reads causing mismatch. • Verrucomicrobia cause 70% (117/168) of mismatch. • Current primers are not effective at amplifying Verrucomicrobia. Jaron Guo
  • 24. Thanks! Please contact me at ctbrown@ucdavis.edu! Everything I talked about is freely available. Search for ‘khmer protocols’.
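
The "diginorm" (digital normalization) steps in the protocol above can be illustrated with a minimal sketch. It assumes the usual formulation: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below the coverage cutoff C. The function names, the default k, and the use of an exact Python Counter are illustrative assumptions, not the khmer implementation; a real tool needs a memory-efficient counting structure and would also treat a k-mer and its reverse complement as the same k-mer, which this sketch skips.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def diginorm(reads, k=20, cutoff=10):
    """Keep a read only if its median k-mer abundance so far is below cutoff.

    Illustrative sketch: exact counting, no reverse-complement handling.
    """
    counts = Counter()  # k-mer abundances over the reads kept so far
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k: nothing to estimate coverage from
        # Median k-mer abundance is the estimate of this read's coverage.
        if median(counts[km] for km in ks) < cutoff:
            kept.append(read)
            counts.update(ks)  # only kept reads contribute to the counts
    return kept


if __name__ == "__main__":
    # Toy input: one sequence repeated many times plus one rare sequence.
    reads = ["ACGTACGTAC"] * 50 + ["TTTTGGGGCC"]
    kept = diginorm(reads, k=4, cutoff=5)
    print(len(reads), "reads in,", len(kept), "reads kept")
```

On the toy input, most copies of the repeated read are discarded once its estimated coverage reaches the cutoff, while the rare read is retained. This is the behavior that shrinks the data set before assembly ("Diginorm to C=10", then "Diginorm to C=5").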

Editor's notes

  1. JGI v6, 454 amplicon sequencing