Presentation at a workshop conducted by the UC Davis Bioinformatics Core Facility: Using the Linux Command Line for Analysis of High Throughput Sequence Data, September 15-19, 2014
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
1. “Scientists often have a naïve faith that if only they could discover enough facts about a problem, these facts would somehow arrange themselves in a compelling and true solution.”
Theodosius Dobzhansky
2. Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Jenna Morgan Lang
postdoc
Jonathan Eisen’s Lab
UC Davis
email: jennomics@gmail.com
Twitter: @jennomics
websites: jennomics.com
seagrassmicrobiome.org
phylogenomics.wordpress.com
6. Typical laboratory workflow
• Extract DNA with MoBio PowerSoil Kit
• Amplify 16S rDNA with barcoded primers
• Pool samples and sequence on the MiSeq
– 15 million reads, 250bp PE
– 50-200(?) samples
– Sample drop out
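The run size and sample counts above imply a rough sequencing depth per sample. The sketch below works through that arithmetic; the function name is made up for illustration, and it assumes reads are split evenly across samples, which real pooled runs do not achieve (hence the sample drop out).

```python
# Back-of-the-envelope sequencing depth per sample for one pooled MiSeq run.
# The run size and sample counts come from the slide; an even split across
# samples is an assumption that real runs do not meet.

def reads_per_sample(total_reads: int, n_samples: int) -> float:
    """Average number of reads per sample if the pool were perfectly even."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads / n_samples


TOTAL_READS = 15_000_000  # approximate reads per run, from the slide

for n in (50, 200):
    depth = reads_per_sample(TOTAL_READS, n)
    print(f"{n} samples: ~{depth:,.0f} reads per sample")
```

Under these assumptions the average depth ranges from about 300,000 reads per sample (50 samples) down to about 75,000 (200 samples).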
7. Typical bioinformatic workflow
• Demultiplex and QC sequence data
• Process using QIIME
• Stare at graphs and wait for a revelation
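To make the demultiplexing step concrete, here is a minimal sketch of assigning reads to samples by barcode. It assumes the barcode sits at the start of each read and must match exactly; many MiSeq 16S protocols instead read the barcode in a separate index read and tolerate mismatches. The barcodes, reads, and function name are hypothetical, and this is not how QIIME implements the step.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using an exact match on the leading barcode.

    The barcode is trimmed from each read. Reads whose barcode is not in
    the lookup table are collected under the key 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGCCC", "TGCATTTAAA", "ACGTCCCGGG", "NNNNAAAAAA"]

result = demultiplex(reads, barcodes, barcode_len=4)
for sample, seqs in sorted(result.items()):
    print(sample, len(seqs))
```

Counting the reads that land in each bin, including the unassigned one, is a quick first QC check on a pooled run.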
8. QIIME workflow (diagram)
Stages: inputs, pre-processing, under the hood, analysis
Steps: Meta-data; Sequence data; Sequence pre-processing; Cluster sequences; Build OTU table; Build phylogenetic tree; Assign taxonomy; Alpha diversity; Beta diversity; Hypothesis testing; Data visualization
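The alpha and beta diversity steps in this workflow can be illustrated with two simple metrics computed from OTU counts. This is a sketch only: Shannon diversity and Bray-Curtis dissimilarity are chosen here for simplicity and are not necessarily the metrics used in the talk (QIIME also offers phylogenetic metrics such as UniFrac), and the counts are invented.

```python
import math


def shannon(counts):
    """Shannon diversity (natural log) of one sample's OTU counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same OTUs."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Invented counts: an even community and one dominated by a single OTU.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print("alpha (even):  ", round(shannon(even_sample), 3))
print("alpha (skewed):", round(shannon(skewed_sample), 3))
print("beta:          ", round(bray_curtis(even_sample, skewed_sample), 3))
```

Alpha diversity summarizes a single sample, so the even community scores higher than the dominated one; beta diversity compares two samples, with 0 meaning identical composition and 1 meaning no shared OTUs.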
10. You can do lots of things with a .biom table produced by QIIME
• METAGENassist
– interactive web tool that will do lots of stats and make pretty pictures
• PICRUSt (google: picrust metagenomes)
– infers functional potential based on your 16S data
• STAMP (google: stamp bioinformatics)
– flexible Python tool (with a GUI) that will do statistical analysis of taxonomic and functional profiles on the fly
• R (phyloseq package)
– if you are familiar with R, this will bridge the gap between QIIME and Rstats
• Phinch
– interactive web-based visualization tool
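All of these tools start from the same file, so it helps to see what a .biom table holds. The sketch below reads a tiny table into a dense OTU-by-sample matrix using only the standard library. It assumes the JSON-based BIOM 1.0 layout (keys `rows`, `columns`, `shape`, `matrix_type`, `data`); newer BIOM versions are HDF5 files and need the biom-format package instead. The OTU and sample IDs are invented.

```python
import json


def biom_json_to_dense(biom_text):
    """Return (otu_ids, sample_ids, matrix) from a BIOM 1.0 JSON string.

    matrix[i][j] is the count of OTU i in sample j.
    """
    table = json.loads(biom_text)
    otu_ids = [row["id"] for row in table["rows"]]
    sample_ids = [col["id"] for col in table["columns"]]
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        # Sparse tables list only non-zero cells as [row, column, value].
        matrix = [[0] * n_cols for _ in range(n_rows)]
        for r, c, value in table["data"]:
            matrix[r][c] = value
    else:
        matrix = [list(row) for row in table["data"]]
    return otu_ids, sample_ids, matrix


# A tiny invented table: 2 OTUs by 3 samples, stored sparsely.
EXAMPLE = json.dumps({
    "rows": [{"id": "OTU_1", "metadata": None}, {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None}, {"id": "S2", "metadata": None},
                {"id": "S3", "metadata": None}],
    "matrix_type": "sparse",
    "shape": [2, 3],
    "data": [[0, 0, 5], [0, 2, 1], [1, 1, 7]],
})

otu_ids, sample_ids, matrix = biom_json_to_dense(EXAMPLE)
for otu, row in zip(otu_ids, matrix):
    print(otu, dict(zip(sample_ids, row)))
```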
11. METAGENassist
• Input is .biom table and “mapping file”
• can input matrix of taxonomy or functional assignments
• many options for statistical analysis
• easily generate nice plots
13. PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States)
• .biom table input from QIIME
• normalize by copy number
• predict metagenome
• .biom table output (with functional categories)
Zaneveld, J.R., Lozupone, C., Gordon, J.I. & Knight, R. Ribosomal RNA diversity predicts genome diversity in gut bacteria and their relatives. Nucleic Acids Res. 38, 3869–3879 (2010)
Martiny, A.C., Treseder, K. & Pusch, G. Phylogenetic conservatism of functional traits in microorganisms. ISME J. 7, 830–838 (2013)
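The "normalize by copy number" step can be sketched as dividing each OTU's read count by the number of 16S gene copies in that organism's genome, so that organisms carrying many copies are not over-counted. This illustrates the idea only and is not PICRUSt's code; PICRUSt predicts copy numbers from reference genomes, whereas the values and function name below are made up.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU's read count by its 16S rRNA gene copy number."""
    normalized = {}
    for otu, count in otu_counts.items():
        copies = copy_numbers[otu]
        if copies <= 0:
            raise ValueError(f"copy number for {otu} must be positive")
        normalized[otu] = count / copies
    return normalized


# Invented example: OTU_1 carries 7 copies of the 16S gene, OTU_2 only 1.
counts = {"OTU_1": 700, "OTU_2": 100}
copies = {"OTU_1": 7, "OTU_2": 1}

normalized = normalize_by_copy_number(counts, copies)
print(normalized)
```

In this toy case the raw counts suggest OTU_1 is seven times more abundant than OTU_2, but after normalization the two come out equal.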
15. PICRUSt can produce results that make sense!
Figure labels: “Tributary contaminated by old sulfur mine”; “Sulfur Metabolism”
16. STAMP
• Input is .biom table and “mapping file”
• Can input matrix of taxonomy or functional assignments
• powerful statistical options
• Can subsample data on the fly
• Generates OK plots
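Subsampling (rarefying) means drawing the same number of reads from every sample so that samples sequenced to different depths can be compared. The sketch below shows one simple way to do this, by drawing reads without replacement. It is not STAMP's implementation; the function name, counts, and depth are assumptions for illustration.

```python
import random


def subsample_counts(counts, depth, seed=None):
    """Randomly draw `depth` reads without replacement from OTU counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} reads from a sample of {total}")
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then sample from that pool.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    drawn = rng.sample(pool, depth)
    result = {otu: 0 for otu in counts}
    for otu in drawn:
        result[otu] += 1
    return result


# Invented sample with 805 reads, subsampled to 100.
example = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 5}
rarefied = subsample_counts(example, depth=100, seed=0)
print(rarefied, "total:", sum(rarefied.values()))
```

Rare OTUs such as OTU_3 can disappear entirely at low depths, which is worth remembering when comparing diversity between subsampled tables.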
17. Using STAMP to identify SEED subsystems that are differentially abundant between Candidatus Accumulibacter phosphatis sequences obtained from a pair of enhanced biological phosphorus removal (EBPR) sludge metagenomes (data originally described in Parks and Beiko, 2010).
18. phyloseq R package
• Create a phyloseq object
– .biom table
– “mapping file”
– phylogenetic tree
• google: phyloseq demo
• do stats and make plots that you can prettify with ggplot2
27. Standardize collection, storage, and laboratory procedures
Figure 3. Predicted and observed frequencies of sequence reads from each organism.
Morgan JL, Darling AE, Eisen JA (2010) Metagenomic Sequencing of an In Vitro-Simulated Microbial Community. PLoS ONE 5(4): e10209. doi:10.1371/journal.pone.0010209
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010209
36. Schloss reducing artifacts
Last Bit of Ugly Data
mock community consisting of 21 taxa
3 different regions amplified
4 different sequencing centers
Fecal sample
Image lifted from: http://www.kcdsg.org
Some very basic background on what the Eisen lab typically does.
Microbial genome sequencing and assembly (I will talk about this in more detail near the end of this presentation)
16S rDNA PCR surveys (i.e., microbial ecology) – describe what this is
Metagenomics (wholesale sequencing of environmental microbial DNA) – next slide
image lifted from http://buildanawesomebusiness.com
Metagenomic data, while richer in terms of information content, is much more complex and messy
We have developed some cool tools for analyzing metagenomic data (Phylosift)
These are elements of experimental design that people understand in the context of their daily scientific lives but tend to forget when designing their microbiome experiments. I’m not going to address each of these points, but I’m going to spend a couple of minutes showing you some scary, ugly data that should reinforce the need to keep them in mind.
Venus the cat
Aliquots from a single culture were sent to three institutes, where they were processed with three batches of the FastDNA Spin Kit for Soil
Same lab, 4 different kits
MoBio kits had lowest taxonomic diversity, but also WAY fewer reads
Fecal sample is thought to show a different pattern because it is dominated by fewer taxa, whereas the mock community was even
So, what do we do when presented with all of this depressing data? We just keep doing our science!