9. Running QIIME
• Native installation on OS X or Linux (laptops through a 16,416-core compute cluster*)
• Ubuntu Linux Virtual Box
• Amazon Web Services (EC2)
* http://ncar.janus.rc.colorado.edu/
11. Moving Pictures of the Human
Microbiome
• Two subjects sampled daily, one for six
months, one for 18 months
• Four body sites: tongue, palm of left
hand, palm of right hand, and gut (via fecal
swabs).
12. Moving Pictures of the Human
Microbiome
• Investigate the relative temporal variability of
body sites.
• Is there a temporal core microbiome?
• Technical points: do we observe the same
conclusions on 454 and Illumina data?
13. Moving Pictures of the Human
Microbiome: QIIME tutorial
• A small subset of the full data set to facilitate
short run time: ~0.1% of the full sequence
collection.
• Sequenced across six Illumina GAIIx
lanes, with a subset of the samples also
sequenced on 454.
• The online tutorial contains details on all of
the steps: go back and read that text.
14. Key QIIME files
• Mapping file: per-sample metadata, user-defined
• Input sequence file
• OTU table: sample x OTU matrix, central to
downstream analyses [now in biom format]
• Parameters file: defines analyses, for use
with the ‘workflow’ scripts (optional)
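As a rough illustration of what "per-sample metadata" looks like in practice, the sketch below parses a tiny tab-delimited mapping file into a dictionary keyed by sample ID. The sample IDs, barcodes, and the `BodySite` column are made up for illustration, and `parse_mapping` is a hypothetical helper, not a QIIME function.

```python
# Hypothetical two-sample mapping file; columns are tab-separated and the
# header line starts with '#SampleID'. All values are invented.
MAPPING_TEXT = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n"
    "S1\tAGCTGACTAGTC\tGTGCCAGCMGCCGCGGTAA\tgut\tsubject_1_gut\n"
    "S2\tACGGTGAGTGTC\tGTGCCAGCMGCCGCGGTAA\ttongue\tsubject_1_tongue\n"
)


def parse_mapping(text):
    """Return {sample_id: {column: value}} from tab-delimited mapping text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split("\t")
    # Any later line starting with '#' is treated as a comment and skipped.
    rows = [ln.split("\t") for ln in lines[1:] if not ln.startswith("#")]
    return {row[0]: dict(zip(header[1:], row[1:])) for row in rows}


mapping = parse_mapping(MAPPING_TEXT)
print(mapping["S1"]["BodySite"])
```

Keying on the sample ID makes it easy to look up the metadata for any column of the OTU table.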
21. OTU table
(classic format)
sample x OTU matrix
22. OTU table
(classic format)
sample x OTU matrix
OTU identifiers
23. OTU table
(classic format)
sample x OTU matrix
Sample identifiers
24. OTU table
(classic format)
sample x OTU matrix
Optional per OTU taxonomic information
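The layout described on these slides (OTU identifiers, sample identifiers, counts, and an optional taxonomy column) can be sketched with a small parser. This is a minimal sketch, assuming a tab-delimited file whose header row starts with `#OTU ID` and whose optional last column is named `Consensus Lineage` or `taxonomy`; the table contents and the `parse_classic_otu_table` helper are hypothetical.

```python
# Hypothetical classic-format table: one row per OTU, one column per sample,
# plus an optional trailing taxonomy column. All values are invented.
CLASSIC_TEXT = (
    "# QIIME OTU table (hypothetical example)\n"
    "#OTU ID\tS1\tS2\tS3\tConsensus Lineage\n"
    "0\t5\t0\t2\tk__Bacteria; p__Firmicutes\n"
    "1\t0\t3\t1\tk__Bacteria; p__Bacteroidetes\n"
)


def parse_classic_otu_table(text):
    """Return (sample_ids, otu_ids, counts, taxonomy) from classic-format text."""
    sample_ids, otu_ids, counts, taxonomy = [], [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            # Header row: sample identifiers, optionally followed by taxonomy.
            has_tax = fields[-1] in ("Consensus Lineage", "taxonomy")
            sample_ids = fields[1:-1] if has_tax else fields[1:]
        elif line.startswith("#"):
            continue  # free-text comment line
        else:
            n = len(sample_ids)
            otu_ids.append(fields[0])
            counts.append([float(x) for x in fields[1:1 + n]])
            taxonomy.append(fields[1 + n] if len(fields) > 1 + n else None)
    return sample_ids, otu_ids, counts, taxonomy


sample_ids, otu_ids, counts, taxonomy = parse_classic_otu_table(CLASSIC_TEXT)
print(sample_ids, otu_ids, counts[0], taxonomy[0])
```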
25. OTU tables are now in Biological Observation Matrix (.biom) format
(QIIME 1.4.0-dev and later)
Google: “biom format”
http://biom-format.org
See convert_biom.py for translating between classic and biom OTU tables
30. The Biological Observation Matrix (BIOM) Format
or: How I Learned To Stop Worrying and
Love the Ome-ome
JSON-based format for
representing arbitrary
sample x observation
contingency tables with
optional metadata
McDonald et al., GigaScience (2012).
http://www.biom-format.org
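To make "JSON-based contingency table with optional metadata" concrete, here is a minimal sketch of a sparse table written with Python's standard `json` module. The field names follow the BIOM 1.0 layout as best recalled and should be checked against http://biom-format.org; the OTU and sample IDs, the counts, and the `to_dense` helper are invented for illustration.

```python
import json

# Hand-written table with 2 observations (OTUs) and 3 samples.
# "data" is sparse: each entry is [row_index, column_index, value].
biom_table = {
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "hand-written example",
    "date": "2012-01-01T00:00:00",
    "rows": [
        {"id": "OTU_0", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
        {"id": "OTU_1", "metadata": None},
    ],
    "columns": [
        {"id": "S1", "metadata": None},
        {"id": "S2", "metadata": {"BodySite": "gut"}},
        {"id": "S3", "metadata": None},
    ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 3],
    "data": [[0, 0, 5], [0, 2, 2], [1, 1, 3], [1, 2, 1]],
}


def to_dense(table):
    """Expand the sparse [row, col, value] triples into a dense list of lists."""
    n_rows, n_cols = table["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in table["data"]:
        dense[r][c] = v
    return dense


text = json.dumps(biom_table)        # serialize to a JSON string
dense = to_dense(json.loads(text))   # round-trip and expand
print(dense)
```

Storing only the non-zero entries is what keeps large, mostly empty OTU tables compact.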
32. Working with OTU tables
• single_rarefaction.py: even sampling (very important if you
have different numbers of seqs/sample!)
• filter_otus_from_otu_table.py
• filter_samples_from_otu_table.py
• per_library_stats.py
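The idea behind even sampling can be sketched in a few lines: draw the same number of sequences from every sample, without replacement, and drop samples that have fewer sequences than the chosen depth. This is an illustrative sketch only, not the code inside single_rarefaction.py; the table, the depth of 50, and the `rarefy` helper are all hypothetical.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` sequences without replacement.

    Returns None when the sample has fewer than `depth` sequences.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per sequence, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    out = {}
    for otu in rng.sample(pool, depth):
        out[otu] = out.get(otu, 0) + 1
    return out


# Hypothetical per-sample OTU counts with very different sequencing depths.
table = {
    "S1": {"OTU_0": 40, "OTU_1": 10, "OTU_2": 0},   # 50 sequences
    "S2": {"OTU_0": 5, "OTU_1": 200, "OTU_2": 45},  # 250 sequences
    "S3": {"OTU_0": 3, "OTU_1": 2, "OTU_2": 1},     # 6 sequences: too shallow
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rarefied = {}
for sample_id, counts in table.items():
    result = rarefy(counts, 50, rng)
    if result is not None:
        rarefied[sample_id] = result

print(sorted(rarefied))
```

Choosing the depth is a trade-off: a higher depth keeps more sequences per sample but discards more samples.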
34. OTU picking
• De Novo
– Reads are clustered based on similarity to one
another.
• Reference-based
– Closed reference: any reads which don’t hit a
reference sequence are discarded
– Open reference: any reads which don’t hit a
reference sequence are clustered de novo
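The three strategies can be compared with a toy sketch. It assumes equal-length reads, uses a naive position-by-position identity in place of a real sequence search, and clusters greedily against the first-seen centroid; the sequences, the 0.9 threshold, and every function name here are hypothetical, not QIIME's implementation.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in for a real aligner."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def best_hit(seq, centroids, threshold):
    """Return the ID of the most similar centroid at or above threshold, else None."""
    scored = [(identity(seq, c_seq), c_id) for c_id, c_seq in centroids.items()]
    if not scored:
        return None
    score, c_id = max(scored, key=lambda pair: pair[0])
    return c_id if score >= threshold else None


def pick_de_novo(reads, threshold):
    """Cluster reads against each other; a read with no hit seeds a new OTU."""
    centroids, otus = {}, {}
    for read_id, seq in reads.items():
        hit = best_hit(seq, centroids, threshold)
        if hit is None:
            hit = "denovo%d" % len(centroids)
            centroids[hit] = seq
        otus.setdefault(hit, []).append(read_id)
    return otus


def pick_closed(reads, reference, threshold):
    """Assign reads to reference sequences; return (otus, reads that failed to hit)."""
    otus, failures = {}, {}
    for read_id, seq in reads.items():
        hit = best_hit(seq, reference, threshold)
        if hit is None:
            failures[read_id] = seq
        else:
            otus.setdefault(hit, []).append(read_id)
    return otus, failures


def pick_open(reads, reference, threshold):
    """Closed-reference first, then cluster the failures de novo."""
    otus, failures = pick_closed(reads, reference, threshold)
    otus.update(pick_de_novo(failures, threshold))
    return otus


reference = {"ref1": "AAAAAAAAAA", "ref2": "CCCCCCCCCC"}
reads = {
    "r1": "AAAAAAAAAA",
    "r2": "AAAAAAAAAT",
    "r3": "GGGGGGGGGG",
    "r4": "GGGGGGGGGT",
    "r5": "CCCCCCCCCC",
}

closed_otus, discarded = pick_closed(reads, reference, 0.9)
open_otus = pick_open(reads, reference, 0.9)
de_novo_otus = pick_de_novo(reads, 0.9)
print(closed_otus, sorted(discarded), open_otus, de_novo_otus)
```

In this toy data, r3 and r4 resemble each other but no reference sequence, so they are discarded in closed-reference mode and form a new OTU in the other two modes.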
35. De novo OTU picking
• Pros
– All reads are clustered
• Cons
– Not parallelizable
– OTUs may be defined by erroneous reads
36. Closed-reference OTU picking
• Pros
– Built-in quality filter
– Easily parallelizable
– OTUs are defined by high-quality, trusted
sequences
• Cons
– Reads that don’t hit reference dataset are
excluded, so you can never observe new OTUs
38. Open-reference OTU picking
• Pros
– All reads are clustered
– Partially parallelizable
• Cons
– Only partially parallelizable
– Mix of high quality sequences defining OTUs
(i.e., the database sequences) and possible low
quality sequences defining OTUs (i.e., the
sequencing reads)
40. Variation in sampling depth is an
important consideration
Human skin, colored
by individual, at 500
sequences/sample
Image/analysis credit: Justin Kuczynski
Data reference:
Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R.
Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
42. Variation in sampling depth is an
important consideration
Human skin, colored by
sampling depth, at
either 50 (blue) or 500
(red) sequences/sample
Image/analysis credit: Justin Kuczynski
Data reference:
Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R.
Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
43. How deep is deep enough?
It depends on the question…
– Differences between community types: not many
sequences.
– Rare biosphere: more (but be careful about
sequencing noise!)
44. How deep is deep enough?
[Figure: three principal coordinates plots (axes PC1, PC2, PC3) of the same data at 100 sequences/sample, 10 sequences/sample, and 1 sequence/sample.]
Direct sequencing of the human microbiome readily reveals community differences.
J Kuczynski et al. Genome Biology (2011).
50. Elizabeth K. Costello, et al. Science 2009.
Bacterial Community Variation in Human Body Habitats Across Space and Time.
53. This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a
copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to
Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Feel free to use or modify these slides, but please credit me by placing the following attribution
information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.