Caporaso sloan qiime_workshop_slides_18_oct2012

QIIME Workshop

Get started by opening:
http://bit.ly/mbe-qiime2012
and read up at:
www.qiime.org
Greg Caporaso
gregcaporaso@gmail.com

• Extract DNA and amplify marker gene with barcoded primers
• Pool amplicons and sequence
• Assign reads to samples
• Assign millions of sequences from thousands of samples to OTUs (illustrated with reference sequences RefSeq 1 to RefSeq 10)
• Compute UniFrac distances and compare samples
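The "assign reads to samples" step (demultiplexing) can be sketched as follows. This is a minimal illustration, not QIIME's split_libraries.py: it assumes every read starts with a fixed-length barcode that must match exactly, and the `barcodes` dictionary (barcode to sample ID), the `demultiplex` function, and the `<sampleID>_<number>` labels are hypothetical. Real pipelines also correct barcode errors, remove primers, and quality-filter.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                barcode_len: int) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Assign reads to samples by an exact match on the leading barcode, then trim it off."""
    assigned: List[Tuple[str, str]] = []
    unassigned: List[str] = []
    for read in reads:
        sample = barcodes.get(read[:barcode_len])
        if sample is None:
            unassigned.append(read)  # no exact barcode match
            continue
        # Label as <sampleID>_<running read number>; the barcode is removed from the sequence.
        assigned.append((f"{sample}_{len(assigned)}", read[barcode_len:]))
    return assigned, unassigned


barcodes = {"AAAA": "S1", "CCCC": "S2"}  # made-up barcode -> sample ID map
reads = ["AAAAGTCGTT", "CCCCGTCATT", "GGGGTTTT", "AAAAGTCGAA"]
assigned, unassigned = demultiplex(reads, barcodes, barcode_len=4)
print(assigned)
print(unassigned)
```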

>5000 samples in analysis pipeline
• Stream and lake water
• Marine water, sediment and reef
• Soil (forest, farm, peatland, tundra, …)
• Air
• Coalbed
• Arctic ice core
• Insect-associated
• Human-associated (gut, mouth, skin)

http://www.earthmicrobiome.org/

>5000 samples analyzed to date

Alpha diversity by environment type

Where do we look for new diversity?

* As determined by reads with no hit to the Greengenes database.

QIIME workflow overview (www.QIIME.org)

Inputs
• Sequencing output (454, Illumina, Sanger): fastq, fasta, qual, or sff/trace files
• Metadata: mapping file

Upstream steps (currently supported for marker-gene data only)
• Pre-processing: e.g., remove primer(s), demultiplex, quality filter
• Denoise 454 data: PyroNoise, Denoiser
• Pick OTUs and representative sequences
– Reference-based: BLAST, UCLUST, USEARCH
– De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
• Assign taxonomy: e.g., BLAST, RDP Classifier
• Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
• Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
• Build 'OTU table', i.e., sample by observation matrix
• Database submission (in development)

Core outputs
• OTU (or other sample by observation) table
• Phylogenetic tree: evolutionary relationships between OTUs

Downstream steps (currently supported for general sample by observation data)
• α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
• β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
• Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering
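As a rough illustration of some of the diversity metrics named in this workflow, here is a minimal sketch of Observed Species, Chao1, Bray-Curtis, and binary Jaccard computed on per-OTU count vectors. It is not QIIME's implementation; the Chao1 fallback used when a sample has no doubletons is one common bias-corrected form, and QIIME's details may differ. UniFrac and Phylogenetic Diversity require a phylogenetic tree and are omitted.

```python
from typing import Sequence


def observed_otus(counts: Sequence[int]) -> int:
    """Observed Species: number of OTUs with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness: S_obs + F1^2 / (2 * F2), with F1 = singletons, F2 = doubletons.

    When F2 == 0 this falls back to S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).
    """
    s_obs = observed_otus(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Binary Jaccard distance: 1 - |shared OTUs| / |OTUs present in either sample|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    return 1 - len(present_a & present_b) / len(union) if union else 0.0


# Made-up counts for five OTUs in two samples (same OTU order in both).
sample_a = [10, 0, 1, 2, 1]
sample_b = [4, 6, 0, 2, 0]
print(observed_otus(sample_a), chao1(sample_a))
print(round(bray_curtis(sample_a, sample_b), 3), round(jaccard(sample_a, sample_b), 3))
```

Here sample_a has four observed OTUs, two singletons, and one doubleton, so Chao1 is 4 + 4/2 = 6.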

Running QIIME
• Native installation on OS X or Linux (laptops through a 16,416-core compute cluster*)
• Ubuntu Linux in VirtualBox
• Amazon Web Services (EC2)

* http://ncar.janus.rc.colorado.edu/

Moving Pictures of the Human
Microbiome
• Two subjects sampled daily, one for six
months, one for 18 months
• Four body sites: tongue, palm of left
hand, palm of right hand, and gut (via fecal
swabs).

• Investigate the relative temporal variability of
body sites.
• Is there a temporal core microbiome?
• Technical points: do we observe the same
conclusions on 454 and Illumina data?

Moving Pictures of the Human Microbiome: QIIME tutorial
• A small subset of the full data set (~0.1% of the full sequence collection), to keep run times short.
• Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454.
• The online tutorial describes every step in detail: go back and read it.

Key QIIME files

• Mapping file: per-sample metadata, user-defined
• Input sequence file
• OTU table: sample x OTU matrix, central to downstream analyses [now in BIOM format]
• Parameters file: defines analyses, for use
with the ‘workflow’ scripts (optional)

Mapping file: always run check_id_map.py

[Figure: example mapping file, with required fields highlighted]
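To show the kind of problems worth catching before an analysis, here is a minimal sketch of a few mapping-file checks. It is not check_id_map.py and covers only some checks. It assumes the QIIME 1 conventions as we understand them (tab-separated; first column #SampleID; BarcodeSequence and LinkerPrimerSequence columns; last column Description; sample IDs limited to letters, digits, and periods). The `check_mapping` function and the example data are hypothetical.

```python
import re
from typing import List

# Assumed conventions for a QIIME 1 mapping file (tab-separated):
# first column "#SampleID", columns "BarcodeSequence" and "LinkerPrimerSequence",
# last column "Description"; sample IDs use only letters, digits, and periods.
SAMPLE_ID_RE = re.compile(r"^[A-Za-z0-9.]+$")


def check_mapping(text: str) -> List[str]:
    """Return a list of problems found in a mapping file (empty list = none found)."""
    problems: List[str] = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["empty mapping file"]
    header = lines[0].split("\t")
    if header[0] != "#SampleID":
        problems.append("first column must be #SampleID")
    for field in ("BarcodeSequence", "LinkerPrimerSequence"):
        if field not in header:
            problems.append(f"missing column: {field}")
    if header[-1] != "Description":
        problems.append("last column must be Description")

    bc_col = header.index("BarcodeSequence") if "BarcodeSequence" in header else None
    seen_ids, seen_barcodes = set(), set()
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # comment lines are skipped
        row = line.split("\t")
        if len(row) != len(header):
            problems.append(f"wrong number of fields for sample: {row[0]}")
            continue
        sample_id = row[0]
        if not SAMPLE_ID_RE.match(sample_id):
            problems.append(f"invalid characters in sample ID: {sample_id}")
        if sample_id in seen_ids:
            problems.append(f"duplicate sample ID: {sample_id}")
        seen_ids.add(sample_id)
        if bc_col is not None:
            barcode = row[bc_col]
            if barcode in seen_barcodes:
                problems.append(f"duplicate barcode: {barcode}")
            seen_barcodes.add(barcode)
    return problems


good = "\n".join([
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription",
    "S1\tAAAA\tGGGG\tgut\tsubject 1 gut",
    "S2\tCCCC\tGGGG\ttongue\tsubject 1 tongue",
])
bad = "\n".join([
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription",
    "S1\tAAAA\tGGGG\tx",
    "S1\tAAAA\tGGGG\ty",
    "S_2\tCCCC\tGGGG\tz",
])
print(check_mapping(good))  # → []
print(check_mapping(bad))
```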

Sequence file: can be user-provided, or generated by split_libraries.py

FASTA header format: >[sampleID_seqID] description

Barcodes have been removed!
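Because each header starts with sampleID_seqID, per-sample read counts can be recovered from the sequence file alone. The sketch below is a hypothetical helper, not a QIIME script. It assumes sample IDs contain no underscores, so the sample ID is everything before the last underscore of the first header token.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_id(header: str) -> str:
    """'S1_17 some description' -> 'S1' (assumes no underscores in sample IDs)."""
    seq_id = header.split()[0]
    return seq_id.rsplit("_", 1)[0]


def reads_per_sample(lines: Iterable[str]) -> Counter:
    """Count reads per sample ID."""
    return Counter(sample_id(header) for header, _ in read_fasta(lines))


fasta = """>S1_0 read_a
ACGTACGT
>S1_1 read_b
ACGTACGA
>S2_2 read_c
TTGACCGA
"""
print(reads_per_sample(fasta.splitlines()))  # → Counter({'S1': 2, 'S2': 1})
```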

OTU table (classic format): sample x OTU matrix
• OTU identifiers
• Sample identifiers
• Optional per-OTU taxonomic information
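A minimal parser makes the layout concrete. As we understand the classic text format, each row is an OTU and each column a sample; the header line begins with "#OTU ID" and an optional last "Consensus Lineage" column holds taxonomy. Treat those details as assumptions; the function names and example table are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_classic_otu_table(text: str) -> Tuple[List[str], Dict[str, List[int]], Dict[str, str]]:
    """Parse a tab-separated classic OTU table.

    Returns (sample_ids, counts per OTU, taxonomy per OTU). Assumes the header line
    starts with '#OTU ID' and, if present, the last column is 'Consensus Lineage'.
    """
    sample_ids: List[str] = []
    counts: Dict[str, List[int]] = {}
    taxonomy: Dict[str, str] = {}
    has_lineage = False
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            has_lineage = fields[-1] == "Consensus Lineage"
            sample_ids = fields[1:-1] if has_lineage else fields[1:]
        elif line.startswith("#"):
            continue  # other comment lines
        else:
            otu_id = fields[0]
            counts[otu_id] = [int(float(x)) for x in fields[1:1 + len(sample_ids)]]
            if has_lineage:
                taxonomy[otu_id] = fields[-1]
    return sample_ids, counts, taxonomy


def sample_totals(sample_ids: List[str], counts: Dict[str, List[int]]) -> Dict[str, int]:
    """Sequences per sample: sum down each sample's column."""
    return {s: sum(row[i] for row in counts.values()) for i, s in enumerate(sample_ids)}


table_text = "\n".join([
    "# made-up example table",
    "#OTU ID\tS1\tS2\tConsensus Lineage",
    "0\t5\t0\tk__Bacteria; p__Firmicutes",
    "1\t2\t7\tk__Bacteria; p__Bacteroidetes",
])
samples, otu_counts, lineages = parse_classic_otu_table(table_text)
print(samples, otu_counts, sample_totals(samples, otu_counts))
```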

OTU tables are now in Biological Observation Matrix (.biom) format
(QIIME 1.4.0-dev and later)
Google: “biom format”

http://biom-format.org

See convert_biom.py for translating between classic and BIOM OTU tables

Sample x observation contingency matrix (cells hold observation counts):
• Marker gene (e.g., 16S) surveys: samples x OTUs, or samples x taxa
• Comparative genomics: genomes x ortholog groups
• Metagenomics: metagenomes x functions
• Metatranscriptomics, metabolomics (e.g., samples x metabolites), ...

The Biological Observation Matrix (BIOM) Format
or: How I Learned To Stop Worrying and
Love the Ome-ome

JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata

McDonald et al., GigaScience (2012).
http://www.biom-format.org
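To make "JSON-based format" concrete, here is a minimal sketch that builds a sparse BIOM 1.0-style object with Python's json module. The field names follow our reading of the BIOM 1.0 specification (rows are observations, columns are samples, and sparse data is a list of [row, column, value] triples), but the field list may be incomplete and the format version string is an assumption; check biom-format.org before relying on it. The `to_biom_json` function is hypothetical, not part of the biom-format package.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def to_biom_json(sample_ids: List[str], counts: Dict[str, List[int]],
                 taxonomy: Optional[Dict[str, str]] = None,
                 table_id: str = "example-table") -> dict:
    """Build a sparse, BIOM 1.0-style JSON object (rows = observations, columns = samples)."""
    taxonomy = taxonomy or {}
    otu_ids = list(counts)
    rows = [{"id": otu,
             "metadata": ({"taxonomy": [t.strip() for t in taxonomy[otu].split(";")]}
                          if otu in taxonomy else None)}
            for otu in otu_ids]
    # Sparse storage: one [row_index, column_index, value] triple per nonzero cell.
    data = [[r, c, value]
            for r, otu in enumerate(otu_ids)
            for c, value in enumerate(counts[otu]) if value != 0]
    return {
        "id": table_id,
        "format": "Biological Observation Matrix 1.0.0",  # assumed version string
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "hand-written sketch",
        "date": datetime.now(timezone.utc).isoformat(),
        "rows": rows,
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(otu_ids), len(sample_ids)],
        "data": data,
    }


table = to_biom_json(["S1", "S2"], {"0": [5, 0], "1": [2, 7]},
                     {"0": "k__Bacteria; p__Firmicutes"})
print(json.dumps(table, indent=2))
```

Storing only nonzero cells is the point of the sparse layout: OTU tables are typically mostly zeros, so the triple list is much smaller than the full matrix.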

Comparative genomic (B) and metagenome
analysis (C) with QIIME

Working with OTU tables
• single_rarefaction.py: even sampling (very important if you
have different numbers of seqs/sample!)
• filter_otus_from_otu_table.py
• filter_samples_from_otu_table.py
• per_library_stats.py
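Even sampling (rarefaction) and OTU filtering can be sketched in a few lines. This is in the spirit of single_rarefaction.py and filter_otus_from_otu_table.py, not their implementations: `rarefy` draws exactly `depth` reads per sample without replacement and drops samples with fewer reads, and `filter_otus` removes OTUs whose total count across samples falls below a threshold. Function names, the seed, and the example table are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def rarefy(counts: List[int], depth: int, rng: random.Random) -> Optional[List[int]]:
    """Subsample one sample's OTU counts to exactly `depth` reads, without replacement.

    Returns None when the sample has fewer than `depth` reads (such samples are dropped).
    """
    if sum(counts) < depth:
        return None
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]  # one entry per read
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


def filter_otus(samples: Dict[str, List[int]], otu_ids: List[str],
                min_total: int) -> Tuple[List[str], Dict[str, List[int]]]:
    """Drop OTUs whose count summed over all samples is below `min_total`."""
    keep = [j for j in range(len(otu_ids))
            if sum(row[j] for row in samples.values()) >= min_total]
    return ([otu_ids[j] for j in keep],
            {s: [row[j] for j in keep] for s, row in samples.items()})


# Made-up table: counts per OTU (same OTU order) for three samples of uneven depth.
otu_ids = ["a", "b", "c", "d"]
samples = {"S1": [50, 30, 20, 0], "S2": [5, 3, 1, 1], "S3": [400, 100, 0, 0]}

rng = random.Random(42)  # fixed seed so the subsampling is reproducible
even = {}
for name, counts in samples.items():
    rarefied = rarefy(counts, 20, rng)
    if rarefied is not None:
        even[name] = rarefied
print(even)  # S2 (10 reads) is dropped at depth 20

kept_ids, filtered = filter_otus(samples, otu_ids, min_total=2)
print(kept_ids)  # → ['a', 'b', 'c']
```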

OTU picking
• De Novo
– Reads are clustered based on similarity to one
another.
• Reference-based
– Closed reference: any reads which don’t hit a
reference sequence are discarded
– Open reference: any reads which don’t hit a
reference sequence are clustered de novo
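The three strategies can be contrasted with a toy sketch. It swaps in a deliberately simple similarity (fraction of matching positions in equal-length, pre-aligned reads) in place of the alignment-based search that tools such as UCLUST or BLAST perform, and uses greedy centroid clustering for the de novo step. The 0.9 threshold on 10-base reads allows one mismatch; real analyses commonly use 97% on real reads. All names, including the "New." prefix for de novo OTUs, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions in two equal-length, pre-aligned reads."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def de_novo(reads: List[str], threshold: float) -> Dict[str, List[str]]:
    """Greedy clustering: a read joins the first centroid it matches, else it founds a new OTU."""
    otus: Dict[str, List[str]] = {}
    for read in reads:
        for centroid, members in otus.items():
            if identity(read, centroid) >= threshold:
                members.append(read)
                break
        else:
            otus[read] = [read]
    return otus


def closed_reference(reads: List[str], refs: Dict[str, str],
                     threshold: float) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each read to its best reference hit; reads with no hit are returned as failures."""
    otus: Dict[str, List[str]] = {}
    failures: List[str] = []
    for read in reads:
        best_id: Optional[str] = None
        best_score = 0.0
        for ref_id, ref_seq in refs.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            otus.setdefault(best_id, []).append(read)
        else:
            failures.append(read)
    return otus, failures


def open_reference(reads: List[str], refs: Dict[str, str],
                   threshold: float) -> Dict[str, List[str]]:
    """Closed-reference first; reads that fail are then clustered de novo into new OTUs."""
    otus, failures = closed_reference(reads, refs, threshold)
    for centroid, members in de_novo(failures, threshold).items():
        otus["New." + centroid] = members
    return otus


refs = {"ref1": "ACGTACGTAC", "ref2": "TTTTGGGGCC"}  # made-up reference "database"
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "CCCCAAAATT", "CCCCAAAATA"]
print(len(de_novo(reads, 0.9)))               # → 3 (every read is clustered)
print(closed_reference(reads, refs, 0.9)[1])  # reads with no reference hit are discarded
print(sorted(open_reference(reads, refs, 0.9)))  # → ['New.CCCCAAAATT', 'ref1', 'ref2']
```

The sketch also shows why closed-reference picking parallelizes easily: each read is compared only against the fixed reference set, independently of the other reads, while greedy de novo clustering depends on the order in which earlier reads founded centroids.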

De novo OTU picking
• Pros
– All reads are clustered
• Cons
– Not parallelizable
– OTUs may be defined by erroneous reads

Closed-reference OTU picking
• Pros
– Built-in quality filter
– Easily parallelizable
– OTUs are defined by high-quality, trusted
sequences
• Cons
– Reads that don’t hit reference dataset are
excluded, so you can never observe new OTUs

Percentage of reads that do not hit the reference collection, by environment type.

Open-reference OTU picking
• Pros
– All reads are clustered
– Partially parallelizable
• Cons
– Only partially parallelizable
– Mix of high quality sequences defining OTUs
(i.e., the database sequences) and possible low
quality sequences defining OTUs (i.e., the
sequencing reads)

Variation in sampling depth is an
important consideration

Human skin, colored by individual, at 500 sequences/sample

Image/analysis credit: Justin Kuczynski

Data reference:
Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R.
Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.


Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample

How deep is deep enough?
It depends on the question…
– Differences between community types: relatively few sequences are needed.
– Rare biosphere: more (but be careful about sequencing noise!)
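One way to ask "how deep is deep enough?" for richness is a rarefaction curve: the mean number of OTUs observed when subsampling to increasing depths. The sketch below is illustrative only; the function names, the number of iterations, and the example sample are hypothetical, and depths larger than the sample's total are skipped.

```python
import random
from statistics import mean
from typing import Dict, List


def observed_at_depth(counts: List[int], depth: int, iterations: int,
                      rng: random.Random) -> float:
    """Mean number of distinct OTUs seen when drawing `depth` reads without replacement."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]  # one entry per read
    return mean(len(set(rng.sample(pool, depth))) for _ in range(iterations))


def rarefaction_curve(counts: List[int], depths: List[int], iterations: int = 10,
                      seed: int = 0) -> Dict[int, float]:
    """Observed OTUs at each depth; depths larger than the sample's total are skipped."""
    rng = random.Random(seed)
    total = sum(counts)
    return {d: observed_at_depth(counts, d, iterations, rng) for d in depths if d <= total}


# Made-up sample: 10 OTUs, 88 reads, with a long tail of rare OTUs.
counts = [40, 20, 10, 5, 5, 3, 2, 1, 1, 1]
curve = rarefaction_curve(counts, [1, 10, 50, 88, 200])
for depth, observed in curve.items():
    print(depth, observed)
```

A curve that is still climbing at the deepest depth suggests rare OTUs remain unsampled. Because sequencing errors also appear as spurious rare OTUs, a long rising tail is not necessarily real diversity.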

How deep is deep enough?

[Figure 1 (A)–(C): PCoA plots (axes PC1, PC2, PC3) at 100, 10, and 1 sequence(s)/sample]

Direct sequencing of the human microbiome readily reveals community differences.
J Kuczynski et al. Genome Biology (2011).
Can we get accurate taxonomic
assignment from short reads?

Bacterial Community Variation in Human Body Habitats Across Space and Time.
EK Costello et al. Science (2009).

This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a
copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to
Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these slides, but please credit me by placing the following attribution
information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.
