This document describes PICRUSt, a computational approach for predicting metagenome functional content from 16S rRNA gene surveys. PICRUSt uses ancestral state reconstruction to infer the presence or absence of gene families or functional traits in unobserved genomes, based on the phylogenetic relationships between observed microbes and their genomic content. The document evaluates PICRUSt's accuracy in predicting metagenomes from mock communities and Human Microbiome Project samples, and reports good prediction across various body sites. Potential applications of PICRUSt include generating hypotheses about microbial community functions and aiding other metagenomic analyses.
2. 16S rRNA gene
Standard marker gene for bacterial and archaeal species identification
Recent widespread use in metagenomic microbiome surveys
Limited to telling us: “who is there?”
3. Using 16S anonymously
16S reads often clustered into OTUs
Alpha diversity
Beta diversity
Rarefaction
Biogeography
Bik et al., 2012
7. What is in a name?
Real names vs OTU1234
Haloferax
Lee et al. 2010
Prochlorococcus
Bacillus
8. Extending 16S to functions
Metagenomics: “What are they doing?”
Requires WGS sequencing
More costly
Use microbial databases (~3500 genomes)
16S gene (or other marker gene) -> find genome -> functional information
Genome sources:
• IMG
• NCBI
• Etc.
Functional information:
• KEGG
• PFAM
• EC
• SEED
• Etc.
9. PICRUST
Phylogenetic Investigation of Communities by Reconstruction of Unobserved STates
http://picrust.sourceforge.net
10. PICRUST: Predicting genomes
Inputs:
• Reference 16S tree (Greengenes)
• Genome trait table (e.g. KEGG, 16S copy number)
Steps:
1. Prune taxa with no genome information
2. Infer ancestral genome traits
3. Predict genome compositions
11. PICRUST: Predicting metagenomes
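Inputs:
• 16S copy number predictions (per genome)
• Functional trait predictions (per genome)
• OTU table (16S by sample)
Steps:
1. Normalize OTU table (using the 16S copy number predictions)
2. Predict metagenome functional traits (using the functional trait predictions)
Output: functions by sample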
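The two steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PICRUSt's own code: the function name `predict_metagenome`, the dictionary layout, and all numbers are hypothetical. It assumes that normalization means dividing each OTU's read count by its predicted 16S copy number, and that the metagenome is the abundance-weighted sum of per-genome trait counts.

```python
def predict_metagenome(otu_table, copy_number, trait_table):
    """Return predicted function counts per sample (hypothetical helper).

    otu_table:   {sample: {otu: 16S read count}}
    copy_number: {otu: predicted 16S copies per genome}
    trait_table: {otu: {function: predicted gene copies per genome}}
    """
    metagenome = {}
    for sample, counts in otu_table.items():
        functions = {}
        for otu, reads in counts.items():
            # Normalize: reads divided by 16S copies approximates organism abundance.
            abundance = reads / copy_number[otu]
            for function, copies in trait_table[otu].items():
                # Each organism contributes its gene copies, weighted by abundance.
                functions[function] = functions.get(function, 0.0) + abundance * copies
        metagenome[sample] = functions
    return metagenome


# Toy, made-up numbers purely for illustration.
otu_table = {
    "sample_A": {"OTU1": 10, "OTU2": 4},
    "sample_B": {"OTU1": 0, "OTU2": 8},
}
copy_number = {"OTU1": 5, "OTU2": 2}
trait_table = {
    "OTU1": {"K00001": 1, "K00002": 0},
    "OTU2": {"K00001": 2, "K00002": 3},
}

predicted = predict_metagenome(otu_table, copy_number, trait_table)
print(predicted)
```

Without the normalization step, OTU1 would look 2.5 times more abundant than OTU2 in sample_A, although both correspond to the same estimated number of organisms.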
12. Ancestral State Reconstruction
Needs to accept continuous data
Must run fast! (8000 traits across 3500 genomes)
Wagner Parsimony (Count software; Csűrös, 2010)
ACE (APE R library; Paradis, 2004)
• PIC
• ML
• REML
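To make the idea of reconstructing a continuous trait concrete, here is a minimal sketch of the weighted-average step used in Felsenstein's independent contrasts (PIC) approach. It is not the APE or Count implementation. The tree encoding, the function name `pic_ancestral_state`, and the trait values are assumptions made for illustration, and each internal node's estimate uses only its descendants.

```python
def pic_ancestral_state(node, tip_values):
    """Return (estimate, effective_branch_length) for a node.

    A tip is (name, branch_length); an internal node is
    ((left, right), branch_length). Branch lengths below a node must be > 0.
    """
    label, length = node
    if isinstance(label, str):
        # Tip: the observed trait value, e.g. 16S copy number.
        return float(tip_values[label]), float(length)
    left, right = label
    x1, v1 = pic_ancestral_state(left, tip_values)
    x2, v2 = pic_ancestral_state(right, tip_values)
    # Weighted average of the children: shorter branches get more weight.
    estimate = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    # Lengthen the branch above this node to reflect uncertainty in the estimate.
    return estimate, length + (v1 * v2) / (v1 + v2)


# Toy tree ((A:1, B:1):0.5, C:1) with made-up trait values.
ab = ((("A", 1.0), ("B", 1.0)), 0.5)
root = ((ab, ("C", 1.0)), 0.0)
tip_values = {"A": 2, "B": 4, "C": 6}

root_estimate, _ = pic_ancestral_state(root, tip_values)
print(root_estimate)
```

A single post-order pass like this visits every node once per trait, which is relevant to the speed requirement above.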
13. Accuracy for metagenome prediction
1. Obtain metagenomic projects with both WGS and 16S only sequencing
2. Make functional predictions using PICRUST with 16S only data
3. Compare predictions with WGS data
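Step 3 can be sketched as a correlation between predicted and observed function abundances. The slides report R2 values but do not say which correlation was used, so the squared Pearson correlation below is an assumption. The function name `pearson_r_squared` and the numbers are hypothetical.

```python
def pearson_r_squared(predicted, observed):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return (cov * cov) / (var_p * var_o)


# Made-up abundances for three functions in one sample.
picrust_prediction = [1.0, 2.0, 3.0]
wgs_observation = [1.0, 2.0, 4.0]

r2 = pearson_r_squared(picrust_prediction, wgs_observation)
print(round(r2, 3))
```

A value near 1 means the predicted abundances track the WGS abundances closely. It does not by itself show that the absolute counts agree.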
14. ASR methods on metagenomics
Dataset: HMP mock community (known organisms, sequenced)
Wagner Parsimony: R² = 0.92
ACE PIC: R² = 0.91
ACE REML: R² = 0.92
ACE ML: R² = 0.72
All methods give similar results except “ACE ML”, which has a known problem; the recently added “REML” method solves it.
17. Accuracy for genome prediction
1. Pretend a genome has not been sequenced
2. Predict genome composition using PICRUST
3. Compare predictions to real data
4. Repeat for all genomes
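The four steps above amount to leave-one-out validation. The sketch below shows the loop only. The predictor is a deliberately simple stand-in that copies traits from the nearest remaining genome, which is not PICRUSt's ancestral state reconstruction. All names, distances, and trait values are hypothetical.

```python
def nearest_neighbor_predict(target, known, distances):
    """Stand-in predictor: copy traits from the closest remaining genome."""
    nearest = min(known, key=lambda name: distances[frozenset((target, name))])
    return known[nearest]


def leave_one_out_accuracy(traits, distances, predict=nearest_neighbor_predict):
    """Hold out each genome in turn and score the prediction against real data."""
    accuracies = {}
    for held_out, real in traits.items():
        # Step 1: pretend this genome has not been sequenced.
        known = {name: t for name, t in traits.items() if name != held_out}
        # Step 2: predict its composition from the remaining genomes.
        predicted = predict(held_out, known, distances)
        # Step 3: compare with the real data (fraction of traits that match).
        matches = sum(predicted[f] == real[f] for f in real)
        accuracies[held_out] = matches / len(real)
    # Step 4: the loop repeats this for all genomes.
    return accuracies


# Made-up presence/absence traits and pairwise phylogenetic distances.
traits = {
    "G1": {"a": 1, "b": 0, "c": 1, "d": 1},
    "G2": {"a": 1, "b": 0, "c": 1, "d": 0},
    "G3": {"a": 0, "b": 1, "c": 1, "d": 0},
}
distances = {
    frozenset(("G1", "G2")): 0.1,
    frozenset(("G1", "G3")): 0.5,
    frozenset(("G2", "G3")): 0.4,
}

accuracies = leave_one_out_accuracy(traits, distances)
print(accuracies)
```

Passing the predictor as an argument keeps the validation loop independent of the reconstruction method, so different methods could be compared with the same loop.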
21. Possible applications
1. 16S only microbiome studies
Make hypotheses about the functions they encode
2. Complete metagenomic studies
Compare functions we “observe” to what we would expect based on species present
3. Aid other metagenomic computational methods
Binning
Metabolic reconstruction
4. Insight into correlation between species & function
For different taxonomic groups
For different functional classes
22. Acknowledgements
Rob Beiko
Curtis Huttenhower
Rob Knight
Jesse Zaneveld
Greg Caporaso
Joshua Reyes
Dan Knights
Daniel McDonald