1. Next-generation sequencing for
microbial ecology:
alpha diversity, beta diversity, and
biases in high-throughput sequencing
Rachel Adams
Andrew Rominger
Sara Branco
Thomas Bruns
3. The microbiome of the built environment
• Understudied but fundamental ecological habitat
• Implications for human health
• Sick building syndrome
• Metrics are practically absent: composition and quantitative characteristics
• Need comparison of “typical” buildings and high replication across settings to detect patterns
25. [Figure: true community (α1, α2, α3; β12, β13, β23) → high-throughput sequencing → observed community (α1+, α2+, α3+; β12?, β13?, β23?)]
...but how does that change conclusions about ecological processes that are based on β diversity?
26. Beta diversity: the variation in species composition among sites
A key component of community ecology: linking processes to this compositional variation
Adams et al., ISME Journal, 2013
27. Question and hypotheses
Do errors that inflate alpha diversity bias conclusions on beta diversity between samples?
Why would they?
• Particular taxa in one environment grouping fail to amplify, or amplify in a way that skews the relative abundance of all others*
• Clustering incorrectly groups divergent taxa or splits identical taxa
Hypothesis: No
While richness/diversity estimates will be off for any given sample, conclusions about beta diversity will be robust to the errors
35. Simulation process
Initial communities: expected relative abundance of OTUs
→ Variation in taxon-specific amplification: biased relative abundance
→ Sequence error: biased relative abundance + error
→ Clustering OTUs: biased relative abundance + error + clustering
Simulated communities
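A minimal sketch of the simulation steps, not the authors' code. It assumes the distributions quoted on the backup slides at the end of the deck (amplification bias from a beta distribution with a=0.5, b=1.0; number of splits from a binomial with n=100; split location from a beta with a=b=0.5). How those draws are applied to abundances is not stated in the deck, so the multiplier and the recursive splitting below are assumptions, as are the names `simulate_observed`, `p_split`, and `n_trials`. Sequence error and clustering are folded into one splitting step for brevity.

```python
import numpy as np

def simulate_observed(true_counts, p_split=0.0334, n_trials=100, seed=0):
    """Hypothetical sketch: amplification bias followed by OTU over-splitting."""
    rng = np.random.default_rng(seed)
    true_counts = np.asarray(true_counts, dtype=float)

    # Step 1 (assumption): each OTU gets a taxon-specific multiplier drawn
    # from Beta(0.5, 1.0); abundances are then renormalized to proportions.
    weights = rng.beta(0.5, 1.0, size=true_counts.size)
    amplified = true_counts * weights
    amplified = amplified / amplified.sum()

    # Step 2 (assumption): sequence error plus clustering is modeled as
    # spurious splits. The number of splits per OTU is Binomial(n_trials, p_split);
    # each split moves a Beta(0.5, 0.5) fraction of the parent into a new OTU.
    observed = []
    for abundance in amplified:
        parent = abundance
        for _ in range(rng.binomial(n_trials, p_split)):
            frac = rng.beta(0.5, 0.5)
            observed.append(parent * frac)  # spurious OTU
            parent *= 1.0 - frac
        observed.append(parent)             # what remains of the true OTU
    return np.array(observed)
```

None of the random draws depends on the abundances, so reusing the same `seed` for every sample gives each taxon the same bias and the same splits, and the output vectors stay aligned across samples.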
36. Model summary – 2 types of errors
1. Create group differences that aren’t there (Type I error)
[Figure: NMDS ordinations (NMDS1 vs. NMDS2), true vs. perceived communities]
37. Model summary – 2 types of errors
2. Lose group differences that are there (Type II error)
[Figure: NMDS ordinations (NMDS1 vs. NMDS2), true vs. perceived communities]
39. Model summary output
1. Presence of bias: Statistical categorical differences
Groups     R²     p-value
Location   0.02   0.34
Season     0.20   0.001
2. Degree of bias: percentage difference between true and simulated communities
Normalized error = (Simulated distance – True distance) / True distance
Distance metric: Morisita-Horn
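The two quantities on this slide can be sketched as follows. The Morisita-Horn formula is the standard one written for relative abundances; the function names are illustrative and not taken from the deck.

```python
import numpy as np

def morisita_horn_distance(x, y):
    """Morisita-Horn dissimilarity between two abundance vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = x / x.sum()  # relative abundances in sample 1
    q = y / y.sum()  # relative abundances in sample 2
    similarity = 2.0 * np.sum(p * q) / (np.sum(p ** 2) + np.sum(q ** 2))
    return 1.0 - similarity

def normalized_error(true_distance, simulated_distance):
    """(Simulated distance - True distance) / True distance."""
    return (simulated_distance - true_distance) / true_distance
```

Identical compositions give a distance of 0 and samples with no shared OTUs give 1. A positive normalized error means the simulated (observed) communities look more different than the true ones.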
40. Model findings
Categorical differences are robust to high-throughput sequencing errors in alpha diversity, regardless of the underlying patterns of beta-diversity
The degree of bias is not affected by the underlying patterns of beta-diversity but depends on community characteristics
44. Model summary – Type I & II error
[Figure: p-values (0 to 1) for true vs. simulated communities; panels: No groups, Two groups]
Whether groups are different or the same will not be biased by inflated alpha diversity
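The deck reports R² and p-values for group differences but does not name the test. The sketch below assumes a PERMANOVA-style permutation test on a distance matrix, which produces both quantities; the function name `permanova` and its arguments are illustrative.

```python
import numpy as np

def permanova(dist, labels, n_perm=999, seed=0):
    """Return (R2, p-value) for group differences in a square distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    sq = dist ** 2
    # Total sum of squares from all pairwise squared distances.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n

    def pseudo_f(lab):
        # Within-group sum of squares, summed over groups.
        ss_within = 0.0
        for g in groups:
            idx = np.flatnonzero(lab == g)
            ss_within += np.triu(sq[np.ix_(idx, idx)], k=1).sum() / len(idx)
        ss_among = ss_total - ss_within
        f = (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))
        return f, ss_among / ss_total

    f_obs, r2 = pseudo_f(labels)
    rng = np.random.default_rng(seed)
    # Shuffle group labels to build the null distribution of pseudo-F.
    hits = sum(pseudo_f(rng.permutation(labels))[0] >= f_obs
               for _ in range(n_perm))
    return r2, (hits + 1) / (n_perm + 1)
```

Running this on the true and on the simulated distance matrices, with and without real group structure, is one way to check for the Type I and Type II errors described on the earlier slides.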
45. Model summary – Degree of bias
Degree of bias will be affected by
- the error rate of the platform and OTU clustering
- the gamma diversity of the environment
- the precise shape of the species abundance
distribution
But not the relationship among samples
46. Increasing probability of sequencing error and over-splitting OTUs increases bias
[Figure: normalized error vs. probability of splitting (1e-04 to 0.1); panels: No groups, Two groups]
47. Increasing OTU richness decreases bias
[Figure: normalized error vs. number of OTUs (100 to 1100)]
48. Shape of species abundance distribution (SAD) affects bias
[Figure: rank-abundance curve (abundance vs. rank)]
49. Shape of species abundance distribution (SAD) affects bias
[Figure: normalized error vs. increasing SAD variance (1.5 to 3.5)]
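The deck does not say which distribution generated the SAD. As an illustration only, the sketch below assumes a lognormal SAD whose `sigma` parameter plays the role of the "SAD variance" axis; `lognormal_sad` and `total_reads` are hypothetical names.

```python
import numpy as np

def lognormal_sad(n_otus, sigma, total_reads=100_000, seed=0):
    """Rank-abundance vector from a lognormal SAD (assumed, not from the deck)."""
    rng = np.random.default_rng(seed)
    raw = rng.lognormal(mean=0.0, sigma=sigma, size=n_otus)
    # Scale to the requested sequencing depth and sort into rank order.
    counts = raw / raw.sum() * total_reads
    return np.sort(counts)[::-1]

# Larger sigma concentrates reads in a few dominant OTUs (a more uneven community).
even = lognormal_sad(1000, sigma=1.5)
uneven = lognormal_sad(1000, sigma=3.5)
```

Comparing `even[0]` with `uneven[0]` shows how much of the sample the top-ranked OTU takes up under each setting.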
50. As true community distance increases, degree of error decreases
[Figure: normalized error vs. true distance]
51. Clustering is the main error-producing step
[Figure: R² values for the true, amplified, and split communities; two-groups scenario]
52. Simulation overview
Categorical analysis is very robust to errors and biases in high-throughput sequencing
Degree of bias will be affected by the error rate of the sequencing platform and OTU clustering, the gamma diversity of the environment, and the precise shape of the species abundance distribution
High-throughput error leads to an overestimation of the difference between groups
Mean bias is ~20-40%
Incorrect OTU clustering accounts for most of that
53. Steps
1. In silico: Add further complexity to simulations
2. In vitro: Empirically test artificially created microbial communities
54. Question and hypotheses
Do errors that inflate alpha diversity bias conclusions on beta diversity between samples?
Why would they?
• Particular taxa in one environment grouping fail to amplify, or amplify in a way that skews the relative abundance of all others*
• Clustering incorrectly groups divergent taxa or splits identical taxa
Hypothesis: No
While richness/diversity estimates will be off for any given sample, conclusions about beta diversity will be robust to the errors
55. Air samples in a mycology classroom:
a unique source distorts perceived species richness
57. Mycology classroom appears to be less rich than other classrooms…
[Figure: Chao estimated richness vs. number of individuals sampled, classrooms A–E]
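The slide does not say which Chao estimator was used. A common choice is the bias-corrected Chao1, which adds a correction based on singletons and doubletons to the observed richness. A minimal sketch under that assumption:

```python
import numpy as np

def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    counts = np.asarray(counts)
    counts = counts[counts > 0]
    s_obs = counts.size          # observed OTUs
    f1 = np.sum(counts == 1)     # singletons
    f2 = np.sum(counts == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))

print(chao1([10, 5, 1, 1, 1, 2]))  # 6 observed + 3*2/(2*2) = 7.5
```

Because the estimate depends on how many rare OTUs are detected, a sample dominated by one abundant source leaves fewer reads for rare taxa and can look less rich at the same sequencing depth.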
58. … but has higher biomass
[Figure: Penicillium spore equivalents by classroom, A–E]
66. Conclusions
• While deciphering alpha diversity is problematic:
- Inflated alpha due to sequence error & clustering
- Deflated alpha due to unevenness
beta diversity calculations are robust to these errors
in high-throughput sequencing
• Empirical test will be used to corroborate conclusions
of in silico simulations
• High-throughput sequencing will continue to be a
promising tool for microbial ecologists
70. References – potential biases in high-throughput sequencing
DNA extraction: Frostegard et al Appl Environ Microbiol 1999; DeSantis et al FEMS Microbiology 2005; Feinstein et al Appl Environ Microbiol 2009; Morgan et al PLoS ONE 2010; Delmont et al Appl Environ Microbiol 2011
PCR amplification/Relative abundance: Amend et al Mol Ecol 2010; Engelbrektson et al ISME Journal 2010; Bellemain et al BMC Microbiol 2010; Schloss et al PLoS ONE 2011; Pinto & Raskin PLoS ONE 2012; Klindworth et al Nucleic Acids Res 2013
Sequencing error/Chimeras/OTU clustering: Huse et al Genome Biol 2007; Huse et al Environ Microbiol 2010; Kunin et al Environ Microbiol 2010; Quince et al BMC Bioinformatics 2010; Lee et al PLoS ONE 2012; Pinto & Raskin PLoS ONE 2012; Bachy et al ISME Journal 2013
Sequencing platform/protocol: Morgan et al PLoS ONE 2010; Luo et al PLoS ONE 2012
Even sampling depth: Schloss et al PLoS ONE 2011; Gihring et al Environ Microbiol 2012
Denoising: Gaspar & Thomas PLoS ONE 2013
71. Empirical test of simulation results
[Figure: normalized error vs. number of OTUs (100 to 1100)]
72. PCR bias
PCR bias: beta distribution with a=0.5, b=1.0
[Figure: density of the PCR bias distribution]
Scatter around line of true abundance versus amplified abundance
[Figure: amplified abundance vs. true abundance]
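For reference, the beta distribution quoted on this slide has a simple closed form. The block below only works out standard properties of Beta(0.5, 1.0); how the draw is applied to each taxon's abundance is not stated in the deck.

```latex
% Beta(a, b) with a = 0.5, b = 1.0; note B(0.5, 1) = \Gamma(0.5)\Gamma(1)/\Gamma(1.5) = 2
f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)} = \frac{1}{2\sqrt{x}}, \qquad 0 < x \le 1
\mathbb{E}[X] = \frac{a}{a+b} = \frac{0.5}{1.5} = \frac{1}{3}
\mathrm{Var}[X] = \frac{ab}{(a+b)^2(a+b+1)} = \frac{0.5}{2.25 \times 2.5} = \frac{4}{45} \approx 0.089
```

The density is unbounded near 0 and falls to 0.5 at x = 1, so most taxa receive a small value and a minority receive a value close to 1.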
73. OTU splitting bias
Split bias: binomial distribution with n=100
[Figure: density of the number of splits for p=0.0001, p=0.001, p=0.0334, p=0.0667]
Split location: beta distribution with a=b=0.5
[Figure: density of the location of split]
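As a worked check on these parameters, the expected number of splits under a binomial with n = 100 follows directly from the listed probabilities. These are standard binomial and beta properties, not results from the deck.

```latex
% Number of splits K ~ Binomial(n = 100, p)
\mathbb{E}[K] = np, \qquad \mathrm{Var}[K] = np(1-p)
p = 0.0001:\ \mathbb{E}[K] = 0.01 \qquad p = 0.001:\ \mathbb{E}[K] = 0.1
p = 0.0334:\ \mathbb{E}[K] = 3.34 \qquad p = 0.0667:\ \mathbb{E}[K] = 6.67
% Split location X ~ Beta(0.5, 0.5), the arcsine distribution
f(x) = \frac{1}{\pi\sqrt{x(1-x)}}, \qquad \mathbb{E}[X] = \frac{1}{2}
```

The split-location density is U-shaped, so most splits fall near 0 or 1, while the average location is still one half.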