Rachel Adams - SMBE Euks Meeting

Next-generation sequencing for microbial ecology: alpha diversity, beta diversity, and biases in high-throughput sequencing
Rachel Adams
Andrew Rominger
Sara Branco
Thomas Bruns

The microbiome of the built environment
Understudied but fundamental ecological habitat
Implications for human health
Sick building syndrome
Metrics are practically absent: composition and quantitative characteristics
Need comparison of “typical” buildings and high replication across settings to detect patterns

The What and Why of the indoor microbiome

Architecture
Ventilation
Building function
Environmental setting
Residents

Fungi in the indoor microbiome, and beyond
Yeasts
Filaments
Saprobes
Symbionts: parasites (−) to mutualists (+)

Assessing environmental fungi
1. An estimated 5-20% of fungi grow in culture
2. Identification requires a fungal taxonomist

Assessing environmental fungi
[Figure: nuclear ribosomal DNA region: SSU RNA (18S), ITS1, 5.8S, ITS2, LSU RNA (28S)]
Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi - Schoch et al. 2012

High-throughput sequencing has greatly expanded capabilities in microbial ecology
[Figure: samples tagged with sequence barcodes (ACGAGTGCGT, ACGCTCGACA, AGACGCACTC, AGCACTGTAG, ATCAGACACG) yield 10^4 - 10^7 sequence reads]

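Alpha, beta, gamma diversity
[Figure: three sites with within-site (alpha) diversity α1, α2, α3; pairwise between-site (beta) diversity β12, β13, β23; and overall (gamma) diversity γ across all sites]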
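A minimal sketch of the three quantities in this figure, assuming presence/absence data: alpha is the richness of each site, gamma is the richness of all sites pooled, and each pairwise beta is a dissimilarity between two sites. The slide does not name a beta metric; the Sørensen dissimilarity used here is our choice (the simulations later in the talk use the abundance-based Morisita-Horn distance). The OTU lists and function names are hypothetical.

```python
from itertools import combinations


def sorensen_dissimilarity(a: set, b: set) -> float:
    """Presence/absence beta diversity between two sites: 1 - 2|A∩B| / (|A| + |B|)."""
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def diversity_summary(sites):
    """Return (alpha per site, pairwise beta, gamma) for a list of OTU sets."""
    alpha = [len(s) for s in sites]        # richness within each site
    gamma = len(set().union(*sites))       # richness of all sites pooled
    beta = {
        (i + 1, j + 1): sorensen_dissimilarity(sites[i], sites[j])
        for i, j in combinations(range(len(sites)), 2)  # β12, β13, β23, ...
    }
    return alpha, beta, gamma


# Hypothetical OTU lists for three sites
sites = [{"OTU1", "OTU2", "OTU3"}, {"OTU2", "OTU3", "OTU4"}, {"OTU5", "OTU6"}]
alpha, beta, gamma = diversity_summary(sites)
print(alpha, gamma)  # [3, 3, 2] 6
print(beta)
```

Sites 1 and 2 share two OTUs, so β12 is about 0.33, while site 3 shares nothing with either, giving β13 = β23 = 1.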


Groundtruthing high-throughput sequencing for alpha richness
α_true < α_est (Kunin et al. 2010)

Groundtruthing high-throughput sequencing
[Figure: true samples (α1, α2, α3) → high-throughput sequencing → observed samples (α1+, α2+, α3+)]
In terms of diversity, we know that α can be elevated in high-throughput sequenced communities...

[Figure: true community (α1, α2, α3; β12, β13, β23) → high-throughput sequencing → observed community (α1+, α2+, α3+; β12?, β13?, β23?)]
...but how does that change conclusions about ecological processes that are based on β diversity?

Beta diversity: the variation in species composition among sites
A key component of community ecology: linking processes to this compositional variation
Adams et al., ISME Journal, 2013

Question and hypotheses
Do errors that inflate alpha diversity bias conclusions about beta diversity between samples?
Why would it?
• Particular taxa in one environmental grouping do not amplify, or amplify in a way that skews the relative abundance of all others*
• Clustering incorrectly groups divergent taxa or splits identical taxa
Hypothesis: No
While richness/diversity estimates will be off for any given sample, conclusions about beta diversity will be robust to the errors

Simulation process
Initial community: OTU table with rows Sample 1, Sample 2, …, Sample i and columns OTU1, OTU2, …, OTUj
Simulated community: OTU table with rows Sample 1, Sample 2, … and columns OTU1, OTU2, …, OTUk

Initial communities → expected relative abundance of OTUs
→ variation in taxon-specific amplification → biased relative abundance
→ sequence error → biased relative abundance + error
→ clustering OTUs → biased relative abundance + error + clustering
→ simulated communities
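The flow from expected relative abundance to simulated communities can be sketched in Python. This is a minimal illustration, not the authors' simulation code. The PCR-bias and OTU-splitting slides at the end of the deck give a Beta(0.5, 1.0) PCR bias, a Binomial(n = 100, p) number of splits, and a Beta(0.5, 0.5) split location; how those draws are applied here is an assumption: amplification multiplies each OTU's expected relative abundance by (1 + bias), and sequence error plus clustering are folded into one splitting step in which each split moves a Beta(0.5, 0.5) fraction of an OTU's reads into a new spurious OTU. The function names `amplify`, `split_otus`, and `simulate` are hypothetical.

```python
import numpy as np


def amplify(expected, rng, a=0.5, b=1.0):
    """Taxon-specific amplification bias.

    ASSUMPTION: each OTU's expected relative abundance is reweighted by
    (1 + bias) with bias ~ Beta(a, b), then renormalized to sum to 1.
    """
    weights = expected * (1.0 + rng.beta(a, b, size=expected.size))
    return weights / weights.sum()


def split_otus(counts, p, rng, n=100):
    """Over-splitting (sequence error + clustering folded into one step).

    Each OTU is split Binomial(n, p) times; each split moves a Beta(0.5, 0.5)
    fraction of the OTU's remaining reads into a new, spurious OTU.
    ASSUMPTION: 'location of split' is read as that fraction of reads.
    """
    kept = counts.copy()
    spurious = []
    for j in range(kept.size):
        for _ in range(rng.binomial(n, p)):
            piece = int(kept[j] * rng.beta(0.5, 0.5))
            if piece > 0:
                spurious.append(piece)
                kept[j] -= piece
    return kept, spurious


def simulate(expected, n_reads, p_split, rng):
    """Expected relative abundances (samples x OTUs) -> simulated OTU table.

    ASSUMPTION: spurious OTUs are appended as extra columns and treated as
    unique to the sample in which they arose. Read totals are preserved.
    """
    n_samples, n_otus = expected.shape
    kept_rows, spurious_rows = [], []
    for probs in expected:
        biased = amplify(probs, rng)                        # biased relative abundance
        counts = rng.multinomial(n_reads, biased)           # draw sequence reads
        kept, spurious = split_otus(counts, p_split, rng)   # + error + clustering
        kept_rows.append(kept)
        spurious_rows.append(spurious)
    n_extra = sum(len(s) for s in spurious_rows)
    table = np.zeros((n_samples, n_otus + n_extra), dtype=np.int64)
    col = n_otus
    for i, (kept, spurious) in enumerate(zip(kept_rows, spurious_rows)):
        table[i, :n_otus] = kept
        table[i, col:col + len(spurious)] = spurious
        col += len(spurious)
    return table


rng = np.random.default_rng(1)
expected = rng.dirichlet(np.ones(50), size=3)  # hypothetical initial communities
table = simulate(expected, n_reads=1000, p_split=0.0334, rng=rng)
print(table.shape, table.sum(axis=1))  # extra OTU columns; each row still sums to 1000
```

Keeping read totals fixed means every extra column inflates observed richness without adding reads, which is the alpha-inflation effect the talk asks about.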

Model summary – 2 types of errors
1. Create group differences that aren’t there (Type I error)
[Figure: NMDS ordinations (NMDS1 vs NMDS2): true vs perceived communities]

2. Lose group differences that are there (Type II error)
[Figure: NMDS ordinations (NMDS1 vs NMDS2): true vs perceived communities]


Model summary output
1. Presence of bias: statistical categorical differences
   Groups      R²     p-value
   Location    0.02   0.34
   Season      0.20   0.001
2. Degree of bias: percentage difference between true and simulated communities
   Normalized error = (simulated distance - true distance) / true distance
   Distance metric: Morisita-Horn
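The degree-of-bias metric can be sketched directly. Assuming the standard Morisita-Horn formula on relative abundances p and q, the distance between two samples is 1 − 2Σpᵢqᵢ / (Σpᵢ² + Σqᵢ²), and the normalized error compares the distance between the same pair of samples before and after simulation. The helper names and example counts are hypothetical.

```python
import numpy as np


def morisita_horn(x, y):
    """Morisita-Horn dissimilarity between two abundance vectors over the same OTUs.

    0 = identical relative composition, 1 = no shared OTUs.
    """
    p = np.asarray(x, dtype=float) / np.sum(x)
    q = np.asarray(y, dtype=float) / np.sum(y)
    return 1.0 - 2.0 * np.dot(p, q) / (np.dot(p, p) + np.dot(q, q))


def normalized_error(simulated_distance, true_distance):
    """(simulated distance - true distance) / true distance."""
    return (simulated_distance - true_distance) / true_distance


# Hypothetical pair of samples: true counts, then the same samples after a
# spurious OTU (last column) appears in sample 1 during simulation.
true_d = morisita_horn([50, 50, 0], [0, 50, 50])          # 0.5
sim_d = morisita_horn([45, 45, 0, 10], [0, 50, 50, 0])
print(true_d, sim_d, normalized_error(sim_d, true_d))
```

A positive normalized error means the simulated communities look more different from each other than the true ones do.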

Model findings
Categorical differences are robust to high-throughput sequencing errors in alpha diversity, regardless of the underlying patterns of beta diversity
The degree of bias is not affected by the underlying patterns of beta diversity but depends on community characteristics

Model summary – Type I & II error
[Figure: p-values for true vs simulated communities, with no groups and with two groups]
Whether groups are different or the same will not be biased by inflated alpha diversity
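The categorical tests in the model summary report an R² and a p-value per grouping (Location, Season), and this slide compares p-values for true and simulated communities. The talk does not name the test; values of this form are what a PERMANOVA-style permutation test on a distance matrix produces, so the sketch below assumes one. It is a minimal one-way version (pseudo-F, R² = SS_among / SS_total, p-value from label permutations); the function names, permutation count, and example data are hypothetical.

```python
import numpy as np


def _sum_sq(d2, idx):
    """Sum of squared pairwise distances among samples `idx`, divided by their count."""
    sub = d2[np.ix_(idx, idx)]
    return sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)


def permanova(dist, groups, n_perm=999, seed=0):
    """One-way PERMANOVA-style test on a distance matrix: returns (R2, p-value)."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = groups.size
    labels = np.unique(groups)
    ss_total = _sum_sq(d2, np.arange(n))

    def pseudo_f(g):
        ss_within = sum(_sum_sq(d2, np.flatnonzero(g == lab)) for lab in labels)
        ss_among = ss_total - ss_within
        f = (ss_among / (labels.size - 1)) / (ss_within / (n - labels.size))
        return f, ss_among / ss_total

    f_obs, r2 = pseudo_f(groups)
    rng = np.random.default_rng(seed)
    # Count label permutations whose pseudo-F is at least as large as observed
    hits = sum(pseudo_f(rng.permutation(groups))[0] >= f_obs for _ in range(n_perm))
    return r2, (hits + 1) / (n_perm + 1)


# Hypothetical example: two well-separated groups of 4 samples on one axis
x = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3])
dist = np.abs(x[:, None] - x[None, :])
r2, p = permanova(dist, ["A"] * 4 + ["B"] * 4)
print(f"R2 = {r2:.3f}, p = {p:.3f}")
```

Running the same test on true and on simulated distance matrices, and checking whether the p-value crosses the significance threshold, is one way to count Type I and Type II errors.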

Model summary – Degree of bias
Degree of bias will be affected by
- the error rate of the sequencing platform and OTU clustering
- the gamma diversity of the environment
- the precise shape of the species abundance distribution
But not the relationship among samples

Increasing probability of sequencing error and over-splitting OTUs increases bias
[Figure: normalized error vs probability of splitting (1e-04, 0.0334, 0.0667, 0.1); panels: no groups, two groups]

Increasing OTU richness decreases bias
[Figure: normalized error vs number of OTUs (100 to 1100)]

Shape of species abundance distribution (SAD) affects bias
[Figure: rank-abundance curve (abundance vs rank)]
[Figure: normalized error vs increasing SAD variance (1.5, 2.5, 3.5)]
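To make "SAD variance" concrete, here is a sketch that assumes a lognormal SAD whose log-scale standard deviation takes the slide's values (1.5, 2.5, 3.5). The slide does not name the distribution, so that choice, the 1,000-OTU community size, and Pielou's evenness J = H / ln(S) as a summary are all assumptions. Higher variance concentrates reads in a few dominant OTUs, i.e. a less even community.

```python
import numpy as np


def lognormal_sad(n_otus, sigma, rng):
    """Hypothetical SAD: lognormal abundances, sorted into rank order (largest first)."""
    return np.sort(rng.lognormal(mean=0.0, sigma=sigma, size=n_otus))[::-1]


def pielou_evenness(abundances):
    """Shannon entropy divided by its maximum, ln(S); 1 = perfectly even."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum() / np.log(p.size)


rng = np.random.default_rng(42)
for sigma in (1.5, 2.5, 3.5):
    sad = lognormal_sad(1000, sigma, rng)
    top_share = sad[0] / sad.sum()
    print(f"sigma={sigma}: top OTU share={top_share:.3f}, evenness={pielou_evenness(sad):.3f}")
```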

As true community distance increases, degree of error decreases
[Figure: normalized error vs true distance (0.65 to 0.80)]

Clustering is the main error-producing step
[Figure: R² values for true, amplified, and split communities (two groups)]

Simulation overview
Categorical analysis is very robust to high-throughput sequencing errors and biases
Degree of bias will be affected by the error rate of the sequencing platform and OTU clustering, the gamma diversity of the environment, and the precise shape of the species abundance distribution
High-throughput error leads to an over-estimation of the difference between groups
Mean bias is ~20-40%
Incorrect OTU clustering accounts for most of that

Steps
1. In silico: Add further complexity to simulations
2. In vitro: Empirically test artificially created microbial communities

Air samples in a mycology classroom:
a unique source distorts perceived species richness

Mycology classroom appears to be less rich than other classrooms…
[Figure: Chao estimated richness vs number of individuals for classrooms A-E]
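The richness axis on this slide is a Chao estimate. The slide does not say which variant was used; as an assumption, the sketch uses the classic Chao1 estimator, S_obs + F1² / (2·F2), with the bias-corrected F1(F1 − 1)/2 term when there are no doubletons, plus a simple rarefaction helper for plotting an estimate against the number of individuals sampled. Both helpers and the example counts are hypothetical. Because Chao1 is driven by singletons (F1) and doubletons (F2), how reads are spread across rare OTUs matters as much as total depth.

```python
import numpy as np


def chao1(counts):
    """Classic Chao1 richness estimate from one sample's OTU counts."""
    counts = np.asarray(counts)
    counts = counts[counts > 0]
    s_obs = counts.size
    f1 = int(np.sum(counts == 1))  # singletons
    f2 = int(np.sum(counts == 2))  # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form when there are no doubletons


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from an OTU count vector."""
    counts = np.asarray(counts)
    reads = np.repeat(np.arange(counts.size), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)


print(chao1([1, 1, 1, 2, 5]))  # 5 + 3**2 / (2 * 1) = 9.5

rng = np.random.default_rng(0)
sample = rng.poisson(3, size=300)  # hypothetical OTU counts for one classroom
for depth in (150, 300, 600):
    print(depth, chao1(rarefy(sample, depth, rng)))
```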

… but has higher biomass
[Figure: biomass (Penicillium spore equivalents) by classroom A-E]

Composition of non-mycology classrooms is similar
[Figure: proportion (%) of taxa by classroom A-E]

Puffballs dominate mycology classroom
Pisolithus, aka dog turd fungus
Battarrea, tall stiltball
Lycoperdon, common puffball

Mycology classroom dominated by a few taxa
[Figure: proportion (%) of taxa by classroom A-E, with selected taxa marked * * **]
Adams et al., in review

Beta diversity of mycology classroom: distinct communities
[Figure: NMDS ordination (NMDS1 vs NMDS2) of observed communities, then with taxonomy reassigned and with abundance reassigned]

Conclusions
• While deciphering alpha diversity is problematic:
  - Inflated alpha due to sequence error & clustering
  - Deflated alpha due to unevenness
  beta diversity calculations are robust to these errors in high-throughput sequencing
• An empirical test will be used to corroborate the conclusions of in silico simulations
• High-throughput sequencing will continue to be a promising tool for microbial ecologists

References – potential biases in high-throughput sequencing
DNA extraction: Frostegard et al. Appl Environ Microbiol 1999; DeSantis et al. FEMS Microbiology 2005; Feinstein et al. Appl Environ Microbiol 2009; Morgan et al. PLoS ONE 2010; Delmont et al. Appl Environ Microbiol 2011
PCR amplification/relative abundance: Amend et al. Mol Ecol 2010; Engelbrektson et al. ISME Journal 2010; Bellemain et al. BMC Microbiol 2010; Schloss et al. PLoS ONE 2011; Pinto & Raskin PLoS ONE 2012; Klindworth et al. Nucleic Acids Res 2013
Sequencing error/chimeras/OTU clustering: Huse et al. Genome Biol 2007; Huse et al. Environ Microbiol 2010; Kunin et al. Environ Microbiol 2010; Quince et al. BMC Bioinformatics 2010; Lee et al. PLoS ONE 2012; Pinto & Raskin PLoS ONE 2012; Bachy et al. ISME Journal 2013
Sequencing platform/protocol: Morgan et al. PLoS ONE 2010; Luo et al. PLoS ONE 2012
Even sampling depth: Schloss et al. PLoS ONE 2011; Gihring et al. Environ Microbiol 2012
Denoising: Gaspar & Thomas PLoS ONE 2013

Empirical test of simulation results
[Figure: normalized error vs number of OTUs (100 to 1100)]

PCR bias
[Figure: PCR bias drawn from a beta distribution (a = 0.5, b = 1.0), density plot]
[Figure: scatter around the line of true abundance versus amplified abundance]

OTU splitting bias
[Figure: split bias: number of splits drawn from a binomial distribution with n = 100, density shown for p = 0.0001, 0.001, 0.0334, 0.0667]
[Figure: split location drawn from a beta distribution with a = b = 0.5, density vs location of split]
