The document discusses several evolutionary forces that have shaped human genetic variation, including migration patterns of early humans, positive selection, background selection, and genetic drift due to population bottlenecks. It provides evidence that a significant fraction of amino acid substitutions in humans were driven by positive selection, especially for genes related to smell and response to pathogens. Background selection has also contributed substantially to the reduction of human genetic diversity due to the removal of weakly deleterious mutations over generations. The strength of background selection correlates with increased genetic differentiation between modern human populations.
7. “Selective Sweep”
• Repeated fixation of functional mutations in coding regions over
evolutionary timescales can lead to a disproportionate number of
amino acid substitutions relative to observed polymorphisms.
• This can be summarized by a 2x2 table and analyzed using the
McDonald-Kreitman test:
               Non-Syn   Syn
  Fixed          F_N     F_S
  Polymorphic    P_N     P_S
1000 Genomes Project Data
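As a minimal sketch of the test named above, assuming the four cells hold per-gene counts of fixed and polymorphic sites split into non-synonymous and synonymous classes: the function below computes the neutrality index (NI), the derived estimate 1 − NI of the fraction of adaptive substitutions, and a two-sided Fisher exact p-value for the 2x2 table. The function name `mk_test` and the example counts are illustrative assumptions, not taken from the slides.

```python
from math import comb


def mk_test(f_n, f_s, p_n, p_s):
    """McDonald-Kreitman summary for one 2x2 table.

    f_n, f_s: fixed non-synonymous / synonymous differences
    p_n, p_s: polymorphic non-synonymous / synonymous sites
    Assumes f_n, f_s and p_s are non-zero (no pseudocounts are added).
    """
    # Neutrality index: values below 1 mean an excess of
    # non-synonymous fixations relative to polymorphism.
    ni = (p_n / p_s) / (f_n / f_s)
    alpha = 1.0 - ni

    # Two-sided Fisher exact test with both margins held fixed.
    row_fixed = f_n + f_s
    col_nonsyn = f_n + p_n
    total = f_n + f_s + p_n + p_s

    def prob(x):
        # Hypergeometric probability of x fixed non-synonymous sites.
        return (comb(col_nonsyn, x)
                * comb(total - col_nonsyn, row_fixed - x)
                / comb(total, row_fixed))

    lo = max(0, row_fixed - (total - col_nonsyn))
    hi = min(row_fixed, col_nonsyn)
    p_obs = prob(f_n)
    # Sum over all tables at least as unlikely as the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-9))
    return ni, alpha, min(1.0, p_value)


# Hypothetical counts for a single gene (illustration only).
ni, alpha, p_value = mk_test(f_n=7, f_s=17, p_n=2, p_s=42)
print(f"NI = {ni:.3f}, alpha = {alpha:.3f}, Fisher p = {p_value:.4f}")
```

With only a handful of counts per gene, the p-value and NI can swing widely from one extra site, which is the noise problem that motivates pooling across genes.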
[Figure: The Effect of “Positive Selection” on putatively neutral diversity levels. Classes of mutation shown: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]
8. SnIPRE: an improvement to MKT
Since few SNPs and substitutions are usually observed per gene, MKT can be noisy. Pooling observations across the genome using a mixed effects model vastly increases power.
Eilertson et al., 2012
9. SnIPREASR in 1000 Genomes Project
Human-chimp divergence
  Pos Sel   Conserved
    410       8027
• Conserved genes are either neutral or under
purifying selection.
10. SnIPREASR: an improvement to SnIPRE
• Alignments are generated using MOSAIC, a program we developed that rigorously integrates putative orthologs from an arbitrary number of sources.
  pythonhosted.org/bio-MOSAIC/
  Maher & Hernandez (arXiv)
• Using PAML, we perform AIC-based model selection to infer the substitutions along the human lineage since our divergence with chimp.
[Figure: phylogeny with tips Human, Chimp, Orang, Gorilla, …]
Cyrus Maher
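The AIC-based model selection step on this slide picks, among candidate models, the one minimizing AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. A minimal sketch, assuming hypothetical model names and log-likelihood values (these are not PAML output, and the slides do not list the candidate models actually compared):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, free parameters).
candidates = {
    "one_ratio": (-1050.0, 1),
    "two_ratio": (-1046.5, 2),
    "free_ratio": (-1045.9, 5),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("selected:", best)
```

The penalty term 2k is what stops the most parameter-rich model from winning on likelihood alone.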
11. SnIPREASR works well for positive selection
• Simulations: Human-specific substitutions; Gutenkunst et al.
demographic model.
• 𝛾 is the population scaled selection coefficient.
• SnIPREASR is best-powered to estimate values of 𝛾>0.
[Figure: phylogeny with tips Human, Chimp, Orang, Gorilla, …]
12. ASR removes genes positively selected in chimp
                           Human-chimp divergence
                           Pos Sel   Conserved   Total
Human only    Pos Sel        343          0        343
(ASR)         Conserved       67       8027       8094
              Total          410       8027
• Conserved genes are either neutral or under
purifying selection.
• 67/410 (16%) of genes identified as positively
selected when comparing human-chimp are
conserved along the human lineage.
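The margins and the 16% figure on this slide can be checked directly from the four cell counts. A short sketch, using only the numbers shown in the table (the variable names are illustrative):

```python
# Cell counts keyed by (human-only ASR call, human-chimp divergence call).
cells = {
    ("pos_sel", "pos_sel"): 343,
    ("pos_sel", "conserved"): 0,
    ("conserved", "pos_sel"): 67,
    ("conserved", "conserved"): 8027,
}

# Column total: genes called positively selected from human-chimp divergence.
divergence_pos_sel = cells[("pos_sel", "pos_sel")] + cells[("conserved", "pos_sel")]

# Of those, the share that is conserved along the human lineage.
reclassified = cells[("conserved", "pos_sel")] / divergence_pos_sel

print(divergence_pos_sel)        # 410
print(f"{reclassified:.1%}")     # 16.3%
```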
14. The footprint of adaptive amino acid substitutions
• Goal: compare the pattern around amino acid substitutions to
the pattern around synonymous substitutions.
[Figure: neutral diversity levels around an amino acid substitution, aggregated over n substitutions, with two annotations: "Reflects the fraction of amino acid substitutions that are adaptive" and "Reflects the typical strength of selection".]
Hernandez et al. Science (2011)
15. Observed Patterns of Diversity Around Human Substitutions
Hernandez et al. Science (2011)
16. The Effect of Negative Selection
• Genetic diversity reduced: π = f₀π₀ (decrease in effective population size [Ne])
[Figure: putatively neutral diversity levels. Classes of mutation shown: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]
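The relation on this slide scales neutral diversity by a factor f0 between 0 and 1. A minimal sketch of why this reads as a decrease in effective population size, assuming the standard neutral expectation π0 = 4·Ne·μ for a diploid population (that expectation, the function names, and the numbers below are assumptions added for illustration, not values from the slides):

```python
def reduced_diversity(pi0, f0):
    """Diversity at a neutral site linked to deleterious variation: pi = f0 * pi0."""
    if not 0.0 <= f0 <= 1.0:
        raise ValueError("f0 must lie in [0, 1]")
    return f0 * pi0


def local_effective_size(ne, f0):
    """Effective population size implied by the same reduction factor."""
    return f0 * ne


# Hypothetical values for illustration only.
ne, mu, f0 = 10_000, 1.2e-8, 0.8
pi0 = 4 * ne * mu                     # neutral expectation without linked selection
pi = reduced_diversity(pi0, f0)

# The reduced diversity equals the neutral expectation at the smaller local Ne.
assert abs(pi - 4 * local_effective_size(ne, f0) * mu) < 1e-15
print(f"pi0 = {pi0:.2e}, pi = {pi:.2e}")
```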
20. BGS correlates with Fst at neutral sites
4 - Population Differentiation as a Function of BGS
The decrease in Ne locally across the genome as a result of BGS (inferred [2] by the value B, in which lower values indicate stronger BGS) may impact the rate of genetic drift at specific loci. To investigate this effect, we measured FST between TGP populations as a function of BGS strength. Our results suggest that the strength of BGS is a predictor of population differentiation, with an increase in genetic drift driving this effect.
5 - Forward Simulations of Demography and BGS
Using a distribution of fitness effects and a demographic model inferred from previous studies [3,4], we ran forward simulations using SFS_CODE [5] to estimate the effect of human demography on determining the reduction in genetic diversity caused by BGS, observing that the effects of BGS are strongest for those populations that have experienced sharp population bottlenecks (i.e., Europeans and Asians). However, the expected reduction in diversity due to BGS across all human populations is still greater than for a simulated population of constant size, illustrating the importance of population expansions for determining …
[Figure: FST (estimator method) vs. background selection in three panels: African vs. Asian, African vs. European, European vs. Asian. x-axis: B value in bins from 0−24 to 975−1000 (BGS strength, strong to weak); y-axis: FST (population differentiation), 0.10 to 0.22.]
• Neutral sites defined as PhyloP ∈ (-1.2, 1.2)
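Measuring FST as a function of B can be sketched as: assign each neutral SNP to a B bin, then compute one FST per bin as a ratio of sums. The slides say only "estimator method", so the choice of Hudson's estimator below is an assumption, as are the function names and the input layout (B value, allele frequency and number of sampled chromosomes in each of two populations).

```python
from collections import defaultdict


def hudson_terms(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's FST estimator for one SNP.

    p1, p2: allele frequencies; n1, n2: sampled chromosomes per population.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_by_b_bin(snps, bin_width=25):
    """Ratio-of-sums FST per B bin.

    snps: iterable of (B, p1, n1, p2, n2) with B on the 0-1000 scale.
    Returns {bin start: FST}; the last bin also holds B = 1000.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    last_bin = 1000 // bin_width - 1
    for b, p1, n1, p2, n2 in snps:
        start = min(int(b) // bin_width, last_bin) * bin_width
        num, den = hudson_terms(p1, n1, p2, n2)
        sums[start][0] += num
        sums[start][1] += den
    return {start: num / den
            for start, (num, den) in sorted(sums.items()) if den > 0}


# Hypothetical SNPs for illustration only.
example = [
    (10, 0.10, 200, 0.45, 200),
    (18, 0.30, 200, 0.55, 200),
    (990, 0.20, 200, 0.30, 200),
    (1000, 0.50, 200, 0.58, 200),
]
for start, fst in fst_by_b_bin(example).items():
    print(f"B bin starting at {start}: FST = {fst:.3f}")
```

Summing numerators and denominators within a bin before dividing avoids the instability of averaging per-SNP ratios, which matters most for rare variants.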
21. BGS in the human genome
[Figure: panels for low-coverage WGS and high-coverage exome data, with per-population values grouped as AFR, EUR, EASN, and SASN.]
4 - BGS Skews the SFS Towards Rare Variants
Purifying selection on linked sites can distort gene genealogies, leading to potential skews in the site-frequency spectrum (SFS). To investigate these effects, we measured the SFS as a function of B separately across the high-coverage and low-coverage regions of phase 3 TGP populations. We observed a marked increase in the number of rare variants, especially singletons, in both datasets as a function of BGS strength. This pattern is amplified in non-African vs. African populations.
[Figure: site-frequency spectra (frequency vs. derived allele count, log scale) for YRI, TSI, CHS, and ITU in the low-coverage and high-coverage data, each shown for three B bins: 0−50, 476−525, and 951−1000.]
[Figure: ratio of singleton frequency in the strong BGS bin vs. the weak BGS bin, for low-coverage and high-coverage data.]
• Neutral sites defined as PhyloP ∈ (-1.2, 1.2)
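The comparison described on this slide, the SFS within B bins and the singleton ratio between the strongest and weakest BGS bins, can be sketched as follows. The bin edges 0−50 and 951−1000 are the ones shown in the figure legend; the function names, the input layout of (B, derived allele count) pairs, and the example sites are assumptions for illustration.

```python
from collections import Counter


def sfs_proportions(derived_counts):
    """Proportion of sites at each derived allele count (zero counts dropped)."""
    counts = Counter(c for c in derived_counts if c > 0)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in sorted(counts.items())}


def singleton_ratio(sites, strong=(0, 50), weak=(951, 1000)):
    """Singleton proportion in the strong-BGS bin divided by that in the weak bin.

    sites: sequence of (B, derived_allele_count) for segregating neutral sites.
    Assumes both bins contain at least one singleton.
    """
    sites = list(sites)

    def singleton_fraction(lo, hi):
        sfs = sfs_proportions(c for b, c in sites if lo <= b <= hi)
        return sfs.get(1, 0.0)

    return singleton_fraction(*strong) / singleton_fraction(*weak)


# Hypothetical sites for illustration only.
example_sites = [
    (10, 1), (20, 1), (30, 1), (40, 2),      # strong BGS: low B
    (960, 1), (970, 2), (980, 3), (990, 5),  # weak BGS: high B
]
print(sfs_proportions(c for b, c in example_sites if b <= 50))
print(singleton_ratio(example_sites))
```

A ratio above 1 corresponds to the pattern reported in the text: proportionally more singletons where BGS is strong.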
23. Complex signatures of selection
• Soft selective sweeps result in multiple
haplotypes increasing in frequency.
Soft Sweep
Zach Szpiech
24. Extended Multiple Haplotype Homozygosity
-- haplotype sample size
-- set of distinct haplotypes from the locus to marker x
-- ith most frequent haplotype
-- number of haplotypes
EHH
SelScan: Szpiech & Hernandez (arXiv)
Sorry, redacted for now… More coming soon!
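The multiple-haplotype statistic itself is redacted on this slide, so the sketch below covers only standard EHH, which the quantities listed above build on: the probability that two haplotypes drawn without replacement from the sample are identical over the whole stretch from the core locus to marker x. Haplotypes are assumed to be equal-length strings of allele codes, and the function name and example data are illustrative.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, marker):
    """Standard EHH from the core index to the marker index (both inclusive).

    haplotypes: list of equal-length strings, e.g. "0101".
    Callers interested in one core allele should pass only the
    haplotypes carrying that allele.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, marker))
    # Count each distinct haplotype over the interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Pairs of identical haplotypes over all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical sample for illustration only.
sample = ["0000", "0001", "0011", "0011"]
for x in range(4):
    print(x, ehh(sample, core=0, marker=x))
```

EHH starts at 1 at the core and decays as recombination and mutation split the sample into more distinct haplotypes with distance.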
25. Power
[Figure: six panels for simulations with s = 0.01. Three panels show % increase in power over iHS under constant, African, and European demography; three show power under the same demographies. x-axis: frequency at which selection begins (0, 0.01, 0.02, 0.05, 0.10); series: sampling frequency 0.70, 0.80, 0.90.]
26. A genomic approach to detecting selection
• Most SNPs are non-coding.
• Most regulatory elements do not act on the
nearest gene.
• We can use genome-wide signatures of selection
to infer selection on genes using eQTL
information.
ARTICLE
Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS
Xin He, Chris K. Fuller, Yi Song, Qingying Meng, Bin Zhang, Xia Yang, and Hao Li
Genetic mapping of complex diseases to date depends on variations inside or close to the genes that perturb their activities. A strong body of evidence suggests that changes in gene expression play a key role in complex diseases and that numerous loci perturb gene expression in trans. The information in trans variants, however, has largely been ignored in the current analysis paradigm. Here we present a statistical framework for genetic mapping by utilizing collective information in both cis and trans variants. We reason that for a disease-associated gene, any genetic variation that perturbs its expression is also likely to influence the disease risk. Thus, the expression quantitative trait loci (eQTL) of the gene, which constitute a unique "genetic signature," should overlap significantly with the set of loci associated with the disease. We translate this idea into a computational algorithm (named Sherlock) to search for gene-disease associations from GWASs, taking advantage of independent eQTL data. Application of this strategy to Crohn disease and type 2 diabetes predicts a number of genes with possible disease roles, including several predictions supported by solid experimental evidence. Importantly, predicted genes are often implicated by multiple trans eQTL with moderate associations. These genes are far from any GWAS association signals and thus cannot be identified from the GWAS alone. Our approach allows analysis of association data from a new perspective and is applicable to any complex phenotype. It is readily generalizable to molecular traits other than gene expression, such as metabolites, noncoding RNAs, and epigenetic modifications.
Introduction
Recent application of genome-wide association studies (GWASs) to complex human diseases led to the discovery … both cis- and trans-expression QTL in the context of association studies. So far, information from trans variations has largely been ignored because only cis variants can be assigned to their target genes based on proximity by using the GWAS data alone. The growing collection of eQTL …
He et al. AJHG (2013)
27. Detecting selection on regulatory networks
Figure 1. The Sherlock Algorithm: Matching Genetic Signatures of Gene Expression Traits to that of the Disease to Identify Gene-Disease Associations
He et al. AJHG (2013)
29. Selection on standing variation driven by response to pathogens
Description                                P-value    FDR q-value
cytokine-mediated signaling pathway        5.92E-06   6.26E-02
immune effector process                    7.47E-06   3.95E-02
regulation of immune system process        7.47E-06   2.64E-02
regulation of defense response to virus    8.53E-06   2.26E-02
lymphocyte costimulation                   9.36E-06   1.98E-02
T cell costimulation                       9.36E-06   1.65E-02
GOrilla
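The q-values in this table appear consistent with a rank-scaled correction of the form q = p × N / rank, where rank is the term's position when sorted by p-value and N is the number of terms tested. The slide does not state N, so the sketch below infers it from the first row; that inference, and the assumption that the rows are listed in rank order, are not from the slides.

```python
# (description, reported p-value, reported FDR q-value), in slide order.
reported = [
    ("cytokine-mediated signaling pathway", 5.92e-06, 6.26e-02),
    ("immune effector process", 7.47e-06, 3.95e-02),
    ("regulation of immune system process", 7.47e-06, 2.64e-02),
    ("regulation of defense response to virus", 8.53e-06, 2.26e-02),
    ("lymphocyte costimulation", 9.36e-06, 1.98e-02),
    ("T cell costimulation", 9.36e-06, 1.65e-02),
]


def rank_scaled_q(p_value, rank, n_terms):
    """q-value for the term at a given 1-based rank: p * N / rank."""
    return p_value * n_terms / rank


# Inferred, not reported: at rank 1, q = p * N, so N = q / p.
n_terms = reported[0][2] / reported[0][1]

relative_errors = []
for rank, (term, p, q) in enumerate(reported, start=1):
    q_calc = rank_scaled_q(p, rank, n_terms)
    relative_errors.append(abs(q_calc - q) / q)
    print(f"{term}: reported {q:.2e}, recomputed {q_calc:.2e}")

print(f"implied number of tested terms: about {n_terms:.0f}")
```

This also explains why terms with identical p-values (rows 2 and 3, rows 5 and 6) carry different q-values: the rank in the denominator differs.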
30. Haplotype-based selection signals recapitulate geography
• TGP samples with phased OMNI genotype data
• Used iHS
• 100kb windows for each population are coded 1 if selection score is in top 1% (0 otherwise)
[Figure: PCA of the top 1% of windows, PC1 (14.4%) vs. PC2 (12.6%), with points labeled by population: ACB, ASW, CDX, CEU, CHB, CHS, CLM, FIN, GBR, GIH, IBS, JPT, KHV, LWK, MKK, MXL, PEL, PUR, TSI, YRI.]
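The window-coding step described above can be sketched as follows, assuming one selection score per 100kb window per population. How iHS values are summarized into a window score is not stated on the slide, so the scores here are hypothetical inputs, as are the function name and the tie-handling rule; the PCA step is not shown.

```python
def code_top_windows(scores, top_fraction=0.01):
    """Code each window 1 if its score is in the top fraction, else 0.

    Windows tied with the cutoff score are all coded 1, so slightly
    more than the requested fraction can be flagged.
    """
    n_top = max(1, round(len(scores) * top_fraction))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical per-window scores for two populations (illustration only).
window_scores = {
    "POP_A": [float(i % 97) for i in range(200)],
    "POP_B": [float((i * 7) % 101) for i in range(200)],
}

# One binary row per population; rows like these are the input to a PCA.
binary_matrix = {pop: code_top_windows(s) for pop, s in window_scores.items()}
for pop, row in binary_matrix.items():
    print(pop, sum(row), "windows flagged of", len(row))
```

Populations that share flagged windows end up with similar binary rows, which is what lets a PCA of this matrix group them.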
31. Conclusions
• Many complex signatures of selection in the human
genome.
• Mixtures of positive and negative selection
• Complicated modes of selection (including soft sweeps)
• Predominant signature of ancient human-lineage
selection seems to be from olfactory processes
• Recent selection on standing variation associated with
complex traits, including pathogen response.
32. Thanks!
1000 Genomes Project Consortium
Funding: NHGRI; QB3; CHARM; CTSI
ryan.hernandez@ucsf.edu
Nicolas Strauli
Cyrus Maher
Raul Torres
Lawrence Uricchio
Zach Szpiech