Bioinformatics for discovery:
Introduction to GWAS and EWAS
BMI 701:Introduction to Biomedical Informatics

12/1/2015
chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org
Chirag J Patel
P = G + E
Type 2 Diabetes

Cancer

Alzheimer’s

Gene expression
Phenotype Genome
Variants
Environment
Infectious agents

Nutrients

Pollutants

Drugs
Complex traits are a function of genes and
environment...
We are great at G investigation!
over 2000 

Genome-wide Association Studies (GWAS)

https://www.ebi.ac.uk/gwas/
G
>2,000 traits/diseases

>15,000 SNPs

>16,000 SNP-trait associations
https://www.ebi.ac.uk/gwas/
Dissecting G in P:
What is a Genome-wide Association Study?
Hypothesis-free “search engine” for genetic variants 

associated with a complex trait or disease 

in unrelated populations
[Figure: one 2×2 table per SNP (allele A vs. a, diseased vs. non-diseased), repeated from SNP(A)/SNP(a) through SNP(Z)/SNP(z), genome-wide]
The road to GWAS...
A new paradigm of GWAS for discovery of G in P:
Human Genome Project to GWAS
Sequencing of the genome
2001
HapMap project:
http://hapmap.ncbi.nlm.nih.gov/
Characterize common variation
2001-current day
High-throughput variant
assay
< $99 for ~1M variants
Measurement tools
~2003 (ongoing)
ARTICLES
Genome-wide association study of 14,000
cases of seven common diseases and
3,000 shared controls
The Wellcome Trust Case Control Consortium*
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a [...]
Vol 447|7 June 2007|doi:10.1038/nature05911
Nature, 2007
Comprehensive, high-throughput analyses
GWAS
[Figure: number of publications per year with subject “GWAS”, 1994-2013 (y-axis: 0-3,000 publications); PubMed MeSH terms: human + GWAS]
Risch + Merikangas
linkage vs. association
human genome sequenced
GWAS
age-related macular degeneration
mega-meta-GWAS
WTCCC
GWAS is relevant today (even with NGS around the corner)
Why execute GWAS?
Geneticists have made substantial progress in identifying the genetic basis of many human diseases, at least those with conspicuous determinants. These successes include Huntington's disease, Alzheimer's disease, and some forms of breast cancer. However, the detection of genetic factors for complex diseases, such as schizophrenia, bipolar disorder, and diabetes, has been far more complicated. There have been numerous reports of genes or loci that might underlie these disorders, but few of these findings have been replicated. The modest nature of the gene effects for these disorders likely explains the contradictory and inconclusive claims about their identification. Despite the small effects of such genes, the magnitude of their attributable risk (the proportion of people affected due to them) may be large because they are quite frequent in the population, making them of public health significance.

Has the genetic study of complex disorders reached its limits? The persistent lack of replicability of these reports of linkage between various loci and complex diseases might imply that it has. We argue below that [...] linkage analysis we have chosen for this argument is a popular current paradigm in which pairs of siblings, both with the disease, are examined for sharing of alleles at multiple sites in the genome defined by genetic markers. The more often the affected siblings share the same allele at a particular site, the more likely the site is close to the disease gene. Using the formulas in (1), we calculate the expected proportion Y of alleles shared by a pair of affected siblings for the best possible case, that is, a closely linked marker locus (recombination fraction θ = 0) that is fully informative (heterozygosity = 1) (2), as

$Y = \frac{1 + w}{2 + w}$, where $w = \frac{pq(\gamma - 1)^2}{(p\gamma + q)^2}$

If there is no linkage of a marker at a particular site to the disease, the siblings would be expected to share alleles 50% of the time; that is, Y would equal 0.5. Values of Y for various values of p and γ are given in the third column of the table. For an allele of moderate frequency (p is 0.1 to 0.5) that con[...]
The Future of Genetic Studies of
Complex Human Diseases
Neil Risch and Kathleen Merikangas
Science, 1996
A new paradigm is needed for discovery!
How does a GWAS work?
Single nucleotide polymorphisms (SNPs):
How many SNPs are in the human genome?
>3,000,000,000 bases in human genome
SNPs appear about every ~1,000 bases
~3,000,000 SNPs
40-60% have minor allele frequency <5%

GWAS focus on frequency >5%
HapMap Consortium, 2010
Can’t measure everything:
Tag SNPs and Linkage Disequilibrium (LD)
LD = co-occurrence of SNPs in a contiguous region
Bush and Moore, 2012
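To make the idea of LD as co-occurrence concrete, here is a minimal sketch that computes the common r² measure between two biallelic SNPs from phased haplotypes. The function name `ld_r2`, the allele labels A/a and B/b, and the toy haplotypes are all assumptions made for illustration; they are not data from the slides.

```python
from collections import Counter


def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    Each haplotype is a two-character string such as "AB": the first
    character is the allele at SNP 1 ("A" or "a"), the second is the
    allele at SNP 2 ("B" or "b").
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n  # freq of A
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n  # freq of B
    p_ab = counts["AB"] / n                                     # freq of AB
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical toy data: alleles always travel together (complete LD) ...
perfect = ["AB"] * 6 + ["ab"] * 4
# ... versus alleles that combine at random (no LD).
independent = ["AB", "Ab", "aB", "ab"] * 3

print(ld_r2(perfect))      # close to 1: one SNP is a proxy for the other
print(ld_r2(independent))  # 0: one SNP says nothing about the other
```

When r² is close to 1, genotyping one SNP is nearly as informative as genotyping both, which is the property that tag SNPs exploit.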
The phenomenon of LD makes GWAS possible:
How and why?: Indirect association
[Figure 3 from Bush and Moore, 2012] Figure 3. Indirect Association. Genotyped SNPs often lie in a region of high linkage [...] will be statistically associated with disease as a surrogate for the disease SNP through [...]
doi:10.1371/journal.pcbi.1002822.g003
Bush and Moore, 2012
LD blocks
Can’t measure everything:
Tag SNPs and Linkage Disequilibrium
Tag SNPs are common proxies for other SNPs

500K - 1M per chip
[...] significant associations for seven SNPs representing four new T2DM loci (Table 1). In all cases, the strongest association for the MAX statistic (see Methods) was obtained with the additive model.
[Figure: –log10[P] of association for individual SNPs (rs IDs) across LD blocks, panels a-d; loci labelled SLC30A8, IDE, EXT2, ALX4; asterisks mark highlighted SNPs]
LD block
2 alleles are correlated because they are inherited
together
Sladek et al, 2007
image: www.lifa-core.de/
Digitizing SNPs:
e.g., Illumina Infinium Array
image: illumina.com
Assessing Thousands of Factors Simultaneously:
Data-driven search for differences in SNP frequencies
~100,000 - ~1,000,000 association tests
disease cases
healthy controls
GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
disease cases
GCAGGTACATG...GGTA...
GCAGGTACATG...GGTA...
GCAGGTACATG...GGTA...
GCAGGTACATG...GGTA...
healthy controls
Associating One SNP with Disease
Case-Control Study Design
SNP (A/a) -?-> Disease
                          A    a
cases (diseased)
controls (non-diseased)
Associating One SNP with Disease
What is an “Odds Ratio”?
SNP (A/a) -?-> Disease
                          A    a
cases (diseased)          c    d
controls (non-diseased)   x    y
Chi-squared test
Odds Ratio a vs A:
Odds of disease with allele a
vs.
Odds of disease with allele A
1: equal odds (no difference)

>1: increased odds (increased risk)

<1: decreased odds (decreased risk)
Associating One SNP with Disease
Calculating the Odds Ratio
SNP (A/a) -?-> Disease
                          A    a
cases (diseased)          c    d
controls (non-diseased)   x    y
Chi-squared test

Odds with allele a: $\frac{d/(d+y)}{y/(d+y)} = \frac{d}{y}$
Odds with allele A: $\frac{c/(c+x)}{x/(c+x)} = \frac{c}{x}$
Odds Ratio a vs A: $\frac{d/y}{c/x} = \frac{d/c}{y/x} = \frac{dx}{cy}$
How would you interpret an OR of 2?
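As a rough sketch of the single-SNP case-control test, the snippet below computes the odds ratio and a Pearson chi-squared test (1 degree of freedom, no continuity correction) from the four allele counts, using the slide's cell labels: c and d are the counts of A and a in cases, x and y the counts of A and a in controls. The function names and the example counts are hypothetical.

```python
import math


def odds_ratio(c, d, x, y):
    """Odds ratio of allele a vs. A: (d/y) / (c/x) = d*x / (c*y)."""
    return (d * x) / (c * y)


def chi_squared_2x2(c, d, x, y):
    """Pearson chi-squared statistic (1 df) and its p-value for a 2x2 table."""
    n = c + d + x + y
    stat = n * (c * y - d * x) ** 2 / ((c + d) * (x + y) * (c + x) * (d + y))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical allele counts: cases carry A=60, a=40; controls A=75, a=25.
print(odds_ratio(60, 40, 75, 25))
print(chi_squared_2x2(60, 40, 75, 25))
```

With these made-up counts the odds ratio is 2: the odds of disease for chromosomes carrying allele a are twice the odds for chromosomes carrying allele A.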
Associating One SNP with Disease
Cohort Study Design
SNP (A/a) -?-> Disease
vs.
SNP (A/a) -?-> Non-diseased
•Direct measure of risk vs. odds ratio

•Need to wait!
•If incidence is low, N needs to be large!
Cox survival regression

Relative Risk
Models to associate genotypes with disease
Examples for a case-control study
Disease (ND=4): AA, AA, Aa, Aa
Non-diseased (NC=4): Aa, Aa, aa, aa
Allele counts:
              A    a
diseased      6    2
non-diseased  2    6

OR A (vs a)

OR a (vs A)

Genotype counts:
              AA   Aa   aa
diseased
non-diseased
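The allelic table and the two odds ratios asked for on this slide can be worked through in a few lines. This sketch assumes the eight genotypes on the slide split as AA, AA, Aa, Aa (diseased) and Aa, Aa, aa, aa (non-diseased), which is the only split consistent with the 6/2 and 2/6 allele counts shown; the helper names are made up for illustration.

```python
from collections import Counter

# Assumed reading of the slide's eight genotypes (see lead-in).
diseased = ["AA", "AA", "Aa", "Aa"]
non_diseased = ["Aa", "Aa", "aa", "aa"]


def allele_counts(genotypes):
    """Count A and a alleles across a list of genotypes."""
    counts = Counter("".join(genotypes))
    return counts["A"], counts["a"]


def genotype_counts(genotypes):
    """Count AA, Aa and aa genotypes."""
    counts = Counter(genotypes)
    return counts["AA"], counts["Aa"], counts["aa"]


case_A, case_a = allele_counts(diseased)      # allelic row for cases
ctrl_A, ctrl_a = allele_counts(non_diseased)  # allelic row for controls

# Allelic odds ratios from the 2x2 table.
or_A_vs_a = (case_A * ctrl_a) / (case_a * ctrl_A)
or_a_vs_A = 1 / or_A_vs_a

print((case_A, case_a), (ctrl_A, ctrl_a))
print(or_A_vs_a, or_a_vs_A)
print(genotype_counts(diseased), genotype_counts(non_diseased))
```

In this toy example the odds ratio for A versus a is (6 × 6) / (2 × 2) = 9, and the odds ratio for a versus A is its reciprocal, 1/9.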
Models to associate genotypes with disease
Genotypic Test (“2 or 1 df test”)
Diseased (ND=4): AA, AA, Aa, Aa
Non-diseased (NC=4): Aa, Aa, aa, aa

              AA   Aa   aa
diseased      2    2    0
non-diseased  0    2    2

OR AA (vs. Aa)

OR aa (vs. Aa)
Associating One SNP with Quantitative Trait
(e.g., height, weight, cholesterol)
[Figure: boxplots of trait (height) by genotype for SNP rs1234 (GG, GC, CC) and SNP rs123456 (CC, CT, TT)]
Associating One SNP with Quantitative Trait
Linear Regression and Additive Risk Model
y = α + βx + ε
[Figure: boxplots of height by genotype for SNP rs123456: CC (0), CT (1), TT (2), with the fitted line showing intercept α and slope β]
height = α + βx
x = 0 if individual is CC
x = 1 if individual is CT
x = 2 if individual is TT
α: intercept (expected height for CC, x = 0)
β: change in height for 1 risk allele
T = risk allele
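A minimal sketch of the additive model: code each genotype as the number of risk alleles (0, 1 or 2) and fit height = α + βx by ordinary least squares. The function name `fit_additive_model` and the six height values are invented for illustration; a real analysis would also report a standard error and p-value for β and adjust for covariates.

```python
def fit_additive_model(genotypes, trait, risk_allele="T"):
    """Least-squares fit of trait = alpha + beta * (number of risk alleles)."""
    x = [g.count(risk_allele) for g in genotypes]  # CC -> 0, CT -> 1, TT -> 2
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(trait) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, trait))
    beta = sxy / sxx                 # change in trait per risk allele
    alpha = mean_y - beta * mean_x   # expected trait for CC (x = 0)
    return alpha, beta


# Hypothetical data: six individuals, made-up height values.
genotypes = ["CC", "CC", "CT", "CT", "TT", "TT"]
height = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0]

alpha, beta = fit_additive_model(genotypes, height)
print(alpha, beta)
```

For these made-up values the fit gives α = 61 and β = 5, that is, each additional T allele is associated with 5 more units of height.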
Prototypical “Manhattan plot” to visualize
associations
Science, 2007
~100,000 - ~1,000,000 association tests
[Figure: Manhattan plot of −log10(P) (y-axis, 0-15) by chromosome (x-axis, 1-22 and X). NATURE|Vol 447|7 June 2007]
Test per SNP:
              AA   Aa   aa
diseased
non-diseased
[...]ibility with schizophrenia, a psychotic disorder with many similarities to BD. In particular association findings have been reported with [...]
[...]assium channel. Ion channelopathies are well-recognized as causes of episodic central nervous system disease, including seizures, ataxias [...]
[Figure: seven Manhattan plots, −log10(P) (y-axis, 0-15) by chromosome (x-axis, 1-22 and X), one panel each for bipolar disorder, coronary artery disease, Crohn’s disease, hypertension, rheumatoid arthritis, type 1 diabetes and type 2 diabetes]
Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases −log10 of the trend test P value for quality-control-positive SNPs, excluding [...] Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10⁻⁵ highlighted in green. All panels are truncated at [...]
Type I Error:
False Positives!
What is a p-value?
Probability of obtaining a result at least as extreme as the one observed if there is no difference (H0)
Many tests: some can be significant (low p-value) by chance!
100 tests at a significance level of 0.05...
how many would be significant by chance?
Bonferroni “correction”:

Correct the 0.05 significance level by number of tests
e.g., 1M SNPs: 0.05/(1×10⁶) = 5×10⁻⁸
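The multiple-testing arithmetic on this slide can be checked with a short sketch. The function names are made up, and the family-wise error calculation assumes the tests are independent, which is only an approximation for SNPs in LD.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests that pass alpha when every null is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Family-wise error rate, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


print(expected_false_positives(100))          # about 5 of 100 tests
print(prob_at_least_one_false_positive(100))  # about 0.99
print(bonferroni_threshold(1_000_000))        # about 5e-08 for 1M SNPs
```

So with 100 tests at 0.05, roughly 5 are expected to look significant by chance alone, and dividing 0.05 by one million tests gives the familiar genome-wide threshold of 5×10⁻⁸.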
QQplot:
Distribution of observed p-values vs. H0 p-values
[Figure: histogram of runif(10000), p-values under H0 (random uniform distribution)]
[Figure: histogram of gwas$P.value, p-values of GWAS in Total Cholesterol; Global Lipids Consortium, 2012]
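A QQ plot pairs the sorted observed p-values with the quantiles expected under H0, where p-values are uniform on (0, 1). The sketch below builds those pairs on the −log10 scale; the function name `qq_points`, the plotting position (i + 0.5)/n and the toy p-values are assumptions for illustration, and the actual plotting step is left out.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p), both sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles of a uniform(0, 1) sample of size n.
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))


# Hypothetical p-values that follow the null exactly: points lie on y = x.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
null_points = qq_points(null_p)

# Same set with five very small p-values swapped in: the upper tail lifts
# above the diagonal, which is the signature of true associations.
signal_p = [1e-10] * 5 + null_p[5:]
signal_points = qq_points(signal_p)

print(null_points[-1])    # expected and observed agree
print(signal_points[-1])  # observed far exceeds expected
```

A departure from the diagonal only in the upper tail suggests real signals, while a departure along the whole range points to systematic inflation such as population stratification.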
Which diseases show evidence of association?
Examining the QQplot of test statistics in WTCCC
[...]sent study cannot provide conclusive exclusion of any given gene. This is the consequence of several factors including: less-than-complete coverage of common variation genome-wide on the Affymetrix chip; poor coverage (by design) of rare variants, including many structural variants (thereby reducing power to detect rare, penetrant, alleles) (ref. 25); difficulties with defining the full genomic extent of the gene of interest; and, despite the sample size, relatively low power to detect, at levels of [...]
[...] already allow us, for selected diseases, to highlight pathways and mechanisms of particular interest. Naturally, extensive resequencing and fine-mapping work, followed by functional studies will be required before such inferences can be translated into robust statements about the molecular and physiological mechanisms involved.
We turn now to a discussion of the main findings for each disease, focusing here only on the most significant and interesting results [...]
[Figure: quantile-quantile plots of observed test statistic (y-axis) vs. expected chi-squared value (x-axis), one panel each for BD, CAD, CD, HT, RA, T1D and T2D]
Figure 3 | Quantile-quantile plots for seven genome-wide scans. For each of the seven disease collections, a quantile-quantile plot of the results of the trend test is shown in black for all SNPs that pass the standard project filters, have a minor allele frequency >1% and missing data rate <1%. SNPs that [...] 360,000 SNPs. SNPs at which the test statistic exceeds 30 are represented by triangles. Additional quantile-quantile plots, which also exclude all SNPs located in the regions of association listed in Table 3, are superimposed in blue (for BD, the exclusion of these SNPs has no visible effect on the plot, and [...]
Observational associations do not equal causation...
Ice Cream ↔ Drowning
Confounding bias
What is a confounder?
Summer!
?
Confounder is correlated with both the “risk” factor and disease,

leading to invalid inference.

Common source of bias in observational studies (e.g., case-control, cohort, etc.)
SNP Disease
Population Stratification:
A source of possible confounding in GWAS
race/ethnicity
?
Ancestry correlated with allele frequency and disease

GWAS are done on specific populations separately.

(most have been done in populations of European ancestry)
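Population stratification can be illustrated with made-up allele counts. In the sketch below the allele has no effect within either of two hypothetical populations (odds ratio of 1 in each), yet pooling them produces a spurious association, because population 2 has both a higher frequency of allele a and a higher proportion of cases. All counts and names are invented for illustration.

```python
def odds_ratio(c, d, x, y):
    """OR of allele a vs. A; c, d = A, a counts in cases; x, y = in controls."""
    return (d * x) / (c * y)


# Hypothetical allele counts: (cases A, cases a, controls A, controls a).
pop1 = (90, 10, 810, 90)     # allele a is rare (10%), few cases
pop2 = (250, 250, 250, 250)  # allele a is common (50%), many cases

# Naive analysis that ignores ancestry: add the two tables cell by cell.
pooled = tuple(a + b for a, b in zip(pop1, pop2))

print(odds_ratio(*pop1))    # 1.0: no association within population 1
print(odds_ratio(*pop2))    # 1.0: no association within population 2
print(odds_ratio(*pooled))  # about 2.4: association created by pooling
```

This is why analyses are run within populations of similar ancestry, or adjusted for ancestry, rather than on a naive pooled sample.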
FTO Diabetes
Mediation
SNPs indicative of a mediator factor?
Example: FTO and Type 2 Diabetes
Body Mass
?
Association between FTO and Type 2 Diabetes via BMI?
... or does FTO have an independent role in Type 2 Diabetes...?
FTO Body Mass
PLINK:
(Standard) Whole Genome Analysis Software
http://pngu.mgh.harvard.edu/~purcell/plink/
•cited >9000 times since 2007

•allele frequency

•linkage disequilibrium (LD)

•data manipulation/filtering

•association: allelic, genotypic models

•chi-square

•logistic

•linear
Examples: 

GWASs in Type 2 Diabetes
Type 2 Diabetes Mellitus:
A complex, multifactorial disease
•Insulin production vs. use

•beta-cell function

•insulin sensitivity (BMI)

•Moves glucose from blood into
cells

•Complications arise due to
glucose in blood, hyperglycemia
•diagnosed by blood glucose
levels

CDC,
family history: 25%
body weight, diet, lifestyle, age
ARTICLES
A genome-wide association study
identifies novel risk loci for type 2 diabetes
Robert Sladek1,2,4, Ghislain Rocheleau1*, Johan Rung4*, Christian Dina5*, Lishuang Shen1, David Serre1, Philippe Boutin5, Daniel Vincent4, Alexandre Belisle4, Samy Hadjadj6, Beverley Balkau7, Barbara Heude7, Guillaume Charpentier8, Thomas J. Hudson4,9, Alexandre Montpetit4, Alexey V. Pshezhetsky10, Marc Prentki10,11, Barry I. Posner2,12, David J. Balding13, David Meyre5, Constantin Polychronakos1,3 & Philippe Froguel5,14
Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of
which were hitherto unknown. A systematic search for these variants was recently made possible by the development of
high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935
single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype
frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified
four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2
gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in
insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell
development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk
and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits.
The rapidly increasing prevalence of type 2 diabetes mellitus (T2DM) is thought to be due to environmental factors, such as increased availability of food and decreased opportunity and motivation for physical activity, acting on genetically susceptible individuals. The heritability of T2DM is one of the best established among common diseases and, consequently, genetic risk factors for T2DM have been the subject of intense research (ref. 1). Although the genetic causes of many monogenic forms of diabetes (maturity onset diabetes in the young, neonatal mitochondrial and other syndromic types of diabetes mellitus) have been elucidated, few variants leading to common T2DM have been clearly identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25) of developing T2DM (ref. 1). Linkage studies have reported many T2DM-linked chromosomal regions and have identified putative, causative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs 4, 5) and ACDC (also called ADIPOQ) (ref. 6). In parallel, candidate-gene studies have reported many T2DM-associated loci, with coding variants in the nuclear receptor PPARG (P12A) (ref. 7) and the potassium channel KCNJ11 (E23K) (ref. 8) being among the very few that have been convincingly replicated. The strongest known (odds ratio ≈ 1.7) T2DM association (ref. 9) was recently mapped to the transcription factor TCF7L2 and has been consistently replicated in multiple populations (refs 10–20).
Subjects and study design
The recent availability of high-density genotyping arrays, which combine the power of association studies with the systematic nature of a genome-wide search, led us to undertake a two-stage, genome-wide association study to identify additional T2DM susceptibility loci (Supplementary Fig. 1). In the first stage of this study, we obtained genotypes for 392,935 single-nucleotide polymorphisms (SNPs) in 1,363 T2DM cases and controls (Supplementary Table 1). In order to enrich for risk alleles (ref. 21), the diabetic subjects studied in stage 1 were selected to have at least one affected first degree relative and age at onset under 45 yr (excluding patients with maturity onset diabetes in the young). Furthermore, in order to decrease phenotypic heterogeneity and to enrich for variants determining insulin resistance and β-cell dysfunction through mechanisms other than severe obesity, we initially studied diabetic patients with a body mass index (BMI) <30 kg m⁻². Control subjects were selected to have fasting blood glucose <5.7 mmol l⁻¹ in DESIR, a large prospective cohort for the study of insulin resistance in French subjects (ref. 22).

Genotypes for each study subject were obtained using two platforms: Illumina Infinium Human1 BeadArrays, which assay 109,365 SNPs chosen using a gene-centred design; and Human Hap300 BeadArrays, which assay 317,503 SNPs chosen to tag haplotype blocks identified by the Phase I HapMap (ref. 23). Of the 409,927 markers that passed quality control (Supplementary Tables 2 and 3), genotypes were obtained for an average of 99.2% (Human1) and 99.4% (Hap300) of markers for each subject with a reproducibility of >99.9% (both platforms). Forty-three subjects were removed from analysis because of evidence of intercontinental admixture (Supplementary Fig. 3) and an additional four because their genotype-determined gender disagreed with clinical records. In total, T2DM association was tested for 100,764 (Human1) and 309,163 (Hap300) SNPs representing 392,935 unique loci (Fig. 1). Because of unequal male/female ratios in our cases and controls, we analysed the 12,666 sex-chromosome SNPs separately for each gender.
*These authors contributed equally to this work.
1 Departments of Human Genetics, 2 Medicine and 3 Pediatrics, Faculty of Medicine, McGill University, Montreal H3H 1P3, Canada. 4 McGill University and Genome Quebec Innovation Centre, Montreal H3A 1A4, Canada. 5 CNRS 8090-Institute of Biology, Pasteur Institute, Lille 59019 Cedex, France. 6 Endocrinology and Diabetology, University Hospital, Poitiers 86021 Cedex, France. 7 INSERM U780-IFR69, Villejuif 94807, France. 8 Endocrinology-Diabetology Unit, Corbeil-Essonnes Hospital, Corbeil-Essonnes 91100, France. 9 Ontario Institute for Cancer Research, Toronto M5G 1L7, Canada. 10 Montreal Diabetes Research Center, Montreal H2L 4M1, Canada. 11 Molecular Nutrition Unit and the Department of Nutrition, University of Montreal and the Centre Hospitalier de l’Université de Montréal, Montreal H3C 3J7, Canada. 12 Polypeptide Hormone Laboratory and Department of Anatomy and Cell Biology, Montreal H3A 2B2, Canada. 13 Department of Epidemiology & Public Health, Imperial College, St Mary’s Campus, Norfolk Place, London W2 1PG, UK. 14 Section of Genomic Medicine, Imperial College London W12 0NN, and Hammersmith Hospital, Du Cane Road, London W12 0HS, UK.
Nature, 2/2007
Genome-Wide Association Analysis
Identifies Loci for Type 2 Diabetes
and Triglyceride Levels
Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University,
and Novartis Institutes for BioMedical Research*†
New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into
disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464
patients with T2D and 1467 matched controls, each characterized for measures of glucose
metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D),
we identified and confirmed three loci associated with T2D—in a noncoding region near CDKN2A
and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1—and replicated associations near
HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and
confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum
triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions
illustrates the ability of genome-wide association studies to provide potentially important clues to
the pathogenesis of common diseases.
T
ype 2 diabetes, obesity, and cardiovascular
risk factors are caused by a combination
of genetic susceptibility, environment, be-
havior, and chance. Whole-genome association
studies (WGAS) offer a new approach to gene
discovery unbiased with regard to presumed
functions or locations of causal variants. This
approach is based on Fisher’s theory for additive
effects at common alleles (1); human heterozy-
to purifying selection, and has been made pos-
sible by genomic advances such as the human
genome sequence, SNP and HapMap databases,
and genotyping arrays (3).
We studied 1464 patients with T2D and
1467 controls from Finland and Sweden, each
characterized for 18 clinical traits: anthropomet-
ric measures, glucose tolerance and insulin se-
cretion, lipids and apolipoproteins, and blood
applying stringent quality-control filters, high-
quality genotypes for 386,731 common SNPs
were obtained (4). To extend the set of putative
causal alleles tested for association, we devel-
oped 284,968 additional multimarker (haplo-
type) tests based on these SNP genotypes (5, 6).
The 671,699 allelic tests capture (correlation co-
efficient r2
≥ 0.8) 78% of common SNPs in
HapMap CEU (3).
Each SNP and haplotype test was assessed
for association to T2D and each of 18 traits with
the software package PLINK (http://pngu.mgh.
harvard.edu/purcell/plink/). For T2D, a weighted
meta-analysis was used to combine results for
the population-based and family-based subsam-
ples (4). For quantitative traits, multivariable
linear or logistic regression with or without co-
variates was performed (4). Association results
for each SNP, haplotype test, and phenotype are
available (www.broad.mit.edu/diabetes/).
In genome-wide analysis involving hundreds
of thousands of statistical tests, modest levels of
bias imposed on the null distribution can over-
whelm a small number of true results. We used
three strategies to search for evidence of sys-
tematic bias from unrecognized population struc-
ture, the analytical approach, and genotyping
artifacts (7, 8). First, we examined the distribu-
tion of P-values in the population-based sam-
ple, observing a close match to that expected
for a null distribution (genomic inflation factor
λGC = 1.05 for T2D). Second, we calculated
G. Brice,6
B. Bullman,7
J. Campbell,8
B. Castle,9
R. Cetnarsyj,8
C.
Chapman,10
C. Chu,11
N. Coates,12
T. Cole,10
R. Davidson,4
A. Donaldson,13
H. Dorkins,3
F. Douglas,2
D. Eccles,9
R. Eeles,1
F. Elmslie,6
D. G. Evans,7
S. Goff,6
S. Goodman,5
D. Goudie,2
J. Gray,15
L. Greenhalgh,16
H. Gregory,17
S. V. Hodgson,6
T. Homfray,6
R. S. Houlston,1
L. Izatt,18
L. Jackson,18
L. Jeffers,19
V. Johnson-Roffey,12
F. Kavalier,18
C. Kirk,19
F. Lalloo,7
C. Langman,18
I. Locke,1
M. Longmuir,4
J. Mackay,20
A. Magee,19
S. Mansour,6
Z. Miedzybrodzka,17
J. Miller,11
P. Morrison,19
V. Murday,4
J. Paterson,21
G. Pichert,18
M. Porteous,8
N. Rahman,6
M. Rogers,15
S. Rowe,22
S. Shanley,1
A. Saggar,6
G. Scott,2
L. Side,23
L. Snadden,4
M. Steel,2
M. Thomas,5
S. Thomas,1
1
Clinical Genetics Service, Royal Marsden Hospital, Downs
Road, Sutton, Surrey, SM2 5PT, UK. 2
Department of
Clinical Genetics, Ninewells Hospital, Dundee, DD1 9SY,
UK. 3
Medical and Community Genetics, Kennedy-Galton
Centre, Level 8V, Northwick Park and St. Mark’s NHS Trust,
Watford Rd, Harrow, HA1 3UJ, UK. 4
Institute of Medical
Genetics, Yorkhill NHS Trust, Dalnair Street, Glasgow, G3
8SJ, UK. 5
Clinical Genetics Department, Royal Devon and
Exeter Hospital (Heavitree), Gladstone Road, Exeter, EX1
2ED, UK. 6
Department of Clinical Genetics, St. George’s
Hospital Medical School, Jenner Wing, Cranmer Terrace,
London, SW17 0RE, UK. 7
Department of Medical Genetics,
St. Mary’s Hospital, Hathersage Road, Manchester, M13
0JH, UK. 8
South East of Scotland Clinical Genetics Service,
Western General Hospital, Crewe Road, Edinburgh, EH4
2XU, UK. 9
Department of Medical Genetics, The Princess
Anne Hospital, Coxford Road, Southampton, S016 5YA, UK.
10
Clinical Genetics Unit, Birmingham Women’s Hospital,
Metchley Park Road, Edgbaston, Birmingham, B15 2TG,
UK. 11
Yorkshire Regional Genetic Service, Department of
Clinical Genetics, Cancer Genetics Building, St. James
University Hospital, Beckett Street, Leeds, LS9 7TF, UK.
12
Department of Clinical Genetics, Leicester Royal Infirm-
ary, Leicester, LE1 5WW, UK. 13
Department of Clinical
Genetics, St Michael’s Hospital, Southwell Street, Bristol,
BS2 8EG, UK. 14
Institute of Human Genetics, International
Centre for Life, Central Parkway, Newcastle upon Tyne, NE1
3BZ, UK. 15
Institute of Medical Genetics, University
Hospital of Wales, Heath Park, Cardiff, CF14 4XW, UK.
16
Department of Clinical Genetics, Alder Hey Children’s
Hospital, Eaton Road, Liverpool L12 2AP, UK. 17
Clinical
Genetics Centre, Argyll House, Foresterhill, Aberdeen,
AB25 2ZR, UK. 18
Clinical Genetics, 7th Floor New Guy’s
House, Guy’s
UK. 19
Clinical
Belvoir Park H
20
Clinical and
Health, 30 G
21
Department
Trust, Box 13
22
Department
of Chester Ho
23
Department
Road, Headin
Supporting
www.sciencema
Materials and
Figs. S1 to S8
Tables S1 to S
References
9 March 2007
Published onli
10.1126/scien
Include this in
A Genome-Wide Association Study of
Type 2 Diabetes in Finns Detects
Multiple Susceptibility Variants
Laura J. Scott,1
Karen L. Mohlke,2
Lori L. Bonnycastle,3
Cristen J. Willer,1
Yun Li,1
William L. Duren,1
Michael R. Erdos,3
Heather M. Stringham,1
Peter S. Chines,3
Anne U. Jackson,1
Ludmila Prokunina-Olsson,3
Chia-Jen Ding,1
Amy J. Swift,3
Narisu Narisu,3
Tianle Hu,1
Randall Pruim,4
Rui Xiao,1
Xiao-Yi Li,1
Karen N. Conneely,1
Nancy L. Riebow,3
Andrew G. Sprau,3
Maurine Tong,3
Peggy P. White,1
Kurt N. Hetrick,5
Michael W. Barnhart,5
Craig W. Bark,5
Janet L. Goldstein,5
Lee Watkins,5
Fang Xiang,1
Jouko Saramies,6
Thomas A. Buchanan,7
Richard M. Watanabe,8,9
Timo T. Valle,10
Leena Kinnunen,10,11
Gonçalo R. Abecasis,1
Elizabeth W. Pugh,5
Kimberly F. Doheny,5
Richard N. Bergman,9
Jaakko Tuomilehto,10,11,12
Francis S. Collins,3
* Michael Boehnke1
*
Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has
been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161
Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000
single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million
autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants
that predispose to T2D, compared our T2D association results with the results of two similar studies,
and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.
We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute
to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the
ria (8). We
ciation with
the log-odd
(8). We ob
versus 31.6
P values <
against the
with a large
consistent w
SNPs that
also sugges
trols by birt
successful;
genomic co
Analysi
allowed us
variation in
portion, w
(8, 13) that
equilibrium
Centre d’E
(Utah resid
1
Department
Genetics, Uni
USA. 2
Depar
Science, 6/2007
Study design: Richa Saxena1–6
and Valeriya Lyssenko7
(Team
Leaders), Peter Almgren,7
Paul I. W. de Bakker,1–6
Noël P.
Burtt,1
Jose C. Florez,1–6
Hong Chen,8
Joanne Meyer,8
Joel N.
Hirschhorn,1,6,9–11
Mark J. Daly,1–3,5
Thomas E. Hughes,8
Leif
Groop,7,12
David Altshuler1–6
(Chair)
Clinical characterization and phenotypes: Valeriya Lyssenko7
and Richa Saxena1–6
(Team Leaders), Peter Almgren,7
Kristin
Ardlie,1
Kristina Bengtsson Boström,13
Noël P. Burtt,1
Hong Chen,8
Jose C. Florez,1–6
Bo Isomaa,14,15
Sekar Kathiresan,1,3,5
Guillaume
Lettre,1,6,9–11
Ulf Lindblad,16
Helen N. Lyon,1,6,9–11
Olle Melander,7
Christopher Newton-Cheh,1–3,5
Peter Nilsson,17
Marju Orho-
Melander,7
Lennart Råstam,16
Elizabeth K. Speliotes,1,3,6,9–11
Marja-Riitta Taskinen,12
Tiinamaija Tuomi,12,15
Benjamin F.
Voight,1–3,5
David Altshuler,1–6
Joel N. Hirschhorn,1,6,9–11
Thomas
E. Hughes,8
Leif Groop7,12
(Chair)
DNA sample QC and diabetes replication genotyping:
Candace Guiducci1
and Valeriya Lyssenko7
(Team Leaders),
Anna Berglund,7
Joyce Carlson,18
Lauren Gianniny,1
Rachel
Hackett,1
Liselotte Hall,18
Johan Holmkvist,7
Esa Laurila,7
Marju
Orho-Melander,7
Marketa Sjögren,7
Maria Sterner,18
Aarti
Surti,1
Margareta Svensson,7
Malin Svensson,7
Ryan Tewhey,1
Noël P. Burtt1
(Chair)
Whole genome scan genotyping: Brendan Blumenstiel1
(Team Leader), Melissa Parkin,1
Matthew DeFelice,1
Candace
Guiducci,1
Ryan Tewhey,1
Rachel Barry,1
Wendy Brodeur,1
Noël
P. Burtt,1
Jody Camarata,1
Nancy Chia,1
Mary Fava,1
John
Gibbons,1
Bob Handsaker,1
Claire Healy,1
Kieu Nguyen,1
Casey
Gates,1
Carrie Sougnez,1
Diane Gage,1
Marcia Nizzari,1
David
Altshuler,1–6
Stacey B. Gabriel1
(Chair)
GCKR replication genotyping and analysis (Malmö Diet
and Cancer Study): Sekar Kathiresan1,3,5
(Team Leader),
Candace Guiducci,1
Aarti Surti,1
Noël P. Burtt,1
Olle Melander,7
Marju Orho-Melander7
(Chair)
Statistical analysis: Benjamin F. Voight1–3,5
and Paul I. W.
de Bakker1–6
(Team Leaders), Richa Saxena,1–6
Valeriya
Lyssenko,7
Peter Almgren,7
Noël P. Burtt,1
Hong Chen,8
Gung-Wei
Chirn,8
Qicheng Ma,8
Hemang Parikh,7
Delwood Richardson,8
Darrell Ricke,8
Jeffrey J. Roix,8
Leif Groop,7,12
Shaun Purcell,1,2
David Altshuler,1–6
Mark J. Daly1–3,5
(Chair)
1
Broad Institute of Harvard and Massachusetts Institute of
Technology (MIT), Cambridge, MA 02142, USA. 2
Center for
Human Genetic Research, Massachusetts General Hospital,
Boston, MA 02114, USA. 3
Department of Medicine, Mas-
sachusetts General Hospital, Boston, MA 02114, USA.
4
Department of Molecular Biology, Massachusetts General
Hospital, Boston, MA 02114, USA. 5
Department of Medicine,
Harvard Medical School, Boston, MA 02115, USA. 6
Depart-
ment of Genetics, Harvard Medical School, Boston, MA
02115, USA. 7
Department of Clinical Sciences, Diabetes and
Endocrinology Research Unit, University Hospital Malmö,
Lund University, Malmö, Sweden. 8
Diabetes and Metabolism
Disease Area, Novartis Institutes for BioMedical Research, 100
Technology Square, Cambridge, MA 02139, USA. 9
Depart-
ment of Pediatrics, Harvard Medical School, Boston, MA
02115, USA. 10
Division of Endocrinology, Children’s Hospital,
Boston, MA 02115, USA. 11
Division of Genetics, Children’s
Hospital, Boston, MA 02115, USA. 12
Department of Medicine,
Helsinki University Hospital, University of Helsinki, Helsinki,
Finland. 13
Skaraborg Institute, Skövde, Sweden. 14
Malmska
Municipal Health Center and Hospital, Jakobstad, Finland.
15
Folkhälsan Research Center, Helsinki, Finland. 16
Depart-
ment of Clinical Sciences, Community Medicine Research
Unit, University Hospital Malmö, Lund University, Malmö,
Sweden. 17
Department of Clinical Sciences, Medicine Research
Unit, University Hospital Malmö, Lund University, Malmö, Sweden.
18
Clinical Chemistry, University Hospital Malmö, Lund
University, Malmö, Sweden. 19
Department of Psychiatry,
Massachusetts General Hospital, Harvard Medical School,
Boston, MA 02115, USA.
Supporting Online Material
www.sciencemag.org/cgi/content/full/1142358/DC1
Materials and Methods
Figs. S1 and S2
Tables S1 to S6
References
9 March 2007; accepted 20 April 2007
Published online 26 April 2007;
10.1126/science.1142358
Include this information when citing this paper.
Replication of Genome-Wide
Association Signals in UK Samples
Reveals Risk Loci for Type 2 Diabetes
Eleftheria Zeggini,1,2
* Michael N. Weedon,3,4
* Cecilia M. Lindgren,1,2
* Timothy M. Frayling,3,4
*
Katherine S. Elliott,2
Hana Lango,3,4
Nicholas J. Timpson,2,5
John R. B. Perry,3,4
Nigel W. Rayner,1,2
Rachel M. Freathy,3,4
Jeffrey C. Barrett,2
Beverley Shields,4
Andrew P. Morris,2
Sian Ellard,4,6
Christopher J. Groves,1
Lorna W. Harries,4
Jonathan L. Marchini,7
Katharine R. Owen,1
Beatrice Knight,4
Lon R. Cardon,2
Mark Walker,8
Graham A. Hitman,9
Andrew D. Morris,10
Alex S. F. Doney,10
The Wellcome Trust Case Control
Consortium (WTCCC),† Mark I. McCarthy,1,2
‡§ Andrew T. Hattersley3,4
‡
The molecular mechanisms involved in the development of type 2 diabetes are poorly
understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938
population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect
replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls
and by integration of our findings with equivalent data from other international consortia. We
detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and
IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings
provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of
Here, we describe how integration of data
from the WTCCC scan and our own replication
studies with similar information generated by the
Diabetes Genetics Initiative (DGI) (6) and the
Finland–United States Investigation of NIDDM
Genetics (FUSION) (7) has identified several
additional susceptibility variants for T2D.
In the WTCCC study, analysis of 490,032
autosomal SNPs in 16,179 samples yielded
459,448 SNPs that passed initial quality control
(5). We considered only the 393,453 autosomal
SNPs with minor allele frequency (MAF) exceeding
1% in both cases and controls and no
extreme departure from Hardy-Weinberg equilibrium
(P < 10⁻⁴ in cases or controls) (8). This
T2D-specific data set shows no evidence of sub-
stantial confounding from population substruc-
ture and genotyping biases (8).
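The per-SNP quality-control filter described above (MAF above 1% and no extreme Hardy-Weinberg departure) can be sketched as follows. This is an illustration only: it uses a 1-degree-of-freedom chi-square goodness-of-fit test, whereas the excerpt does not say which Hardy-Weinberg test the authors used, and the function names are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(maf_cases: float, maf_controls: float,
              hwe_p_cases: float, hwe_p_controls: float) -> bool:
    """MAF > 1% in both groups and HWE P >= 1e-4 in both groups."""
    maf_ok = maf_cases > 0.01 and maf_controls > 0.01
    hwe_ok = hwe_p_cases >= 1e-4 and hwe_p_controls >= 1e-4
    return maf_ok and hwe_ok


# Genotype counts exactly at Hardy-Weinberg proportions versus no heterozygotes at all.
print(hwe_chi_square_p(25, 50, 25), hwe_chi_square_p(50, 0, 50))
```

A complete absence of heterozygotes, as in the second call, is the kind of pattern that usually signals a genotype-calling artifact rather than biology, which is why the filter is applied before association testing.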
To distinguish true associations from those
reflecting fluctuations under the null or residual
errors arising from aberrant allele calling, we first
submitted putative signals from the WTCCC study
to additional quality control, including cluster-
plot visualization and validation genotyping on
REPORTS
ARTICLES
A genome-wide association study
identifies novel risk loci for type 2 diabetes
Robert Sladek1,2,4, Ghislain Rocheleau1*, Johan Rung4*, Christian Dina5*, Lishuang Shen1, David Serre1,
Philippe Boutin5, Daniel Vincent4, Alexandre Belisle4, Samy Hadjadj6, Beverley Balkau7, Barbara Heude7,
Guillaume Charpentier8, Thomas J. Hudson4,9, Alexandre Montpetit4, Alexey V. Pshezhetsky10, Marc Prentki10,11,
Barry I. Posner2,12, David J. Balding13, David Meyre5, Constantin Polychronakos1,3 & Philippe Froguel5,14
Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of
which were hitherto unknown. A systematic search for these variants was recently made possible by the development of
high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935
single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype
frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified
four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2
gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in
insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell
development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk
and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits.
The rapidly increasing prevalence of type 2 diabetes mellitus (T2DM) is
thought to be due to environmental factors, such as increased availabil-
ity of food and decreased opportunity and motivation for physical
activity, acting on genetically susceptible individuals. The heritability
of T2DM is one of the best established among common diseases and,
consequently, genetic risk factors for T2DM have been the subject of
intense research (ref. 1). Although the genetic causes of many monogenic
forms of diabetes (maturity onset diabetes in the young, neonatal mitochondrial
and other syndromic types of diabetes mellitus) have been
elucidated, few variants leading to common T2DM have been clearly
identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25)
of developing T2DM (ref. 1). Linkage studies have reported many
T2DM-linked chromosomal regions and have identified putative, cau-
sative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs
genotypes for 392,935 single-nucleotide polymorphisms (SNPs) in
1,363 T2DM cases and controls (Supplementary Table 1). In order to
enrich for risk alleles (ref. 21), the diabetic subjects studied in stage 1 were
selected to have at least one affected first degree relative and age at
onset under 45 yr (excluding patients with maturity onset diabetes in
the young). Furthermore, in order to decrease phenotypic heterogeneity
and to enrich for variants determining insulin resistance and
β-cell dysfunction through mechanisms other than severe obesity, we
initially studied diabetic patients with a body mass index (BMI)
<30 kg m⁻². Control subjects were selected to have fasting blood
glucose <5.7 mmol l⁻¹ in DESIR, a large prospective cohort for the
study of insulin resistance in French subjects (ref. 22).
Genotypes for each study subject were obtained using two plat-
Sladek, 2007
How many SNPs (p-value?)
European-based; N ~ 1000
cases: high fasting blood glucose/non-obese
controls: non-obese
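One way to approach the slide's question about the number of SNPs and the p-value is a Bonferroni correction: divide the desired family-wise error rate by the number of tests. The sketch below applies it to the 392,935 SNPs tested in the Sladek study. The 0.05 error rate is a conventional assumption, and Bonferroni is used here only as an illustration; the paper itself used a two-stage design with a MAX statistic and permutation.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_SNPS_STAGE1 = 392_935          # SNPs tested in stage 1 (from the abstract)
ALPHA = 0.05                     # assumed family-wise error rate

threshold = bonferroni_threshold(ALPHA, N_SNPS_STAGE1)
print(f"per-SNP threshold: {threshold:.2e}")

# The widely used genome-wide threshold of 5e-8 corresponds to
# roughly one million independent tests at the same alpha.
implied_tests = ALPHA / 5e-8
print(f"tests implied by 5e-8: {implied_tests:.0f}")
```

Because neighboring SNPs are correlated through linkage disequilibrium, Bonferroni over the raw SNP count is conservative; that is one motivation for permutation-based P-values.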
Human Hap300 chip, showing no T2DM association in stage 1
(P > 0.01) and separated by at least 100 kb. Using the first principal
component as a covariate for ancestry differences between cases and
controls, we tested for association between rs932206 and disease
status. Our result suggests that this apparent association is largely
BMI on the association between marker and disease, as it is asymp-
totically equivalent to the Armitage trend test used to detect asso-
ciation in stages 1 and 2. None of the associations (Supplementary
Table 7) was substantially changed by considering the effects of these
covariates.
[Figure 1: one association panel per chromosome (1–22 and X); y-axis tick labels omitted]
Figure 1 | Graphical summary of stage 1 association results. T2DM
association was determined for SNPs on the Human1 and Hap300 chips. The
x axis represents the chromosome position from pter; the y axis shows
−log10[pMAX], the P-value obtained by the MAX statistic, for each SNP
(note the different scale on the y axis of the chromosome 10 plot). SNPs that
passed the cutoff for a fast-tracked second stage are highlighted in red.
©2007 Nature Publishing Group
Sladek, 2007
Identification of four novel T2DM loci
Our fast-track stage 2 genotyping confirmed the reported association
for rs7903146 (TCF7L2) on chromosome 10, and in addition iden-
tified significant associations for seven SNPs representing four new
T2DM loci (Table 1). In all cases, the strongest association for the
MAX statistic (see Methods) was obtained with the additive model.
The most significant of these corresponds to rs13266634, a non-synonymous
SNP (R325W) in SLC30A8, located in a 33-kb linkage
disequilibrium block on chromosome 8, containing only the 3′ end
of this gene (Fig. 2a). SLC30A8 encodes a zinc transporter expressed
solely in the secretory vesicles of β-cells and is thus implicated in the
final stages of insulin biosynthesis, which involve co-crystallization
Table 1 | Confirmed association results
SNP | Chromosome | Position (nucleotides) | Risk allele | Major allele | MAF (case) | MAF (ctrl) | Odds ratio (het) | Odds ratio (hom) | PAR | λs | Stage 2 pMAX | Stage 2 pMAX (perm) | Stage 1 pMAX | Stage 1 pMAX (perm) | Nearest gene
rs7903146 | 10 | 114,748,339 | T | C | 0.406 | 0.293 | 1.65 ± 0.19 | 2.77 ± 0.50 | 0.28 | 1.0546 | 1.5 × 10⁻³⁴ | <1.0 × 10⁻⁷ | 3.2 × 10⁻¹⁷ | <3.3 × 10⁻¹⁰ | TCF7L2
rs13266634 | 8 | 118,253,964 | C | C | 0.254 | 0.301 | 1.18 ± 0.25 | 1.53 ± 0.31 | 0.24 | 1.0089 | 6.1 × 10⁻⁸ | 5.0 × 10⁻⁷ | 2.1 × 10⁻⁵ | 1.8 × 10⁻⁵ | SLC30A8
rs1111875 | 10 | 94,452,862 | G | G | 0.358 | 0.402 | 1.19 ± 0.19 | 1.44 ± 0.24 | 0.19 | 1.0069 | 3.0 × 10⁻⁶ | 7.4 × 10⁻⁶ | 9.1 × 10⁻⁶ | 7.3 × 10⁻⁶ | HHEX
rs7923837 | 10 | 94,471,897 | G | G | 0.335 | 0.377 | 1.22 ± 0.21 | 1.45 ± 0.25 | 0.20 | 1.0065 | 7.5 × 10⁻⁶ | 2.2 × 10⁻⁵ | 3.4 × 10⁻⁶ | 2.5 × 10⁻⁶ | HHEX
rs7480010 | 11 | 42,203,294 | G | A | 0.336 | 0.301 | 1.14 ± 0.13 | 1.40 ± 0.25 | 0.08 | 1.0041 | 1.1 × 10⁻⁴ | 2.9 × 10⁻⁴ | 1.5 × 10⁻⁵ | 1.2 × 10⁻⁵ | LOC387761
rs3740878 | 11 | 44,214,378 | A | A | 0.240 | 0.272 | 1.26 ± 0.29 | 1.46 ± 0.33 | 0.24 | 1.0046 | 1.2 × 10⁻⁴ | 2.8 × 10⁻⁴ | 1.8 × 10⁻⁵ | 1.3 × 10⁻⁵ | EXT2
rs11037909 | 11 | 44,212,190 | T | T | 0.240 | 0.271 | 1.27 ± 0.30 | 1.47 ± 0.33 | 0.25 | 1.0045 | 1.8 × 10⁻⁴ | 4.5 × 10⁻⁴ | 1.8 × 10⁻⁵ | 1.3 × 10⁻⁵ | EXT2
rs1113132 | 11 | 44,209,979 | C | C | 0.237 | 0.267 | 1.15 ± 0.27 | 1.36 ± 0.31 | 0.19 | 1.0044 | 3.3 × 10⁻⁴ | 8.1 × 10⁻⁴ | 3.7 × 10⁻⁵ | 2.9 × 10⁻⁵ | EXT2
Significant T2DM associations were confirmed for eight SNPs in five loci. Allele frequencies, odds ratios (with 95% confidence intervals) and PAR were calculated using only the stage 2 data. Allele
frequencies in the controls were very close to those reported for the CEU set (European subjects genotyped in the HapMap project). Induced sibling recurrent risk ratios (λs) were estimated using
stage 2 genotype counts for the control subjects and assuming a T2DM prevalence of 7% in the French population. hom, homozygous; het, heterozygous; major allele, the allele with the higher
frequency in controls; pMAX, P-value of the MAX statistic from the χ² distribution; pMAX (perm), P-value of the MAX statistic from the permutation-derived empirical distribution (pMAX and
pMAX (perm) are adjusted for variance inflation); risk allele, the allele with higher frequency in cases compared with controls.
[Figure 2, panels a and b: −log10[P] by position; gene labels SLC30A8, IDE, KIF11, HHEX]
NATURE|Vol 445|22 February 2007 ARTICLES
Sladek, 2007
How would you interpret the p-values?
Odds ratios?
Confirmed 8 SNPs with N ~ 1000
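As a starting point for the odds-ratio question, an allelic odds ratio can be computed directly from the case and control allele frequencies in Table 1. The sketch below does this for rs7903146 (TCF7L2). It is an illustration only: the table's own odds ratios are genotype-based (heterozygous and homozygous), so the allelic value is a related but different quantity, and the function name is hypothetical.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    for f in (freq_cases, freq_controls):
        if not 0 < f < 1:
            raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# rs7903146 (TCF7L2): minor allele T is the risk allele.
# MAF 0.406 in cases and 0.293 in controls (Table 1, stage 2 data).
or_tcf7l2 = allelic_odds_ratio(0.406, 0.293)
print(f"rs7903146 allelic OR: {or_tcf7l2:.2f}")

# rs13266634 (SLC30A8): the minor allele is rarer in cases (0.254 vs 0.301),
# so the odds ratio for the minor allele falls below 1 and the risk (major)
# allele has the reciprocal odds ratio.
or_minor = allelic_odds_ratio(0.254, 0.301)
print(f"rs13266634 minor-allele OR: {or_minor:.2f}, risk-allele OR: {1 / or_minor:.2f}")
```

An odds ratio above 1 means the allele is enriched in cases. The effect sizes are modest even for the strongest locus, which is why P-values this small require either a strong signal such as TCF7L2 or much larger samples.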
Scaling up discovery by combining populations:
meta-analyses
g the Diabetes Genetics
nvestigation of NIDDM
nd (iv) the Framingham
omponent studies (n =
ry Table 1 online.
aring, the four consortia
n 10 and 20 SNPs promi-
their individual, interim,
mentary Table 2 online).
oci with consistent effects
dies. Two of these repre-
6PC2 and GCK. In addi-
nerated evidence for an
NPs around the MTNR1B
rs1387153, P = 2.2 ×
10⁻¹¹; DFS: rs10830963,
5.8 × 10⁻⁴, for the most
ch analysis). The associa-
d on formal meta-analysis
r exclusion of individuals
= 1.1 × 10⁻⁵⁷; rs4607517
NR1B), P = 3.2 × 10⁻⁵⁰;
pplementary Table 3 and
ent efforts to harmonize
(including the additional
data from the WTCCC, DGI and FUSION scans) (ref. 10) (Supplementary
Note). We found strong evidence that the minor G allele of
rs10830963 was associated with increased risk of T2D (odds ratio =
1.09 (1.05–1.12), P = 3.3 × 10⁻⁷; Fig. 2 and Supplementary Table 6
online). The possibility that the fasting glucose association might
Study ID | OR (95% CI) | Weight (%)
DGI | 1.12 (0.96, 1.30) | 4.61
FUSION | 1.20 (1.03, 1.39) | 4.89
WTCCC | 1.07 (0.95, 1.20) | 8.03
deCODE | 1.14 (1.03, 1.27) | 9.58
KORA | 1.00 (0.84, 1.19) | 3.53
Rotterdam | 1.17 (1.04, 1.30) | 8.75
CCC | 1.07 (0.88, 1.31) | 2.69
ADDITION/ELY | 1.16 (1.02, 1.33) | 6.04
Norfolk | 1.00 (0.90, 1.10) | 10.56
UKT2DGC | 1.03 (0.96, 1.10) | 23.18
OxGN/58BC | 0.91 (0.75, 1.10) | 2.85
FUSION Stage 2 | 1.15 (1.02, 1.30) | 7.41
METSIM | 1.16 (1.03, 1.30) | 7.90
Overall (I² = 26.6%, P = 0.176) | 1.09 (1.05, 1.12) | 100.00
Meta-analysis P value = 3.3 × 10⁻⁷
(x axis: odds ratio, ticks at 0.722, 1 and 1.39)
Figure 2 Association of rs10830963 with type 2 diabetes (T2D) in 13 case-control studies.
VOLUME 41 | NUMBER 1 | JANUARY 2009 NATURE GENETICS
Meta-analysis of SNP rs10830963:
Combining findings from multiple cohorts
Prokopenko, 2009
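The pooled estimate in the forest plot can be approximated with a fixed-effect, inverse-variance meta-analysis on the log odds-ratio scale. The sketch below takes the per-study odds ratios and 95% confidence intervals as printed in the figure and recovers each standard error from the interval width. Treating the published analysis as fixed-effect is an assumption, and because the inputs are rounded to two decimals the result only approximately matches the reported 1.09 (1.05, 1.12).

```python
import math

# (study, odds ratio, lower 95% CI, upper 95% CI), as printed in the forest plot.
STUDIES = [
    ("DGI", 1.12, 0.96, 1.30), ("FUSION", 1.20, 1.03, 1.39),
    ("WTCCC", 1.07, 0.95, 1.20), ("deCODE", 1.14, 1.03, 1.27),
    ("KORA", 1.00, 0.84, 1.19), ("Rotterdam", 1.17, 1.04, 1.30),
    ("CCC", 1.07, 0.88, 1.31), ("ADDITION/ELY", 1.16, 1.02, 1.33),
    ("Norfolk", 1.00, 0.90, 1.10), ("UKT2DGC", 1.03, 0.96, 1.10),
    ("OxGN/58BC", 0.91, 0.75, 1.10), ("FUSION Stage 2", 1.15, 1.02, 1.30),
    ("METSIM", 1.16, 1.03, 1.30),
]


def fixed_effect_meta(studies, z=1.96):
    """Return (pooled OR, lower CI, upper CI, per-study weights in percent)."""
    betas, weights = [], []
    for _name, odds_ratio, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * z)   # SE of the log odds ratio
        betas.append(math.log(odds_ratio))
        weights.append(1 / se ** 2)                    # inverse-variance weight
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se_pooled = math.sqrt(1 / total)
    percent = [100 * w / total for w in weights]
    return (math.exp(beta), math.exp(beta - z * se_pooled),
            math.exp(beta + z * se_pooled), percent)


pooled_or, ci_low, ci_high, weight_percent = fixed_effect_meta(STUDIES)
print(f"pooled OR {pooled_or:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

The weights show why combining cohorts helps: the study with the narrowest interval (UKT2DGC) contributes the most, and the pooled interval excludes 1 even though several individual studies do not.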
ARTICLES
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of
European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls,
we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the
KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of
overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both
beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in
cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals
influencing apparently unrelated complex traits.
Type 2 diabetes (T2D) is characterized by insulin resistance and
deficient beta-cell function (ref. 1). The escalating prevalence of T2D and
the limitations of currently available preventative and therapeutic
options highlight the need for a more complete understanding of
T2D pathogenesis. To date, approximately 25 genome-wide significant
common variant associations with T2D have been described, mostly
through genome-wide association (GWA) analyses (refs 2–13). The identities
of the variants and genes mediating the susceptibility effects at most
of these signals have yet to be established, and the known variants
account for less than 10% of the overall estimated genetic contribution
to T2D predisposition. Although some of the unexplained heritability
will reflect variants poorly captured by existing GWA platforms, we
reasoned that an expanded meta-analysis of existing GWA data would
the inverse-variance method (Online Methods, Fig. 1, Supplementary
Tables 1 and 2 and Supplementary Note). We observed only modest
genomic control inflation (λgc = 1.07), suggesting that the observed
results were not due to population stratification. After removing SNPs
within established T2D loci (Supplementary Table 3), the result-
ing quantile-quantile plot was consistent with a modest excess of
disease associations of relatively small effect (Supplementary Note).
Weak evidence for association at HLA variants strongly associated
with autoimmune forms of diabetes (Supplementary Table 3 and
Supplementary Note) suggested some case admixture involving
subjects with type 1 diabetes or latent autoimmune diabetes of adult-
hood; however, failure to detect T2D associations at other non-HLA
type 1 diabetes susceptibility loci (for example, INS, PTPN22 and
Twelve type 2 diabetes susceptibility loci identified
through large-scale association analysis
Voight, 2010
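The genomic control inflation factor reported in the excerpt above (1.07) is conventionally the median observed association chi-square statistic divided by the median expected under the null (about 0.455 for 1 degree of freedom). A minimal sketch, assuming two-sided P-values from 1-df tests; the function name and the synthetic P-values are illustrative, not data from the study.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364   # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Genomic control lambda from two-sided P-values of 1-df tests."""
    normal = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0 < p <= 1:
            raise ValueError("P-values must lie in (0, 1]")
        z = normal.inv_cdf(p / 2)     # lower-tail quantile; its sign is irrelevant once squared
        chi2.append(z * z)
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, evenly spaced P-values mimic a perfectly null scan (lambda close to 1).
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam_null = genomic_inflation(null_p)

# Shifting every P-value toward 0 mimics systematic inflation (lambda above 1).
inflated_p = [p * 0.8 for p in null_p]
lam_inflated = genomic_inflation(inflated_p)
print(round(lam_null, 3), round(lam_inflated, 3))
```

Values near 1, such as the 1.05 and 1.07 quoted in these studies, suggest that population stratification and genotyping artifacts are not materially inflating the test statistics.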
Meta-analyses for T2D:
N>40K and 90K identifies >30 loci among 2,400,000 SNPs
ARTICLES
13 autosomal loci exceeded the threshold for genome-wide significance
(P ranging from 2.8 × 10⁻⁸ to 1.4 × 10⁻²²) with allele-specific odds
(r² < 0.05), and conditional analyses (see below) establish these SNPs
as independent (Fig. 2 and Supplementary Table 4). Further analysis
[Figure 1 plot: Manhattan plots of −log10(P) by chromosome (1–22 and X), one panel for the unconditional analysis and one for the conditional analysis]
Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 × 10⁻⁵); association in identified or established region (P < 1 × 10⁻⁴)
Labeled loci: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPAR
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-
analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those
taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and
should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously
established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered
conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10⁻⁵), whereas
secondary signals close to already confirmed T2D loci are shown in purple (P < 10⁻⁴).
Meta-analyses for T2D:
N>40K and 90K identifies >30 loci among 2,400,000 SNPs
[Regional association plot, SLC30A8 region: −log10(P-value) (axis 0–10) and recombination rate in cM/Mb (axis 0–100) by position; labeled SNP rs3802177; neighboring gene label PGCP]
●
●
●
●
●
●
●●●●●●●●
●
●
●
●●●●
●●●●
●●●
● ●
●●
●
●
●●
●
●
●
●●●●● ●●
●
●
●
●
●
●
●
●●●●
●
●●●
●
● ●●
●
●
●●
●
●
●
●● ●
●●
●●● ●
● ●
●
●●●
●●
●
●●
●
●
●
●
●
● ●●
●
●
● ● ●
●
●
●
●●
●
●
●
● ● ●●
●
● ● ●
●
●
●●●●
● ●
●
● ●
●
●
● ●● ● ●● ●
●
●
●
●
●
●●
●
●
●
● ●
●● ●●●●
●●
●
●
●● ●
●
●●
● ●
●
●
●
●
●● ●●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●
●
rs3802177 stage 1
● r^2: 0.8 − 1.0
● r^2: 0.6 − 0.8
● r^2: 0.4 − 0.6
● r^2: 0.2 − 0.4
● r^2: 0.0 − 0.2
● r^2 missing
<− TRPS1
<− EIF3H
UTP23 −>
<− RAD21
LOC441376 −>
SLC30A8 −>
MED30 −>
<− EXT1
<− SAMD12
<− TNFRSF11
COLEC1
117 118 119 120
Position on chromosome 8 (Mb)
[Figure: CDKN2A/B Region. Regional association plot around lead SNP rs10965250 (stage 1). Axes: −log10(P−value), 0 to 10; recombination rate (cM/Mb), 0 to 100; Position on chromosome 9 (Mb), 21 to 24. Legend: r^2 bins 0.8−1.0, 0.6−0.8, 0.4−0.6, 0.2−0.4, 0.0−0.2, and r^2 missing. Genes shown: MLLT3, KIAA1797, PTPLAD2, IFNB1, IFNW1, IFNA21, IFNA4, IFNA7, IFNA13, MTAP, CDKN2A, CDKN2B, DMRTA1, ELAVL2.]
[Figure: CDC123/CAMK1D Region. Regional association plot around lead SNP rs12779790 (stage 1). Axes: −log10(P−value); recombination rate. Legend: r^2 bins 0.8−1.0, 0.6−0.8, 0.4−0.6, 0.2−0.4, 0.0−0.2, and r^2 missing.]
[Figure: HHEX/IDE Region. Regional association plot around lead SNP rs5015480 (stage 1). Axes: −log10(P−value); recombination rate. Legend: r^2 bins 0.8−1.0, 0.6−0.8, 0.4−0.6, 0.2−0.4, 0.0−0.2, and r^2 missing.]
Not in a gene... In a gene...
~90% of GWAS hits are non-coding!
Stamatoyannopoulos, Science 2012
Systematic Localization of Common Disease-Associated Variation in Regulatory DNA
Matthew T. Maurano,1,* Richard Humbert,1,* Eric Rynes,1,* Robert E. Thurman,1 Eric Haugen,1 Hao Wang,1 Alex P. Reynolds,1 Richard Sandstrom,1 Hongzhu Qu,1,2 Jennifer Brody,3 Anthony Shafer,1 Fidencio Neri,1 Kristen Lee,1 Tanya Kutyavin,1 Sandra Stehling-Sun,1 Audra K. Johnson,1 Theresa K. Canfield,1 Erika Giste,1 Morgan Diegel,1 Daniel Bates,1 R. Scott Hansen,4 Shane Neph,1 Peter J. Sabo,1 Shelly Heimfeld,5 Antony Raubitschek,6 Steven Ziegler,6 Chris Cotsapas,7,8 Nona Sotoodehnia,3,9 Ian Glass,10 Shamil R. Sunyaev,11 Rajinder Kaul,4 John A. Stamatoyannopoulos,1,12,†
Genome-wide association studies have identified many noncoding variants associated with common
diseases and traits. We show that these variants are concentrated in regulatory DNA marked by
deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active
during fetal development and are enriched in variants associated with gestational exposure–related
phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain
phenotype associations. Disease-associated variants systematically perturb transcription factor recognition
sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated
tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo
identification of pathogenic cell types for Crohn’s disease, multiple sclerosis, and an electrocardiogram
trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of
regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
Disease- and trait-associated genetic variants are rapidly being identified with genome-wide association studies (GWAS) and related strategies (1). To date, hundreds of GWAS have been conducted, spanning diverse diseases and quantitative phenotypes (2) (fig. S1A). However, the majority (~93%) of disease- and trait-associated variants emerging from these studies lie within noncoding sequence (fig. S1B), complicating their functional evaluation. Several lines of evidence suggest the involvement of a proportion of such variants in transcriptional regulatory mechanisms, including modulation of promoter and enhancer elements (3–6) and enrichment within expression quantitative trait loci (eQTL) (3, 7, 8).

Human regulatory DNA encompasses a variety of cis-regulatory elements within which the cooperative binding of transcription factors creates focal alterations in chromatin structure. Deoxyribonuclease I (DNase I) hypersensitive sites (DHSs) are sensitive and precise markers of this actuated regulatory DNA, and DNase I mapping has been instrumental in the discovery and census of human cis-regulatory elements (9). We performed DNase I mapping genome-wide (10) in 349 cell and tissue samples, including 85 cell types studied under the ENCODE Project (10) and 264 samples studied under the Roadmap Epigenomics Program (11). These encompass several classes [...]nome. In total, we identified 3,899,693 distinct DHS positions along the genome (collectively spanning 42.2%), each of which was detected in one or more cell or tissue types (median = 5).

Disease- and trait-associated variants are concentrated in regulatory DNA. We examined the distribution of 5654 noncoding genome-wide significant associations [5134 unique single-nucleotide polymorphisms (SNPs); fig. S1 and table S2] for 207 diseases and 447 quantitative traits (2) with the deep genome-scale maps of regulatory DNA marked by DHSs. This revealed a collective 40% enrichment of GWAS SNPs in DHSs (fig. S1C, P < 10^−55, binomial, compared to the distribution of HapMap SNPs). Fully 76.6% of all noncoding GWAS SNPs either lie within a DHS (57.1%, 2931 SNPs) or are in complete linkage disequilibrium (LD) with SNPs in a nearby DHS (19.5%, 999 SNPs) (Fig. 1A) (12). To confirm this enrichment, we sampled variants from the 1000 Genomes Project (13) with the same genomic feature localization (intronic versus intergenic), distance from the nearest transcriptional start site, and allele frequency in individuals of European ancestry. We confirmed significant enrichment both for SNPs within DHSs (P < 10^−59, simulation) and also including variants in complete LD (r^2 = 1) with SNPs in DHSs (P < 10^−37, simulation) (fig. S2).

In total, 47.5% of GWAS SNPs fall within gene bodies (fig. S1B); however, only 10.9% of intronic GWAS SNPs within DHSs are in strong LD (r^2 ≥ 0.8) with a coding SNP, indicating that the vast majority of noncoding genic variants are not simply tagging coding sequence. Analogously, only 16.3% of GWAS variants within coding sequences are in strong LD with variants in DHSs. SNPs on widely used genotyping arrays (e.g., Affymetrix) were modestly enriched within DHSs (fig. S2), possibly due to selection of SNPs with robust experimental performance in genotyping assays. However, we found no evidence for sequence composition bias (table S3).

To further examine the enrichment of GWAS SNPs in regulatory DNA, we systematically classified all noncoding GWAS SNPs by the quality
1 Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA. 2 Laboratory of Disease Genomics
RESEARCH ARTICLE
There have been few, if any, similar bursts of discovery in the
history of medical research.
David Hunter and Peter Kraft, NEJM, 2007
Common claims discussed with regard to GWAS:
Despite issues, yielded many discoveries vs. cost
to a doubling of the number of associated variants discov-
ered. The proportion of genetic variation explained by
significantly associated SNPs is usually low (typically less
than 10%) for many complex traits, but for diseases such
as CD and multiple sclerosis (MS [MIM 126200]), and for
quantitative traits such as height and lipid traits, between
Figure 1. GWAS Discoveries over Time
Data obtained from the Published GWAS Catalog (see Web Resources). Only the top SNPs representing loci with association p values < 5 × 10^−8 are included, and so that multiple counting is avoided, SNPs identified for the same traits with LD r^2 > 0.8 estimated from the entire HapMap samples are excluded.
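The p < 5 × 10^−8 cutoff in the caption is the conventional genome-wide significance threshold. One common rationale, stated here as an assumption rather than something the slide spells out, is a Bonferroni correction of a 0.05 family-wise error rate across roughly one million independent common-variant tests. A minimal sketch (function names are illustrative):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float, threshold: float) -> bool:
    """True when a SNP's association p-value passes the corrected cutoff."""
    return p_value < threshold


# Assumed inputs: alpha = 0.05 and ~1,000,000 independent tests.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"{threshold:.0e}")  # 5e-08
```

A SNP with p = 1e-9 would pass this cutoff, while one with p = 1e-6 would not.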
~500,000 SNP chips x ~$500/chip

= $250M
Five years of GWAS Discovery (Visscher, 2012)
$250M / ~2000 loci

= $125K/locus
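The back-of-the-envelope arithmetic above can be checked directly. This is a minimal sketch using only the slide's rounded figures; the function names are illustrative and not from the source.

```python
def total_chip_cost(n_chips: int, usd_per_chip: float) -> float:
    """Total genotyping spend: number of SNP chips times price per chip."""
    return n_chips * usd_per_chip


def cost_per_locus(total_usd: float, n_loci: int) -> float:
    """Average spend per discovered locus."""
    return total_usd / n_loci


# Rounded figures from the slide (all approximate).
total = total_chip_cost(500_000, 500)     # 250,000,000 USD
per_locus = cost_per_locus(total, 2_000)  # 125,000 USD
print(f"total = ${total / 1e6:.0f}M, per locus = ${per_locus / 1e3:.0f}K")
```

With these inputs the script prints `total = $250M, per locus = $125K`, matching the slide.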
Candidate genes: >$250M!
100 NIH R01s

Fighter jet

Hadron Collider: $9B
P = G + E
Type 2 Diabetes

Cancer

Alzheimer’s

Gene expression
Phenotype Genome
Variants
Environment
Infectious agents

Nutrients

Pollutants

Drugs
Complex traits are a function of genes and
environment...
Nothing comparable to elucidate E influence!
We lack high-throughput methods
and data to discover new E in P…
E: ???
A similar paradigm for discovery should exist

for E!
Why?
$\sigma^2_P = \sigma^2_G + \sigma^2_E$
$H^2 = \sigma^2_G / \sigma^2_P$
Heritability (H^2) is the proportion of phenotypic variance
attributed to genetic variance in a population
Indicator of the proportion of phenotypic
differences attributed to G.
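As a worked illustration of the definition, the sketch below computes H^2 from two variance components. It assumes the simple additive decomposition shown on the slide (no gene-environment covariance or interaction term), and the numeric inputs are hypothetical values chosen only for illustration.

```python
def broad_sense_heritability(var_g: float, var_e: float) -> float:
    """H^2 = var_G / var_P, where var_P = var_G + var_E."""
    var_p = var_g + var_e
    if var_p <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_g / var_p


# Hypothetical variance components (not taken from any study).
h2 = broad_sense_heritability(var_g=6.0, var_e=4.0)
print(h2)  # 0.6
```

Because H^2 is a ratio of population variances, the same trait can have a different H^2 in a population with more or less environmental variation.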
Height is an example of a heritable trait:

Francis Galton shows how it's done (1887)
“mid-height of 205 parents
described 60% of variability of 928
offspring”
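One way to read Galton's approach is as a regression of offspring height on mid-parent height; under a purely additive model, the slope of that regression is a standard estimator of heritability. The sketch below fits the least-squares slope on a small set of hypothetical heights (these are NOT Galton's data), and the helper name is illustrative.

```python
from statistics import mean


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical heights in inches, constructed so offspring regress toward the mean.
midparent = [64.0, 66.0, 68.0, 70.0, 72.0]
offspring = [65.6, 66.8, 68.0, 69.2, 70.4]

slope = regression_slope(midparent, offspring)
print(round(slope, 2))
```

A slope below 1 means children of unusually tall or short parents tend to be closer to the population average than their parents are.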
Intro to Biomedical Informatics 701

Contenu connexe

Tendances

Variant calling and how to prioritize somatic mutations and inheritated varia...
Variant calling and how to prioritize somatic mutations and inheritated varia...Variant calling and how to prioritize somatic mutations and inheritated varia...
Variant calling and how to prioritize somatic mutations and inheritated varia...Vall d'Hebron Institute of Research (VHIR)
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewSean Davis
 
Flash introduction to Qiime2 -- 16S Amplicon analysis
Flash introduction to Qiime2 -- 16S Amplicon analysisFlash introduction to Qiime2 -- 16S Amplicon analysis
Flash introduction to Qiime2 -- 16S Amplicon analysisAndrea Telatin
 
The STRING database and related tools
The STRING database and related toolsThe STRING database and related tools
The STRING database and related toolsLars Juhl Jensen
 
Transcriptomics and metabolomics
Transcriptomics and metabolomicsTranscriptomics and metabolomics
Transcriptomics and metabolomicsSukhjinder Singh
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure predictionSiva Dharshini R
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisDespoina Kalfakakou
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebaseKew Sama
 

Tendances (20)

Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Introduction to 16S Microbiome Analysis
 
Variant calling and how to prioritize somatic mutations and inheritated varia...
Variant calling and how to prioritize somatic mutations and inheritated varia...Variant calling and how to prioritize somatic mutations and inheritated varia...
Variant calling and how to prioritize somatic mutations and inheritated varia...
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
 
Flash introduction to Qiime2 -- 16S Amplicon analysis
Flash introduction to Qiime2 -- 16S Amplicon analysisFlash introduction to Qiime2 -- 16S Amplicon analysis
Flash introduction to Qiime2 -- 16S Amplicon analysis
 
The STRING database and related tools
The STRING database and related toolsThe STRING database and related tools
The STRING database and related tools
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Transcriptomics and metabolomics
Transcriptomics and metabolomicsTranscriptomics and metabolomics
Transcriptomics and metabolomics
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 
Mega
MegaMega
Mega
 
Genomic Data Analysis
Genomic Data AnalysisGenomic Data Analysis
Genomic Data Analysis
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure prediction
 
Introduction to pdb
Introduction to pdbIntroduction to pdb
Introduction to pdb
 
Protein-protein interaction networks
Protein-protein interaction networksProtein-protein interaction networks
Protein-protein interaction networks
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysis
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
CytoScape
CytoScapeCytoScape
CytoScape
 
Whole Genome Analysis
Whole Genome AnalysisWhole Genome Analysis
Whole Genome Analysis
 

En vedette

Correlation globes of the exposome 2016
Correlation globes of the exposome 2016Correlation globes of the exposome 2016
Correlation globes of the exposome 2016Chirag Patel
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseChirag Patel
 
The path to implementation of Whole Genome Sequencing (WGS) in PulseNet
The path to implementation of Whole Genome Sequencing (WGS) in PulseNetThe path to implementation of Whole Genome Sequencing (WGS) in PulseNet
The path to implementation of Whole Genome Sequencing (WGS) in PulseNetExternalEvents
 
Data analytics to support exposome research course slides
Data analytics to support exposome research course slidesData analytics to support exposome research course slides
Data analytics to support exposome research course slidesChirag Patel
 
Informatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryInformatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryChirag Patel
 
Biomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposuresBiomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposuresChirag Patel
 
NSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data WorkshopNSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data WorkshopChirag Patel
 
Studying the elusive in larger scale
Studying the elusive in larger scaleStudying the elusive in larger scale
Studying the elusive in larger scaleChirag Patel
 
Big data exposome and pediatric outcomes
Big data exposome and pediatric outcomesBig data exposome and pediatric outcomes
Big data exposome and pediatric outcomesChirag Patel
 
a brief introduction to epistasis detection
a brief introduction to epistasis detectiona brief introduction to epistasis detection
a brief introduction to epistasis detectionHyun-hwan Jeong
 
Building a search engine for exposures in disease
Building a search engine for exposures in disease Building a search engine for exposures in disease
Building a search engine for exposures in disease Chirag Patel
 

En vedette (12)

Correlation globes of the exposome 2016
Correlation globes of the exposome 2016Correlation globes of the exposome 2016
Correlation globes of the exposome 2016
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in disease
 
The path to implementation of Whole Genome Sequencing (WGS) in PulseNet
The path to implementation of Whole Genome Sequencing (WGS) in PulseNetThe path to implementation of Whole Genome Sequencing (WGS) in PulseNet
The path to implementation of Whole Genome Sequencing (WGS) in PulseNet
 
Data analytics to support exposome research course slides
Data analytics to support exposome research course slidesData analytics to support exposome research course slides
Data analytics to support exposome research course slides
 
Informatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryInformatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discovery
 
Biomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposuresBiomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposures
 
NSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data WorkshopNSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data Workshop
 
Studying the elusive in larger scale
Studying the elusive in larger scaleStudying the elusive in larger scale
Studying the elusive in larger scale
 
Big data exposome and pediatric outcomes
Big data exposome and pediatric outcomesBig data exposome and pediatric outcomes
Big data exposome and pediatric outcomes
 
a brief introduction to epistasis detection
a brief introduction to epistasis detectiona brief introduction to epistasis detection
a brief introduction to epistasis detection
 
Building a search engine for exposures in disease
Building a search engine for exposures in disease Building a search engine for exposures in disease
Building a search engine for exposures in disease
 
Lecture 7 gwas full
Lecture 7 gwas fullLecture 7 gwas full
Lecture 7 gwas full
 

Similaire à Intro to Biomedical Informatics 701

Repurposing large datasets to dissect exposomic (and genomic) contributions i...
Intro to Biomedical Informatics 701

  • 1. Bioinformatics for discovery: Introduction to GWAS and EWAS. BMI 701: Introduction to Biomedical Informatics. 12/1/2015. Chirag J Patel, chirag@hms.harvard.edu, @chiragjp, www.chiragjpgroup.org
  • 2. P = G + E. Phenotype (P): Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression. Genome (G): Variants. Environment (E): Infectious agents, Nutrients, Pollutants, Drugs. Complex traits are a function of genes and environment...
  • 3. We are great at G investigation! Over 2000 Genome-wide Association Studies (GWAS): https://www.ebi.ac.uk/gwas/
  • 4. >2,000 traits/diseases; >15,000 SNPs; >16,000 SNP-trait associations. https://www.ebi.ac.uk/gwas/
  • 5. Dissecting G in P: What is a Genome-wide Association Study? Hypothesis-free “search engine” for genetic variants associated with a complex trait or disease in unrelated populations. [Figure: a 2x2 table of allele (SNP(A) vs. SNP(a)) by disease status (diseased vs. non-diseased), repeated genome-wide for each SNP through SNP(Z) vs. SNP(z).]
  • 6. The road to GWAS...
  • 7. A new paradigm of GWAS for discovery of G in P: Human Genome Project to GWAS. Sequencing of the genome: 2001. Characterize common variation: HapMap project (http://hapmap.ncbi.nlm.nih.gov/), 2001-current day. Measurement tools: high-throughput variant assay, < $99 for ~1M variants, ~2003 (ongoing). Comprehensive, high-throughput analyses: GWAS (Nature, 2007). Excerpt: “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. The Wellcome Trust Case Control Consortium. There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at $P < 5 \times 10^{-7}$: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a ...” (Nature, Vol 447, 7 June 2007, doi:10.1038/nature05911)
  • 8. Number of raw publications with subject of “GWAS”. [Figure: number of 'GWAS' publications per year, 1994-2013 (y-axis 0 to 3000); PubMed MeSH terms: human + GWAS.]
  • 9. Number of raw publications with subject of “GWAS”. [Figure: number of 'GWAS' publications per year, 1994-2013 (y-axis 0 to 3000); PubMed MeSH terms: human + GWAS. Annotated milestones: Risch + Merikangas (linkage vs. association); human genome sequenced; GWAS of age-related macular degeneration; mega-meta-GWAS; WTCCC.] GWAS is relevant today (even with NGS around the corner)
  • 11. The Future of Genetic Studies of Complex Human Diseases. Neil Risch and Kathleen Merikangas. Science, 1996. Excerpt: “Geneticists have made substantial progress in identifying the genetic basis of many human diseases, at least those with conspicuous determinants. These successes include Huntington's disease, Alzheimer's disease, and some forms of breast cancer. However, the detection of genetic factors for complex diseases, such as schizophrenia, bipolar disorder, and diabetes, has been far more complicated. There have been numerous reports of genes or loci that might underlie these disorders, but few of these findings have been replicated. The modest nature of the gene effects for these disorders likely explains the contradictory and inconclusive claims about their identification. Despite the small effects of such genes, the magnitude of their attributable risk (the proportion of people affected due to them) may be large because they are quite frequent in the population, making them of public health significance. Has the genetic study of complex disorders reached its limits? The persistent lack of replicability of these reports of linkage between various loci and complex diseases might imply that it has. We argue below that [...] [link]age analysis we have chosen for this argument is a popular current paradigm in which pairs of siblings, both with the disease, are examined for sharing of alleles at multiple sites in the genome defined by genetic markers. The more often the affected siblings share the same allele at a particular site, the more likely the site is close to the disease gene. Using the formulas in (1), we calculate the expected proportion $Y$ of alleles shared by a pair of affected siblings for the best possible case, that is, a closely linked marker locus (recombination fraction $\theta = 0$) that is fully informative (heterozygosity = 1) (2), as $Y = \frac{1 + w}{2 + w}$, where $w = \frac{pq(\gamma - 1)^2}{(p\gamma + q)^2}$. If there is no linkage of a marker at a particular site to the disease, the siblings would be expected to share alleles 50% of the time; that is, $Y$ would equal 0.5. Values of $Y$ for various values of $p$ and $\gamma$ are given in the third column of the table. For an allele of moderate frequency ($p$ is 0.1 to 0.5) that con[...]” [The rest of the excerpt, which turns to the transmission/disequilibrium test and association tests, is cropped on the slide.] A new paradigm is needed for discovery!
  • 12. How does a GWAS work?
  • 13. Single nucleotide polymorphisms (SNPs): How many SNPs are in the human genome? >3,000,000,000 bases in the human genome; SNPs appear about every ~1000 bases, giving ~3,000,000 SNPs. 40-60% have minor allele frequency <5%; GWAS focus on frequency >5%. HapMap Consortium, 2010
  • 14. Can’t measure everything: Tag SNPs and Linkage Disequilibrium (LD). LD = co-occurrence of SNPs in a contiguous region. Bush and Moore, 2012
  • 15. The phenomenon of LD makes GWAS possible: How and why? Indirect association. [Figure 3 of Bush and Moore, 2012 (doi:10.1371/journal.pcbi.1002822.g003), “Indirect Association”, with LD blocks marked. Caption, cropped on the slide: “Genotyped SNPs often lie in a region of high linka[...] will be statistically associated with disease as a surrogate for the disease SNP throu[...]”.] Bush and Moore, 2012
  • 16. Can’t measure everything: Tag SNPs and Linkage Disequilibrium. Tag SNPs are common proxies for other SNPs; 500K - 1M per chip. LD block: 2 alleles are correlated because they are inherited together. Excerpt (cropped): “[...]tified significant associations for seven SNPs representing four new T2DM loci (Table 1). In all cases, the strongest association for the MAX statistic (see Methods) was obtained with the additive model.” [Figure, panels a-d: −log10[P] of association per SNP, with LD blocks, across the SLC30A8, IDE, EXT2 and ALX4 loci.] Sladek et al, 2007
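
To make “alleles are correlated” concrete, here is a minimal sketch that quantifies LD between two biallelic SNPs as r², a commonly used LD measure that the slides themselves do not name. The haplotype counts are invented for illustration, and `ld_r2` is a hypothetical helper, not something taken from the deck.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from counts of the four haplotypes.

    Assumes both SNPs are polymorphic (otherwise the denominator is zero).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n    # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / n    # frequency of allele B at the second SNP
    d = n_AB / n - p_A * p_B   # D: excess co-occurrence of A and B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))

# Invented haplotype counts (assumption, for illustration only)
perfect_ld = ld_r2(50, 0, 0, 50)   # A always travels with B
no_ld = ld_r2(25, 25, 25, 25)      # A and B co-occur only as often as chance predicts
print(perfect_ld, no_ld)
```

A tag SNP is useful when its r² with an unmeasured SNP is high, since it then carries most of the same association signal.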
  • 17. Digitizing SNPs: e.g., Illumina Infinium Array. [Images: www.lifa-core.de/, illumina.com]
  • 18. Assessing Thousands of Factors Simultaneously: Data-driven search for differences in SNP frequencies. ~100,000 - ~1,000,000 association tests. [Figure: aligned sequences from disease cases and healthy controls, e.g. GCAGGTACATG...GGTA... vs. GCAGGTACACG...GGTA..., which differ at a single base (T vs. C).]
  • 19. Associating One SNP with Disease: Case-Control Study Design. SNP (A/a) → Disease? [Table: allele (A, a) by disease status: diseased (cases) vs. non-diseased (controls).]
  • 20. Associating One SNP with Disease: What is an “Odds Ratio”? SNP (A/a) → Disease? 2x2 table of allele counts: diseased (cases): A = c, a = d; non-diseased (controls): A = x, a = y. Chi-squared test. Odds Ratio, a vs. A: odds of disease with allele a vs. odds of disease with allele A. OR = 1: equal odds (no difference); OR > 1: increased odds (increased risk); OR < 1: decreased odds (decreased risk)
  • 21. Associating One SNP with Disease: Calculating the Odds Ratio. SNP (A/a) → Disease? 2x2 table of allele counts: diseased (cases): A = c, a = d; non-diseased (controls): A = x, a = y. Chi-squared test. Odds with allele a: $[d/(d+y)]/[y/(d+y)] = d/y$. Odds with allele A: $[c/(c+x)]/[x/(c+x)] = c/x$. Odds Ratio, a vs. A: $\mathrm{OR} = \frac{d/y}{c/x} = \frac{d/c}{y/x} = \frac{dx}{cy}$. How would you interpret an OR of 2?
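
The odds-ratio calculation on this slide can be sketched in a few lines. This is an illustrative example rather than code from the lecture: the cell names c, d, x, y follow the slide, while the counts, the function names, and the choice of a Pearson chi-squared test without continuity correction are assumptions.

```python
import math

def odds_ratio(c, d, x, y):
    """Odds ratio for allele a vs. allele A.

    Cells follow the slide: diseased A = c, diseased a = d,
    non-diseased A = x, non-diseased a = y.
    """
    return (d / y) / (c / x)   # algebraically equal to d*x / (c*y)

def chi_squared_2x2(c, d, x, y):
    """Pearson chi-squared statistic (1 df, no continuity correction) and its p-value."""
    n = c + d + x + y
    stat = n * (c * y - d * x) ** 2 / ((c + d) * (x + y) * (c + x) * (d + y))
    # For 1 degree of freedom the chi-squared tail probability is erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Invented allele counts (assumption, for illustration only)
c, d, x, y = 300, 200, 350, 150
or_a = odds_ratio(c, d, x, y)
stat, p = chi_squared_2x2(c, d, x, y)
print(f"OR (a vs. A) = {or_a:.3f}, chi-squared = {stat:.2f}, p = {p:.2g}")
```

An OR above 1 here would mean that the odds of disease are higher among chromosomes carrying allele a than among those carrying allele A.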
  • 22. Associating One SNP with Disease: Cohort Study Design. SNP (A/a) → Disease? vs. SNP (A/a) → Non-diseased. Direct measure of risk (Relative Risk) vs. odds ratio; Cox survival regression. Need to wait! If incidence is low, N needs to be large!
  • 23. Models to associate genotypes with disease: Examples for a case-control study. Genotypes: Aa, AA, AA, aa, Aa, Aa, aa, Aa. Diseased (ND = 4) vs. Non-diseased (NC = 4)
  • 24. Models to associate genotypes with disease: Examples for a case-control study. Genotypes: Aa, AA, AA, aa, Aa, Aa, aa, Aa. Diseased (ND = 4) vs. Non-diseased (NC = 4). Allele counts: diseased: A = 6, a = 2; non-diseased: A = 2, a = 6. OR A (vs. a)? OR a (vs. A)?
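
The allele counts on this slide can be recovered by counting alleles within each group's genotypes. The sketch below assumes the diseased group is AA, AA, Aa, Aa and the non-diseased group is Aa, Aa, aa, aa; that assignment is inferred from the 6/2/2/6 allele table and is not stated explicitly in the transcript. The helper name `allele_counts` is hypothetical.

```python
from collections import Counter

def allele_counts(genotypes):
    """Count individual alleles across genotype strings such as 'Aa'."""
    return Counter("".join(genotypes))

# Group assignment is an assumption inferred from the slide's allele table
diseased = ["AA", "AA", "Aa", "Aa"]
non_diseased = ["Aa", "Aa", "aa", "aa"]

cases = allele_counts(diseased)         # alleles carried by the diseased group
controls = allele_counts(non_diseased)  # alleles carried by the non-diseased group

# Odds of disease for each allele, then the two odds ratios asked for on the slide
odds_A = cases["A"] / controls["A"]
odds_a = cases["a"] / controls["a"]
or_A_vs_a = odds_A / odds_a
or_a_vs_A = odds_a / odds_A
print(dict(cases), dict(controls), or_A_vs_a, or_a_vs_A)
```

Under these assumed groups the two odds ratios are reciprocals of each other, (6/2)/(2/6) = 9 for A vs. a and 1/9 for a vs. A.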
  • 25. Models to associate genotypes with disease: Genotypic Test (“2 or 1 df test”). Genotypes: Aa, AA, AA, aa, Aa, Aa, aa, Aa. Diseased (ND = 4) vs. Non-diseased (NC = 4). Genotype counts: diseased: AA = 2, Aa = 2, aa = 0; non-diseased: AA = 0, Aa = 2, aa = 2. OR: AA (vs. Aa), aa (vs. Aa)
  • 26. Associating One SNP with Quantitative Trait (e.g., height, weight, cholesterol). [Figure: two plots of trait (height) by genotype: SNP rs1234 (GG, GC, CC) and SNP rs123456 (CC, CT, TT).]
  • 27. Associating One SNP with a Quantitative Trait: Linear Regression and Additive Risk Model. $y = \alpha + \beta x + \varepsilon$. For SNP rs123456: $\text{height} = \alpha + \beta x$, where $x = 0$ if the individual is CC, $x = 1$ if CT, and $x = 2$ if TT (T = risk allele). $\alpha$: intercept; $\beta$: change in height for 1 risk allele. [Plot: trait (height) by genotype, CC (0), CT (1), TT (2).]
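The additive model above can be fitted with ordinary least squares. The following is a minimal sketch in plain Python; the genotypes and heights are made-up illustration data, and `fit_additive_model` is a hypothetical helper name, not part of any GWAS package.

```python
def fit_additive_model(genotypes, trait, risk_allele="T"):
    """Least-squares fit of trait = alpha + beta * x.

    x is the number of risk alleles in each genotype (0, 1, or 2).
    Returns (alpha, beta).
    """
    x = [g.count(risk_allele) for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(trait) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, trait))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                 # change in trait per risk allele
    alpha = mean_y - beta * mean_x   # expected trait for genotype CC
    return alpha, beta


# Hypothetical data: six individuals, two per genotype.
genotypes = ["CC", "CC", "CT", "CT", "TT", "TT"]
height = [50.0, 54.0, 60.0, 64.0, 70.0, 74.0]

alpha, beta = fit_additive_model(genotypes, height)
print(alpha, beta)  # 52.0 10.0
```

In a real analysis the regression would normally include covariates such as age, sex, and ancestry components, and would report a standard error and p-value for beta.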
  • 28. Prototypical “Manhattan plot” to visualize associations (~100,000 to ~1,000,000 association tests). [Plot: −log10(P) by chromosome position, chromosomes 1 to 22 and X. Sources shown on slide: Science, 2007; Nature, Vol 447, 7 June 2007.] Each test is based on a table of genotypes (AA, Aa, aa) by diseased vs. non-diseased.
  • 29. [Figure: Manhattan plots, −log10(P) by chromosome (1 to 22, X), one panel per disease: bipolar disorder, coronary artery disease, Crohn’s disease, hypertension, rheumatoid arthritis, type 1 diabetes, type 2 diabetes.] Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases, −log10 of the trend test P value for quality-control-positive SNPs, excluding … Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10⁻⁵ highlighted in green. All panels are truncated at …
  • 30. Type I Error: False Positives! What is a p-value? The probability of obtaining a result at least as extreme as the one observed if there is no difference (H0). Many tests: some can be significant (low p-value) by chance! 100 tests at a p-value threshold of 0.05... how many would be significant by chance? Bonferroni “correction”: divide the 0.05 significance level by the number of tests, e.g., 1M SNPs: $0.05 / (1 \times 10^{6}) = 5 \times 10^{-8}$.
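The two calculations on this slide are simple enough to check directly. A minimal sketch (function names are hypothetical) of the expected number of chance findings and the Bonferroni-adjusted threshold:

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests with p < alpha when every null is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error near alpha."""
    return alpha / n_tests


print(expected_false_positives(100))          # about 5 of 100 tests by chance
print(bonferroni_threshold(1_000_000))        # about 5e-08 for 1M SNPs
```

So with 100 independent null tests at 0.05, roughly 5 are expected to look significant by chance, and for about one million SNPs the corrected threshold is the familiar genome-wide level of 5 × 10⁻⁸.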
  • 31. QQplot: Distribution of observed p-values vs. H0 p-values. [Histograms: runif(10000), a random uniform distribution, representing p-values under H0; gwas$P.value, p-values of a GWAS of total cholesterol (Global Lipids Consortium, 2012).]
  • 32. QQplot: Distribution of observed p-values vs. H0 p-values. [Histogram: gwas$P.value, p-values of a GWAS of total cholesterol.]
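A QQ plot pairs each observed p-value with the p-value expected at the same rank under H0 (a uniform distribution). The sketch below computes the plotted points on the −log10 scale; the expected quantile rank/(n + 1) is one common plotting convention and is an assumption here, as are the function name and the toy p-values.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values for a QQ plot.

    Under H0 the p-values are uniform on (0, 1), so the i-th smallest of n
    p-values is expected to be near i / (n + 1).
    """
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = rank / (n + 1)
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Toy example; a real GWAS would pass hundreds of thousands of p-values.
for expected, observed in qq_points([0.001, 0.5, 0.25]):
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points lying on the diagonal indicate p-values consistent with H0; points rising above it at the upper end suggest true associations, while a departure across the whole range suggests systematic inflation such as population stratification.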
  • 33. Which diseases show evidence of association? Examining the QQplot of test statistics in WTCCC. [Figure: seven panels (BD, CAD, CD, HT, RA, T2D, T1D); x axis: expected chi-squared value; y axis: observed test statistic.] Figure 3 | Quantile-quantile plots for seven genome-wide scans. For each of the seven disease collections, a quantile-quantile plot of the results of the trend test is shown in black for all SNPs that pass the standard project filters, have a minor allele frequency > 1% and missing data rate < 1%. SNPs that … 360,000 SNPs. SNPs at which the test statistic exceeds 30 are represented by triangles. Additional quantile-quantile plots, which also exclude all SNPs located in the regions of association listed in Table 3, are superimposed in blue (for BD, the exclusion of these SNPs has no visible effect on the plot, and …
  • 34. Observational associations do not equal causation...
  • 35. Confounding bias: What is a confounder? Example: Ice Cream $ and Drowning: is one causing the other? Summer! A confounder is correlated with both the “risk” factor and the disease, leading to invalid inference. A common source of bias in observational studies (e.g., case-control, cohort, etc.).
  • 36. Population Stratification: A source of possible confounding in GWAS. Is the SNP associated with disease, or is race/ethnicity driving both? Ancestry is correlated with allele frequency and with disease. GWAS are done on specific populations separately (most have been done in populations of European ancestry).
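A small numeric sketch shows how stratification can create an association out of nothing. The counts below are entirely hypothetical: two populations that differ in both allele frequency and in how many cases were sampled, with no SNP-disease association inside either one. Pooling them produces a spurious odds ratio.

```python
def odds_ratio_a_vs_A(c, d, x, y):
    """Cross-product odds ratio for allele a vs. A.

    c, d: A and a alleles in cases; x, y: A and a alleles in controls.
    """
    return (d * x) / (c * y)


# Hypothetical allele counts as (c, d, x, y).
population_1 = (40, 160, 10, 40)   # allele a common, many cases sampled
population_2 = (40, 10, 160, 40)   # allele a rare, few cases sampled

# Pooled table: add the two populations cell by cell.
pooled = tuple(p1 + p2 for p1, p2 in zip(population_1, population_2))

print(odds_ratio_a_vs_A(*population_1))  # 1.0 (no association)
print(odds_ratio_a_vs_A(*population_2))  # 1.0 (no association)
print(odds_ratio_a_vs_A(*pooled))        # 4.515625 (spurious association)
```

This is why analyses are run within a population, or adjusted for ancestry (for example with principal components as covariates), rather than on a naively pooled sample.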
  • 37. Mediation: SNPs indicative of a mediator factor? Example: FTO and Type 2 Diabetes. Is the association between FTO and Type 2 Diabetes via Body Mass (BMI)? ... or does FTO have an independent role in Type 2 Diabetes...?
  • 38. PLINK: (Standard) Whole Genome Analysis Software
  • 39. PLINK: (Standard) Whole Genome Analysis Software http://pngu.mgh.harvard.edu/~purcell/plink/ •cited >9000 times since 2007 •allele frequency •linkage disequilibrium (LD) •data manipulation/filtering •association: allelic, genotypic models •chi-square •logistic •linear
  • 40. Examples: GWASs in Type 2 Diabetes
  • 41. Type 2 Diabetes Mellitus: A complex, multifactorial disease • Insulin production vs. use • beta-cell function • insulin sensitivity (BMI) • Insulin moves glucose from blood into cells • Complications arise due to glucose in blood (hyperglycemia) • Diagnosed by blood glucose levels. Risk factors (CDC): family history: 25%; body weight, diet, lifestyle, age.
  • 42. ARTICLES: A genome-wide association study identifies novel risk loci for type 2 diabetes. Robert Sladek, Ghislain Rocheleau, Johan Rung, Christian Dina, Lishuang Shen, David Serre, Philippe Boutin, Daniel Vincent, Alexandre Belisle, Samy Hadjadj, Beverley Balkau, Barbara Heude, Guillaume Charpentier, Thomas J. Hudson, Alexandre Montpetit, Alexey V. Pshezhetsky, Marc Prentki, Barry I. Posner, David J. Balding, David Meyre, Constantin Polychronakos & Philippe Froguel. Nature, 2/2007.
Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of which were hitherto unknown. A systematic search for these variants was recently made possible by the development of high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935 single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2 gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits.
The rapidly increasing prevalence of type 2 diabetes mellitus (T2DM) is thought to be due to environmental factors, such as increased availability of food and decreased opportunity and motivation for physical activity, acting on genetically susceptible individuals. The heritability of T2DM is one of the best established among common diseases and, consequently, genetic risk factors for T2DM have been the subject of intense research (ref. 1). Although the genetic causes of many monogenic forms of diabetes (maturity onset diabetes in the young, neonatal mitochondrial and other syndromic types of diabetes mellitus) have been elucidated, few variants leading to common T2DM have been clearly identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25) of developing T2DM (ref. 1). Linkage studies have reported many T2DM-linked chromosomal regions and have identified putative, causative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs 4, 5) and ACDC (also called ADIPOQ) (ref. 6). In parallel, candidate-gene studies have reported many T2DM-associated loci, with coding variants in the nuclear receptor PPARG (P12A) (ref. 7) and the potassium channel KCNJ11 (E23K) (ref. 8) being among the very few that have been convincingly replicated. The strongest known (odds ratio ≈ 1.7) T2DM association (ref. 9) was recently mapped to the transcription factor TCF7L2 and has been consistently replicated in multiple populations (refs 10–20).
Subjects and study design. The recent availability of high-density genotyping arrays, which combine the power of association studies with the systematic nature of a genome-wide search, led us to undertake a two-stage, genome-wide association study to identify additional T2DM susceptibility loci (Supplementary Fig. 1). In the first stage of this study, we obtained genotypes for 392,935 single-nucleotide polymorphisms (SNPs) in 1,363 T2DM cases and controls (Supplementary Table 1). In order to enrich for risk alleles (ref. 21), the diabetic subjects studied in stage 1 were selected to have at least one affected first degree relative and age at onset under 45 yr (excluding patients with maturity onset diabetes in the young). Furthermore, in order to decrease phenotypic heterogeneity and to enrich for variants determining insulin resistance and β-cell dysfunction through mechanisms other than severe obesity, we initially studied diabetic patients with a body mass index (BMI) < 30 kg m⁻². Control subjects were selected to have fasting blood glucose < 5.7 mmol l⁻¹ in DESIR, a large prospective cohort for the study of insulin resistance in French subjects (ref. 22).
Genotypes for each study subject were obtained using two platforms: Illumina Infinium Human1 BeadArrays, which assay 109,365 SNPs chosen using a gene-centred design; and Human Hap300 BeadArrays, which assay 317,503 SNPs chosen to tag haplotype blocks identified by the Phase I HapMap (ref. 23). Of the 409,927 markers that passed quality control (Supplementary Tables 2 and 3), genotypes were obtained for an average of 99.2% (Human1) and 99.4% (Hap300) of markers for each subject with a reproducibility of > 99.9% (both platforms). Forty-three subjects were removed from analysis because of evidence of intercontinental admixture (Supplementary Fig. 3) and an additional four because their genotype-determined gender disagreed with clinical records. In total, T2DM association was tested for 100,764 (Human1) and 309,163 (Hap300) SNPs representing 392,935 unique loci (Fig. 1). Because of unequal male/female ratios in our cases and controls, we analysed the 12,666 sex-chromosome SNPs separately for each gender.
Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes for BioMedical Research. Published online 26 April 2007; 10.1126/science.1142358.
New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D (in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1) and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.
Type 2 diabetes, obesity, and cardiovascular risk factors are caused by a combination of genetic susceptibility, environment, behavior, and chance. Whole-genome association studies (WGAS) offer a new approach to gene discovery unbiased with regard to presumed functions or locations of causal variants. This approach is based on Fisher’s theory for additive effects at common alleles (1); human heterozy… to purifying selection, and has been made possible by genomic advances such as the human genome sequence, SNP and HapMap databases, and genotyping arrays (3). We studied 1464 patients with T2D and 1467 controls from Finland and Sweden, each characterized for 18 clinical traits: anthropometric measures, glucose tolerance and insulin secretion, lipids and apolipoproteins, and blood … applying stringent quality-control filters, high-quality genotypes for 386,731 common SNPs were obtained (4). To extend the set of putative causal alleles tested for association, we developed 284,968 additional multimarker (haplotype) tests based on these SNP genotypes (5, 6). The 671,699 allelic tests capture (correlation coefficient r² ≥ 0.8) 78% of common SNPs in HapMap CEU (3).
Each SNP and haplotype test was assessed for association to T2D and each of 18 traits with the software package PLINK (http://pngu.mgh.harvard.edu/purcell/plink/). For T2D, a weighted meta-analysis was used to combine results for the population-based and family-based subsamples (4). For quantitative traits, multivariable linear or logistic regression with or without covariates was performed (4). Association results for each SNP, haplotype test, and phenotype are available (www.broad.mit.edu/diabetes/). In genome-wide analysis involving hundreds of thousands of statistical tests, modest levels of bias imposed on the null distribution can overwhelm a small number of true results. We used three strategies to search for evidence of systematic bias from unrecognized population structure, the analytical approach, and genotyping artifacts (7, 8). First, we examined the distribution of P-values in the population-based sample, observing a close match to that expected for a null distribution (genomic inflation factor λGC = 1.05 for T2D). Second, we calculated …
A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants. Laura J. Scott, Karen L. Mohlke, Lori L. Bonnycastle, Cristen J. Willer, Yun Li, William L. Duren, Michael R. Erdos, Heather M. Stringham, Peter S. Chines, Anne U. Jackson, Ludmila Prokunina-Olsson, Chia-Jen Ding, Amy J. Swift, Narisu Narisu, Tianle Hu, Randall Pruim, Rui Xiao, Xiao-Yi Li, Karen N. Conneely, Nancy L. Riebow, Andrew G. Sprau, Maurine Tong, Peggy P. White, Kurt N. Hetrick, Michael W. Barnhart, Craig W. Bark, Janet L. Goldstein, Lee Watkins, Fang Xiang, Jouko Saramies, Thomas A. Buchanan, Richard M. Watanabe, Timo T. Valle, Leena Kinnunen, Gonçalo R. Abecasis, Elizabeth W. Pugh, Kimberly F. Doheny, Richard N. Bergman, Jaakko Tuomilehto, Francis S. Collins, Michael Boehnke.
Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls. We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the …
Replication of Genome-Wide Association Signals in UK Samples Reveals Risk Loci for Type 2 Diabetes. Eleftheria Zeggini, Michael N. Weedon, Cecilia M. Lindgren, Timothy M. Frayling, Katherine S. Elliott, Hana Lango, Nicholas J. Timpson, John R. B. Perry, Nigel W. Rayner, Rachel M. Freathy, Jeffrey C. Barrett, Beverley Shields, Andrew P. Morris, Sian Ellard, Christopher J. Groves, Lorna W. Harries, Jonathan L. Marchini, Katharine R. Owen, Beatrice Knight, Lon R. Cardon, Mark Walker, Graham A. Hitman, Andrew D. Morris, Alex S. F. Doney, The Wellcome Trust Case Control Consortium (WTCCC), Mark I. McCarthy, Andrew T. Hattersley.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of …
Here, we describe how integration of data from the WTCCC scan and our own replication studies with similar information generated by the Diabetes Genetics Initiative (DGI) (6) and the Finland–United States Investigation of NIDDM Genetics (FUSION) (7) has identified several additional susceptibility variants for T2D. In the WTCCC study, analysis of 490,032 autosomal SNPs in 16,179 samples yielded 459,448 SNPs that passed initial quality control (5). We considered only the 393,453 autosomal SNPs with minor allele frequency (MAF) exceeding 1% in both cases and controls and no extreme departure from Hardy-Weinberg equilibrium (P < 10⁻⁴ in cases or controls) (8). This T2D-specific data set shows no evidence of substantial confounding from population substructure and genotyping biases (8). To distinguish true associations from those reflecting fluctuations under the null or residual errors arising from aberrant allele calling, we first submitted putative signals from the WTCCC study to additional quality control, including cluster-plot visualization and validation genotyping on …
Science, 6/2007
  • 43. ARTICLES A genome-wide association study identifies novel risk loci for type 2 diabetes Robert Sladek1,2,4 , Ghislain Rocheleau1 *, Johan Rung4 *, Christian Dina5 *, Lishuang Shen1 , David Serre1 , Philippe Boutin5 , Daniel Vincent4 , Alexandre Belisle4 , Samy Hadjadj6 , Beverley Balkau7 , Barbara Heude7 , Guillaume Charpentier8 , Thomas J. Hudson4,9 , Alexandre Montpetit4 , Alexey V. Pshezhetsky10 , Marc Prentki10,11 , Barry I. Posner2,12 , David J. Balding13 , David Meyre5 , Constantin Polychronakos1,3 & Philippe Froguel5,14 Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of which were hitherto unknown. A systematic search for these variants was recently made possible by the development of high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935 single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2 gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in insulin-producing b-cells, and two linkage disequilibrium blocks that contain genes potentially involved in b-cell development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits. 
The rapidly increasing prevalence of type 2 diabetes mellitus (T2DM) is thought to be due to environmental factors, such as increased availability of food and decreased opportunity and motivation for physical activity, acting on genetically susceptible individuals. The heritability of T2DM is one of the best established among common diseases and, consequently, genetic risk factors for T2DM have been the subject of intense research (ref. 1). Although the genetic causes of many monogenic forms of diabetes (maturity onset diabetes in the young, neonatal mitochondrial and other syndromic types of diabetes mellitus) have been elucidated, few variants leading to common T2DM have been clearly identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25) of developing T2DM (ref. 1). Linkage studies have reported many T2DM-linked chromosomal regions and have identified putative, causative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs
genotypes for 392,935 single-nucleotide polymorphisms (SNPs) in 1,363 T2DM cases and controls (Supplementary Table 1). In order to enrich for risk alleles (ref. 21), the diabetic subjects studied in stage 1 were selected to have at least one affected first degree relative and age at onset under 45 yr (excluding patients with maturity onset diabetes in the young). Furthermore, in order to decrease phenotypic heterogeneity and to enrich for variants determining insulin resistance and β-cell dysfunction through mechanisms other than severe obesity, we initially studied diabetic patients with a body mass index (BMI) < 30 kg m^-2. Control subjects were selected to have fasting blood glucose < 5.7 mmol l^-1 in DESIR, a large prospective cohort for the study of insulin resistance in French subjects (ref. 22). Genotypes for each study subject were obtained using two plat-
Sladek, 2007
How many SNPs (p-value?) European-based; N ~ 1000; cases: high fasting blood glucose/non-obese; controls: non-obese
  • 44. Human Hap300 chip, showing no T2DM association in stage 1 (P > 0.01) and separated by at least 100 kb. Using the first principal component as a covariate for ancestry differences between cases and controls, we tested for association between rs932206 and disease status. Our result suggests that this apparent association is largely
BMI on the association between marker and disease, as it is asymptotically equivalent to the Armitage trend test used to detect association in stages 1 and 2. None of the associations (Supplementary Table 7) was substantially changed by considering the effects of these covariates.
Figure 1 | Graphical summary of stage 1 association results. T2DM association was determined for SNPs on the Human1 and Hap300 chips. The x axis represents the chromosome position from pter; the y axis shows −log10[pMAX], the P-value obtained by the MAX statistic, for each SNP (note the different scale on the y axis of the chromosome 10 plot). SNPs that passed the cutoff for a fast-tracked second stage are highlighted in red. [Panels: chromosomes 1–22 and X.]
©2007 Nature Publishing Group
Sladek, 2007
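The Armitage trend test mentioned on this slide can be sketched in a few lines. This is a minimal illustration, assuming additive allele scores (0, 1, 2) and made-up genotype counts; the function name `armitage_trend_test` and the example counts are hypothetical and are not taken from the study, which reports the MAX statistic rather than this test alone.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: counts of subjects carrying 0, 1 and 2 copies of an allele.
    Returns (chi-square statistic on 1 degree of freedom, P-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases

    sum_rt = sum(t * r for t, r in zip(scores, cases))        # score-weighted cases
    sum_nt = sum(t * m for t, m in zip(scores, totals))       # score-weighted totals
    sum_nt2 = sum(t * t * m for t, m in zip(scores, totals))  # squared-score totals

    numerator = n * (n * sum_rt - n_cases * sum_nt) ** 2
    denominator = n_cases * n_controls * (n * sum_nt2 - sum_nt ** 2)
    chi2 = numerator / denominator

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (not from the paper): the allele is more common in cases.
chi2, p_value = armitage_trend_test(cases=[100, 200, 200], controls=[200, 200, 100])
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}")
```

With identical genotype proportions in cases and controls the statistic is 0 and the P-value is 1, which is a quick sanity check on any implementation.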
  • 45. Identification of four novel T2DM loci
Our fast-track stage 2 genotyping confirmed the reported association for rs7903146 (TCF7L2) on chromosome 10, and in addition identified significant associations for seven SNPs representing four new T2DM loci (Table 1). In all cases, the strongest association for the MAX statistic (see Methods) was obtained with the additive model. The most significant of these corresponds to rs13266634, a non-synonymous SNP (R325W) in SLC30A8, located in a 33-kb linkage disequilibrium block on chromosome 8, containing only the 3′ end of this gene (Fig. 2a). SLC30A8 encodes a zinc transporter expressed solely in the secretory vesicles of β-cells and is thus implicated in the final stages of insulin biosynthesis, which involve co-crystallization
Table 1 | Confirmed association results
SNP | Chromosome | Position (nucleotides) | Risk allele | Major allele | MAF (case) | MAF (ctrl) | Odds ratio (het) | Odds ratio (hom) | PAR | λs | Stage 2 pMAX | Stage 2 pMAX (perm) | Stage 1 pMAX | Stage 1 pMAX (perm) | Nearest gene
rs7903146 | 10 | 114,748,339 | T | C | 0.406 | 0.293 | 1.65 ± 0.19 | 2.77 ± 0.50 | 0.28 | 1.0546 | 1.5 × 10^-34 | <1.0 × 10^-7 | 3.2 × 10^-17 | <3.3 × 10^-10 | TCF7L2
rs13266634 | 8 | 118,253,964 | C | C | 0.254 | 0.301 | 1.18 ± 0.25 | 1.53 ± 0.31 | 0.24 | 1.0089 | 6.1 × 10^-8 | 5.0 × 10^-7 | 2.1 × 10^-5 | 1.8 × 10^-5 | SLC30A8
rs1111875 | 10 | 94,452,862 | G | G | 0.358 | 0.402 | 1.19 ± 0.19 | 1.44 ± 0.24 | 0.19 | 1.0069 | 3.0 × 10^-6 | 7.4 × 10^-6 | 9.1 × 10^-6 | 7.3 × 10^-6 | HHEX
rs7923837 | 10 | 94,471,897 | G | G | 0.335 | 0.377 | 1.22 ± 0.21 | 1.45 ± 0.25 | 0.20 | 1.0065 | 7.5 × 10^-6 | 2.2 × 10^-5 | 3.4 × 10^-6 | 2.5 × 10^-6 | HHEX
rs7480010 | 11 | 42,203,294 | G | A | 0.336 | 0.301 | 1.14 ± 0.13 | 1.40 ± 0.25 | 0.08 | 1.0041 | 1.1 × 10^-4 | 2.9 × 10^-4 | 1.5 × 10^-5 | 1.2 × 10^-5 | LOC387761
rs3740878 | 11 | 44,214,378 | A | A | 0.240 | 0.272 | 1.26 ± 0.29 | 1.46 ± 0.33 | 0.24 | 1.0046 | 1.2 × 10^-4 | 2.8 × 10^-4 | 1.8 × 10^-5 | 1.3 × 10^-5 | EXT2
rs11037909 | 11 | 44,212,190 | T | T | 0.240 | 0.271 | 1.27 ± 0.30 | 1.47 ± 0.33 | 0.25 | 1.0045 | 1.8 × 10^-4 | 4.5 × 10^-4 | 1.8 × 10^-5 | 1.3 × 10^-5 | EXT2
rs1113132 | 11 | 44,209,979 | C | C | 0.237 | 0.267 | 1.15 ± 0.27 | 1.36 ± 0.31 | 0.19 | 1.0044 | 3.3 × 10^-4 | 8.1 × 10^-4 | 3.7 × 10^-5 | 2.9 × 10^-5 | EXT2
Significant T2DM associations were confirmed for eight SNPs in five loci. Allele frequencies, odds ratios (with 95% confidence intervals) and PAR were calculated using only the stage 2 data. Allele frequencies in the controls were very close to those reported for the CEU set (European subjects genotyped in the HapMap project). Induced sibling recurrent risk ratios (λs) were estimated using stage 2 genotype counts for the control subjects and assuming a T2DM prevalence of 7% in the French population. hom, homozygous; het, heterozygous; major allele, the allele with the higher frequency in controls; pMAX, P-value of the MAX statistic from the χ² distribution; pMAX (perm), P-value of the MAX statistic from the permutation-derived empirical distribution (pMAX and pMAX (perm) are adjusted for variance inflation); risk allele, the allele with higher frequency in cases compared with controls.
[Figure, panels a and b: −log10[P] across the SLC30A8 and IDE–KIF11–HHEX regions. Inset from Figure 1: stage 1 −log10[pMAX], the P-value obtained by the MAX statistic, for each SNP, by chromosome.]
NATURE | Vol 445 | 22 February 2007 | ARTICLES
Sladek, 2007
How would you interpret the p-values? Odds ratios? Confirmed 8 SNPs with N ~ 1000
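One way to approach the slide's question about odds ratios is to compute a per-allele odds ratio directly from the case and control allele frequencies in Table 1. The sketch below is illustrative only: `allelic_odds_ratio` is a hypothetical helper, and the per-allele odds ratio it returns is a different quantity from the genotype-specific (het/hom) odds ratios reported in the table.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds ratio comparing the odds of carrying an allele in cases vs controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# rs7903146 (TCF7L2): the risk allele T is the minor allele, so the MAF columns
# of Table 1 give the risk-allele frequency in cases (0.406) and controls (0.293).
or_tcf7l2 = allelic_odds_ratio(0.406, 0.293)
print(f"per-allele OR for rs7903146: {or_tcf7l2:.2f}")
```

An odds ratio above 1 means the allele is over-represented in cases; the P-value says how surprising that imbalance would be if there were no association, and it shrinks as sample size grows even when the odds ratio stays the same.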
  • 46. Scaling up discovery by combining populations: meta-analyses
  • 47. g the Diabetes Genetics nvestigation of NIDDM nd (iv) the Framingham omponent studies (n = ry Table 1 online. aring, the four consortia n 10 and 20 SNPs promi- their individual, interim, mentary Table 2 online). oci with consistent effects dies. Two of these repre- 6PC2 and GCK. In addi- nerated evidence for an NPs around the MTNR1B rs1387153, P = 2.2 × 10^-11; DFS: rs10830963, 5.8 × 10^-4, for the most ch analysis). The associa- d on formal meta-analysis r exclusion of individuals = 1.1 × 10^-57; rs4607517 NR1B), P = 3.2 × 10^-50; pplementary Table 3 and ent efforts to harmonize
(including the additional data from the WTCCC, DGI and FUSION scans) (ref. 10) (Supplementary Note). We found strong evidence that the minor G allele of rs10830963 was associated with increased risk of T2D (odds ratio = 1.09 (1.05–1.12), P = 3.3 × 10^-7; Fig. 2 and Supplementary Table 6 online). The possibility that the fasting glucose association might
Study ID | OR (95% CI) | Weight (%)
DGI | 1.12 (0.96, 1.30) | 4.61
FUSION | 1.20 (1.03, 1.39) | 4.89
WTCCC | 1.07 (0.95, 1.20) | 8.03
deCODE | 1.14 (1.03, 1.27) | 9.58
KORA | 1.00 (0.84, 1.19) | 3.53
Rotterdam | 1.17 (1.04, 1.30) | 8.75
CCC | 1.07 (0.88, 1.31) | 2.69
ADDITION/ELY | 1.16 (1.02, 1.33) | 6.04
Norfolk | 1.00 (0.90, 1.10) | 10.56
UKT2DGC | 1.03 (0.96, 1.10) | 23.18
OxGN/58BC | 0.91 (0.75, 1.10) | 2.85
FUSION Stage 2 | 1.15 (1.02, 1.30) | 7.41
METSIM | 1.16 (1.03, 1.30) | 7.90
Overall (I² = 26.6%, P = 0.176) | 1.09 (1.05, 1.12) | 100.00
Meta-analysis P value = 3.3 × 10^-7
[Forest plot; odds-ratio axis from 0.722 to 1.39, reference line at 1.]
Figure 2 Association of rs10830963 with type 2 diabetes (T2D) in 13 case-control studies.
VOLUME 41 | NUMBER 1 | JANUARY 2009 | NATURE GENETICS
Meta-analysis of SNP rs10830963: Combining findings from multiple cohorts
Prokopenko, 2009
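The pooled estimate in the forest plot can be approximately reproduced with a fixed-effect, inverse-variance meta-analysis. The sketch below is an illustration under stated assumptions: standard errors are back-calculated from the rounded 95% confidence limits shown on the slide, the pairing of study names with estimates follows the order in which they appear in the transcript, and `fixed_effect_meta` is a hypothetical helper, not the authors' code. Because the inputs are rounded, the P-value will not match the published one exactly.

```python
import math

# (study, odds ratio, lower 95% limit, upper 95% limit) as listed on the slide.
# The study-to-estimate pairing is assumed from the order of the transcript.
STUDIES = [
    ("DGI", 1.12, 0.96, 1.30),
    ("FUSION", 1.20, 1.03, 1.39),
    ("WTCCC", 1.07, 0.95, 1.20),
    ("deCODE", 1.14, 1.03, 1.27),
    ("KORA", 1.00, 0.84, 1.19),
    ("Rotterdam", 1.17, 1.04, 1.30),
    ("CCC", 1.07, 0.88, 1.31),
    ("ADDITION/ELY", 1.16, 1.02, 1.33),
    ("Norfolk", 1.00, 0.90, 1.10),
    ("UKT2DGC", 1.03, 0.96, 1.10),
    ("OxGN/58BC", 0.91, 0.75, 1.10),
    ("FUSION Stage 2", 1.15, 1.02, 1.30),
    ("METSIM", 1.16, 1.03, 1.30),
]


def fixed_effect_meta(studies, z=1.96):
    """Inverse-variance pooling of log odds ratios.

    Returns (pooled OR, lower 95% limit, upper 95% limit).
    """
    sum_w = 0.0
    sum_w_beta = 0.0
    for _name, odds_ratio, lower, upper in studies:
        beta = math.log(odds_ratio)
        # Back-calculate the standard error from the width of the log-scale CI.
        se = (math.log(upper) - math.log(lower)) / (2.0 * z)
        weight = 1.0 / se ** 2
        sum_w += weight
        sum_w_beta += weight * beta
    pooled_beta = sum_w_beta / sum_w
    pooled_se = math.sqrt(1.0 / sum_w)
    return (
        math.exp(pooled_beta),
        math.exp(pooled_beta - z * pooled_se),
        math.exp(pooled_beta + z * pooled_se),
    )


pooled_or, ci_low, ci_high = fixed_effect_meta(STUDIES)
print(f"pooled OR = {pooled_or:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

No single study is decisive here (several confidence intervals cross 1), but pooling narrows the interval enough to exclude 1, which is the point of combining cohorts.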
  • 48. ARTICLES
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10^-8. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.
Type 2 diabetes (T2D) is characterized by insulin resistance and deficient beta-cell function (ref. 1). The escalating prevalence of T2D and the limitations of currently available preventative and therapeutic options highlight the need for a more complete understanding of T2D pathogenesis. To date, approximately 25 genome-wide significant common variant associations with T2D have been described, mostly through genome-wide association (GWA) analyses (refs 2–13). The identities of the variants and genes mediating the susceptibility effects at most of these signals have yet to be established, and the known variants account for less than 10% of the overall estimated genetic contribution to T2D predisposition. Although some of the unexplained heritability will reflect variants poorly captured by existing GWA platforms, we reasoned that an expanded meta-analysis of existing GWA data would
the inverse-variance method (Online Methods, Fig. 1, Supplementary Tables 1 and 2 and Supplementary Note).
We observed only modest genomic control inflation (λgc = 1.07), suggesting that the observed results were not due to population stratification. After removing SNPs within established T2D loci (Supplementary Table 3), the resulting quantile-quantile plot was consistent with a modest excess of disease associations of relatively small effect (Supplementary Note). Weak evidence for association at HLA variants strongly associated with autoimmune forms of diabetes (Supplementary Table 3 and Supplementary Note) suggested some case admixture involving subjects with type 1 diabetes or latent autoimmune diabetes of adulthood; however, failure to detect T2D associations at other non-HLA type 1 diabetes susceptibility loci (for example, INS, PTPN22 and
Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis
Voight, 2010
Meta-analyses for T2D: N>40K and 90K identifies >30 loci among 2,400,000 SNPs
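The genomic control inflation factor quoted in this excerpt is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below assumes that convention and uses only the Python standard library; `genomic_control_lambda` and the synthetic P-values are hypothetical, not data from the study.

```python
from statistics import NormalDist, median


def genomic_control_lambda(p_values):
    """Genomic control inflation factor from two-sided P-values in (0, 1]."""
    normal = NormalDist()
    # Convert each P-value to the matching 1-df chi-square statistic (z squared).
    chi2_stats = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution under the null, about 0.455.
    expected_median = normal.inv_cdf(0.25) ** 2
    return median(chi2_stats) / expected_median


# Synthetic example: evenly spaced P-values mimic a null (uninflated) scan.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
print(f"lambda (null)     = {genomic_control_lambda(null_p):.2f}")

# Halving every P-value mimics systematic inflation of the test statistics.
inflated_p = [p / 2.0 for p in null_p]
print(f"lambda (inflated) = {genomic_control_lambda(inflated_p):.2f}")
```

A value near 1 indicates that the bulk of test statistics behave as expected under the null; values well above 1 suggest confounding such as population stratification.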
  • 49. ARTICLES
13 autosomal loci exceeded the threshold for genome-wide significance (P ranging from 2.8 × 10^-8 to 1.4 × 10^-22) with allele-specific odds
(r^2 < 0.05), and conditional analyses (see below) establish these SNPs as independent (Fig. 2 and Supplementary Table 4). Further analysis
[Figure: two stacked Manhattan plots. x axis: chromosome (1–22, X); y axis: −log10(P). Upper panel: unconditional analysis; lower panel: conditional analysis. Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 × 10^-5); association in identified or established region (P < 1 × 10^-4). Labeled loci: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPAR.]
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10^-5), whereas secondary signals close to already confirmed T2D loci are shown in purple (P < 10^-4).
Meta-analyses for T2D: N>40K and 90K identifies >30 loci among 2,400,000 SNPs
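The genome-wide significance threshold used in these excerpts (P < 5 × 10^-8) is commonly motivated as a Bonferroni correction of a 0.05 error rate for roughly one million independent common-variant tests. The sketch below assumes that convention; the one-million figure is the usual rule of thumb rather than a number stated on the slide, and `bonferroni_threshold` is a hypothetical helper.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Conventional genome-wide threshold: 0.05 spread over ~1,000,000 independent tests.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
print(f"genome-wide threshold: {genome_wide:.1e}")

# Treating all 2,400,000 SNPs from the slide as independent would be stricter
# still; linkage disequilibrium means the effective number of tests is smaller.
all_snps = bonferroni_threshold(0.05, 2_400_000)
print(f"threshold for 2.4M independent tests: {all_snps:.1e}")
```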
  • 50. [Figure: regional association plots (stage 1) for four T2D loci: SLC30A8 region (index SNP rs3802177; chromosome 8, 117–120 Mb), CDKN2A/B region (rs10965250; chromosome 9, 21–24 Mb), CDC123/CAMK1D region (rs12779790) and HHEX/IDE region (rs5015480). x axis: position on chromosome (Mb); left y axis: −log10(P-value); right y axis: recombination rate (cM/Mb). Points are colored by r^2 with the index SNP (0.0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1.0, or missing); gene positions are shown beneath each panel.]
Not in a gene... In a gene...
~90% of GWAS hits are non-coding!
  • 51. ~90% of GWAS hits are non-coding!
Stamatoyannopoulos, Science 2012
Systematic Localization of Common Disease-Associated Variation in Regulatory DNA
Matthew T. Maurano,1* Richard Humbert,1* Eric Rynes,1* Robert E. Thurman,1 Eric Haugen,1 Hao Wang,1 Alex P. Reynolds,1 Richard Sandstrom,1 Hongzhu Qu,1,2 Jennifer Brody,3 Anthony Shafer,1 Fidencio Neri,1 Kristen Lee,1 Tanya Kutyavin,1 Sandra Stehling-Sun,1 Audra K. Johnson,1 Theresa K. Canfield,1 Erika Giste,1 Morgan Diegel,1 Daniel Bates,1 R. Scott Hansen,4 Shane Neph,1 Peter J. Sabo,1 Shelly Heimfeld,5 Antony Raubitschek,6 Steven Ziegler,6 Chris Cotsapas,7,8 Nona Sotoodehnia,3,9 Ian Glass,10 Shamil R. Sunyaev,11 Rajinder Kaul,4 John A. Stamatoyannopoulos1,12†
Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure–related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn’s disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
Disease- and trait-associated genetic variants are rapidly being identified with genome-wide association studies (GWAS) and related strategies (1). To date, hundreds of GWAS have been conducted, spanning diverse diseases and quantitative phenotypes (2) (fig. S1A). However, the majority (~93%) of disease- and trait-associated variants emerging from these studies lie within noncoding sequence (fig. S1B), complicating their functional evaluation. Several lines of evidence suggest the involvement of a proportion of such variants in transcriptional regulatory mechanisms, including modulation of promoter and enhancer elements (3–6) and enrichment within expression quantitative trait loci (eQTL) (3, 7, 8). Human regulatory DNA encompasses a variety of cis-regulatory elements within which the cooperative binding of transcription factors creates focal alterations in chromatin structure. Deoxyribonuclease I (DNase I) hypersensitive sites (DHSs) are sensitive and precise markers of this actuated regulatory DNA, and DNase I mapping has been instrumental in the discovery and census of human cis-regulatory elements (9). We performed DNase I mapping genome-wide (10) in 349 cell and tissue samples, including 85 cell types studied under the ENCODE Project (10) and 264 samples studied under the Roadmap Epigenomics Program (11). These encompass several classes
nome. In total, we identified 3,899,693 distinct DHS positions along the genome (collectively spanning 42.2%), each of which was detected in one or more cell or tissue types (median = 5).
Disease- and trait-associated variants are concentrated in regulatory DNA. We examined the distribution of 5654 noncoding genome-wide significant associations [5134 unique single-nucleotide polymorphisms (SNPs); fig. S1 and table S2] for 207 diseases and 447 quantitative traits (2) with the deep genome-scale maps of regulatory DNA marked by DHSs. This revealed a collective 40% enrichment of GWAS SNPs in DHSs (fig.
S1C, P < 10^-55, binomial, compared to the distribution of HapMap SNPs). Fully 76.6% of all noncoding GWAS SNPs either lie within a DHS (57.1%, 2931 SNPs) or are in complete linkage disequilibrium (LD) with SNPs in a nearby DHS (19.5%, 999 SNPs) (Fig. 1A) (12). To confirm this enrichment, we sampled variants from the 1000 Genomes Project (13) with the same genomic feature localization (intronic versus intergenic), distance from the nearest transcriptional start site, and allele frequency in individuals of European ancestry. We confirmed significant enrichment both for SNPs within DHSs (P < 10^-59, simulation) and also including variants in complete LD (r^2 = 1) with SNPs in DHSs (P < 10^-37, simulation) (fig. S2). In total, 47.5% of GWAS SNPs fall within gene bodies (fig. S1B); however, only 10.9% of intronic GWAS SNPs within DHSs are in strong LD (r^2 ≥ 0.8) with a coding SNP, indicating that the vast majority of noncoding genic variants are not simply tagging coding sequence. Analogously, only 16.3% of GWAS variants within coding sequences are in strong LD with variants in DHSs. SNPs on widely used genotyping arrays (e.g., Affymetrix) were modestly enriched within DHSs (fig. S2), possibly due to selection of SNPs with robust experimental performance in genotyping assays. However, we found no evidence for sequence composition bias (table S3). To further examine the enrichment of GWAS SNPs in regulatory DNA, we systematically classified all noncoding GWAS SNPs by the quality [...]
  • 52. There have been few, if any, similar bursts of discovery in the history of medical research. David Hunter and Peter Kraft, NEJM, 2007
  • 53. Common claims discussed with regard to GWAS: Despite issues, yielded many discoveries vs. cost. [...] to a doubling of the number of associated variants discovered. The proportion of genetic variation explained by significantly associated SNPs is usually low (typically less than 10%) for many complex traits, but for diseases such as CD and multiple sclerosis (MS [MIM 126200]), and for quantitative traits such as height and lipid traits, between [...] Figure 1. GWAS Discoveries over Time. Data obtained from the Published GWAS Catalog (see Web Resources). Only the top SNPs representing loci with association p values < 5 × 10^-8 are included, and so that multiple counting is avoided, SNPs identified for the same traits with LD r^2 > 0.8 estimated from the entire HapMap samples are excluded. Five years of GWAS Discovery (Visscher, 2012). ~500,000 SNP chips x ~$500/chip = $250M. $250M / ~2000 loci = $125K/locus. Candidate genes: >$250M! 100 NIH R01s. Fighter jet. Hadron Collider: $9B.
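The cost estimate on this slide is simple arithmetic, and it can be reproduced in a few lines. The sketch below uses only the approximate figures quoted on the slide (about 500,000 chips, about $500 per chip, about 2,000 loci); the variable names are illustrative and not from the original deck.

```python
# Back-of-the-envelope GWAS cost, using the approximate figures on the slide.
n_chips = 500_000          # ~500,000 SNP chips genotyped (slide estimate)
cost_per_chip_usd = 500    # ~$500 per chip (slide estimate)
n_loci = 2_000             # ~2,000 associated loci discovered (slide estimate)

total_cost_usd = n_chips * cost_per_chip_usd
cost_per_locus_usd = total_cost_usd / n_loci

print(f"Total genotyping cost: ${total_cost_usd / 1e6:.0f}M")
print(f"Cost per locus: ${cost_per_locus_usd / 1e3:.0f}K")
```

Because every input is a rough estimate, the result is best read as an order of magnitude rather than a precise cost.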
  • 54. P = G + E. Phenotype: Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression. Genome: Variants. Environment: Infectious agents, Nutrients, Pollutants, Drugs. Complex traits are a function of genes and environment...
  • 55. Nothing comparable to elucidate E influence! We lack high-throughput methods and data to discover new E in P… E: ???
  • 56. A similar paradigm for discovery should exist for E! Why?
  • 57. $\sigma^2_P = \sigma^2_G + \sigma^2_E$
  • 58. $H^2 = \sigma^2_G / \sigma^2_P$. Heritability ($H^2$) is the fraction of phenotypic variance attributed to genetic variance in a population. Indicator of the proportion of phenotypic differences attributed to G.
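To make the variance decomposition concrete, here is a minimal simulation sketch. It assumes G and E are independent and normally distributed, which is a simplification chosen for illustration and not something the slides state; the function name and parameter values are hypothetical.

```python
import random
import statistics


def simulate_heritability(var_g, var_e, n=50_000, seed=0):
    """Simulate P = G + E and return the estimated H^2 = var(G) / var(P).

    Assumes G and E are independent, zero-mean normal variables
    (an illustrative simplification).
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return statistics.pvariance(g) / statistics.pvariance(p)


# Hypothetical variance components: 0.6 genetic, 0.4 environmental.
h2 = simulate_heritability(var_g=0.6, var_e=0.4)
print(f"Estimated H^2: {h2:.2f}")
```

With these inputs the estimate should land close to 0.6, since the genetic variance makes up 60% of the total under the independence assumption.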
  • 59. Height is an example of a heritable trait: Francis Galton shows how it's done (1887): “mid-height of 205 parents described 60% of variability of 928 offspring”