This document summarizes research on identifying genetic variations in the Korean population using single-nucleotide polymorphisms (SNPs). It describes analyzing the population structure based on SNP genotypes, using SNPs to determine kinship, and identifying monozygotic twins using copy number variations (CNVs). It also discusses using SNPs to study physical traits in Koreans and developing ancestry informative markers and a database of genomic variants for Korea.
4. Objectives
• Basic study of Korean population stratification
• Evidence of gene flow between Korea and neighboring countries
• Informative markers for East Asia
5. 0.1% difference between individuals
>10M SNPs in the population
Body mass index
Waist-hip ratio
Height
Blood pressure
Pulse rate
Bone density
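10. Korean Data
• Korean sampling sites (10 regions): YeonCheon, PyeongChang, JeCheon, Cheonan, GyeongJu, GimJe, Goryeong, UlSan, NaJu, Jeju
• Map labels: MW, SW, SE; sample counts of 16 per site (15 at GimJe)
• Korean participants: average age >70 years, long settlement
• Other groups: China (Yanbian), Japan (Kobe), Korea-Japan, Vietnam, Korean-Vietnam, Cambodia, Mongol
• Genotyping: Affymetrix 50K Xba, 58,960 SNPs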
11. Quality Control
Input: 58,960 SNPs, n = 242 (Korean n = 159)
Filters:
• Individuals with a high missing genotype call rate (>3%, mind 0.03)
• SNPs with a high missing genotype call rate (>4%, geno 0.04)
• Low MAF (<0.01, maf 0.01)
• Hardy-Weinberg test (p < 1×10^-6, hwe 0.000001)
Flowchart labels: autosomal; 54,794 SNPs
After QC: 46,559 SNPs, n = 230 (Korean n = 153)
Join HapMap CHB: n = 367 (230 + 137), 26,189 SNPs
Join HapMap JPT: n = 480 (367 + 113), 25,796 SNPs
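The four thresholds on this slide can be sketched as a small filter. This is only an illustration under stated assumptions: the function names, the `None` encoding for a missing call, and the 0/1/2 allele-count coding are hypothetical, and the Hardy-Weinberg check uses a 1-df chi-square approximation rather than whatever test the original analysis applied.

```python
import math

MISSING = None  # hypothetical encoding for a failed genotype call


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium (approximation)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def qc_filter(genotypes, mind=0.03, geno=0.04, maf=0.01, hwe=1e-6):
    """genotypes: rows = individuals, columns = SNPs, entries 0/1/2 or MISSING.

    Returns (kept_individual_indices, kept_snp_indices).
    """
    n_snps = len(genotypes[0])
    # Drop individuals whose missing-call rate exceeds `mind`.
    keep_ind = [i for i, row in enumerate(genotypes)
                if sum(g is MISSING for g in row) / n_snps <= mind]
    keep_snp = []
    for j in range(n_snps):
        calls = [genotypes[i][j] for i in keep_ind]
        observed = [g for g in calls if g is not MISSING]
        # Drop SNPs whose missing-call rate exceeds `geno`.
        if not observed or 1 - len(observed) / len(calls) > geno:
            continue
        # Drop SNPs with minor allele frequency below `maf`.
        freq = sum(observed) / (2 * len(observed))
        if min(freq, 1 - freq) < maf:
            continue
        # Drop SNPs that deviate strongly from Hardy-Weinberg proportions.
        counts = [observed.count(k) for k in (0, 1, 2)]
        if hwe_chi2_pvalue(*counts) < hwe:
            continue
        keep_snp.append(j)
    return keep_ind, keep_snp
```

Individuals are filtered before SNPs here so that one poorly genotyped sample does not inflate every SNP's missing rate; the slide does not state the order that was actually used.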
12. Missing genotype individuals
[Figure: missing genotype rate per individual before QC (58,960 SNPs); panels: All Asian, Korean; labeled sites: GimJe, GoRyeong, GyeongJu]
13. Quality Control All Asian
Group            SNPs     Individuals   After QC
Korean           46,559   159           153
China (Yanbian)  46,559   16            16
Japan (Kobe)     46,559   5             2
Korea-Japan      46,559   6             4
Vietnam          46,559   16            16
Korean-Vietnam   46,559   8             8
Cambodia         46,559   16            16
Mongol           46,559   16            15
Total                     242           230
14. Relatedness between the 153 Korean (10 regions) individuals
[Figure: PCA analysis using 46,559 autosomal SNP markers (n=153, Korean); groups: YeonCheon, PyeongChang, JeCheon, CheonAn, GyeongJu, UlSan, GimJe, GoRyeong, NaJu, JeJu]
15. LD-based SNP Pruning
Generate a subset of SNPs that are in approximate linkage equilibrium
• Slide a window of 50 SNPs and calculate LD within it
• Select representative SNPs that have low LD (r² ≤ 0.2)
• Shift the window by 5 SNPs and repeat
[Figure: 50-SNP windows shifted by 5 SNPs; panels: First Step, Second Step]
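The windowed pruning described on this slide can be sketched as follows. The parameters have the same form as PLINK's `--indep-pairwise 50 5 0.2`, although the slides do not name the tool. The function names are hypothetical, and the rule for which SNP of a correlated pair to drop (always the later one) is a simplifying assumption.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as no LD
    return cov * cov / (var_x * var_y)


def ld_prune(snp_columns, window=50, step=5, threshold=0.2):
    """snp_columns: one genotype vector per SNP, in chromosomal order.

    Returns the indices of SNPs kept after pruning.
    """
    removed = set()
    n = len(snp_columns)
    start = 0
    while start < n:
        # SNPs still present in the current window.
        idx = [j for j in range(start, min(start + window, n)) if j not in removed]
        for pos, a in enumerate(idx):
            if a in removed:
                continue
            for b in idx[pos + 1:]:
                if b in removed:
                    continue
                # Assumption: drop the later SNP of any pair above the threshold.
                if r_squared(snp_columns[a], snp_columns[b]) > threshold:
                    removed.add(b)
        start += step  # shift the window by `step` SNPs
    return [j for j in range(n) if j not in removed]
```

For example, `ld_prune([[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [1, 0, 1, 1, 2, 1]])` keeps the first and third SNP, because the second is a perfect copy of the first.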
16. PCA using Pruned SNPs
[Figure, two panels: PCA analysis using 46,559 SNP markers (n=153); PCA analysis using pruned 23,290 SNP markers (n=153)]
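As a rough illustration of what PCA on SNP genotypes computes, the sketch below derives first-principal-component scores for individuals by power iteration on the centred Gram matrix. It is dependency-free for clarity, not efficiency. The function name and the 0/1/2 coding are assumptions, and real analyses of this size would use a numerical library.

```python
import math


def first_pc_scores(genotypes, iterations=200):
    """Unit-length PC1 scores for individuals.

    genotypes: rows = individuals, columns = SNPs, entries 0/1/2, no missing data.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Centre each SNP column at its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n x n Gram matrix of similarities between individuals.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    # Power iteration converges to the leading eigenvector of the Gram matrix.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            return [0.0] * n  # no variation in the data
        vec = [x / norm for x in new]
    return vec
```

Individuals from genetically distinct groups receive scores of opposite sign on this axis, which is the separation the PCA plots in these slides display.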
17. Fst of population
Fst (fixation index): a measure of genetic differentiation (in allele frequency) among subpopulations
Tishkoff SA and Kidd KK. (2004). Nature Genetics Supplement 36:S21-S27.
0 ≤ Fst ≤ 0.05: negligible
Fst ≥ 0.25: large degree of genetic differentiation
Fst = 1: completely isolated
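To make the scale above concrete, here is a minimal sketch of Wright's Fst for a single biallelic SNP, computed as (H_T - H_S) / H_T from subpopulation allele frequencies. The function name is hypothetical, subpopulations are weighted equally, and the slides do not say which Fst estimator was actually used.

```python
def wright_fst(allele_freqs):
    """Wright's Fst for one biallelic SNP.

    allele_freqs: frequency of the same allele in each subpopulation
    (equal subpopulation weights assumed).
    """
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic everywhere: no differentiation to measure
    # Mean expected heterozygosity within subpopulations.
    h_sub = sum(2 * p * (1 - p) for p in allele_freqs) / k
    return (h_total - h_sub) / h_total
```

With frequencies 0.4 and 0.6 the result is about 0.04, inside the negligible range on the slide, while frequencies 0.0 and 1.0 give 1.0, the completely isolated case.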
18. Paired Fst values for Korean Population Groups
0 ≤ Fst ≤ 0.05: negligible
Fst ≥ 0.25: large degree of genetic differentiation
Fst = 1: completely isolated
19. Differences between Korea (9 regions) and Jeju
SNPs showing significant differences in genotype frequencies between Korea and Jeju
SNPs with P values less than 10^-3 are listed
a. P values for the Cochran-Armitage trend test of genotype frequencies
b. The KARE are indicated
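The Cochran-Armitage trend test named in footnote a can be sketched for a 2 × 3 table of genotype counts. This is an illustration only: the function name, the additive weights (0, 1, 2), and the two-sided normal approximation are assumptions, and the counts in the usage note are invented for demonstration.

```python
import math


def trend_test_pvalue(group_1, group_2, weights=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test p-value.

    group_1, group_2: genotype counts (AA, AB, BB) in each group.
    """
    r1, r2 = sum(group_1), sum(group_2)
    n = r1 + r2
    if n == 0:
        return 1.0
    cols = [a + b for a, b in zip(group_1, group_2)]
    # Trend statistic: weighted difference between the two rows.
    t = sum(w * (a * r2 - b * r1) for w, a, b in zip(weights, group_1, group_2))
    # Variance of the statistic under the null hypothesis of no association.
    var = sum(w * w * c * (n - c) for w, c in zip(weights, cols))
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            var -= 2 * weights[i] * weights[j] * cols[i] * cols[j]
    var *= r1 * r2 / n
    if var <= 0:
        return 1.0
    z = t / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))
```

Two groups with the same genotype proportions, such as (10, 20, 10) and (20, 40, 20), give a p-value of 1.0. A SNP would pass the slide's cutoff when the returned value is below 10^-3.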
20. Substructure of East Asian descent
[Figure: PCA analysis using 46,559 SNP markers (n=230); groups: YanBian, Mongol, Korea, Kobe, Vietnam, Korea-Vietnam, Korea-Japan, Cambodia]
22. Substructure with HapMap
[Figure, two panels: PCA analysis using 25,796 SNPs (n = 480); PCA analysis using pruned 8,347 SNPs (n = 480). Groups: YanBian, Mongol, Jeju, Kobe, Vietnam, Korea-Vietnam, Korea-Japan, Cambodia, JPT-HapMap, CHB-HapMap]
23. PCA analysis of East Asian descent
[Figure: PCA plot illustrating the geographic correspondence of ethnic group locations; groups: Mongol, Yanbian, Kobe, Jeju, JPT-HapMap, CHB-HapMap, Vietnam, Cambodia, Korea-Vietnam, Korea-Japan]
25. EAS-AIMs (Ancestry Informative Markers)
Calculate the In value (informativeness for assignment) using infocalc
1) All populations (KOR, CHB, JPT, MON, CAM): top 300 SNPs
2) Korean and Japanese: top 900 SNPs
3) Korean and Chinese: top 900 SNPs
4) Korean and Vietnamese: top 900 SNPs
Total: 3,000 East Asian Ancestry Informative Markers
Best performance in PCA: 1,500 SNPs
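The ranking step can be sketched with the informativeness-for-assignment statistic (In) of Rosenberg et al., which infocalc computes. The version below handles biallelic SNPs only. The function names, the marker identifiers, and the frequency values are hypothetical, and populations are weighted equally.

```python
import math


def _plogp(p):
    """p * ln(p), with the convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0


def informativeness(allele_freqs):
    """In for one biallelic SNP.

    allele_freqs: frequency of one allele in each of K populations.
    """
    k = len(allele_freqs)
    total = 0.0
    # Sum over the two alleles: the listed allele and its complement.
    for freqs in (allele_freqs, [1 - p for p in allele_freqs]):
        mean_freq = sum(freqs) / k
        total += -_plogp(mean_freq) + sum(_plogp(p) for p in freqs) / k
    return total


def top_markers(freq_table, n):
    """Return the n marker IDs with the highest In (freq_table: id -> frequencies)."""
    ranked = sorted(freq_table, key=lambda m: informativeness(freq_table[m]), reverse=True)
    return ranked[:n]
```

A SNP with identical frequencies in all populations scores 0, and a fixed difference between two populations scores ln 2, the maximum for K = 2. Calling `top_markers(table, 900)` on a two-population table would mirror the "top 900 SNPs" selections listed above.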
26. 3,000 East Asian AIM
List of East Asian Ancestry Informative Markers
a. All Asian (Korea, China, Vietnam, Cambodia, Mongol)
In, informativeness for assignment
Ia, informativeness for ancestry coefficients
ORCA, optimal rate of correct assignment
27. AIM Sets for determining East Asian ancestry
[Figure, two panels: PCA analysis using 1,500 AIMs; PCA analysis using 1,500 random SNPs]
32. Identity-by-state (IBS) sharing
Exclude individuals from pairs of samples identified as cryptic first-degree relatives (parent-offspring, twins, or siblings concordant for phenotype), or as more distant relationships if clusters were linked by a first-degree relative (Science, 2007)
Individual 1   A/C   G/T   A/G   A/A   G/G
Individual 2   C/C   T/T   A/G   C/C   G/G
IBS            1     1     2     0     2
[Figure: IBS sharing for pairs from the same population]
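The IBS row in the example above is the number of alleles two individuals share at each SNP. A minimal sketch, assuming genotypes are written as "A/C" strings as on the slide (the function names are hypothetical):

```python
from collections import Counter


def ibs_state(genotype_1, genotype_2):
    """Number of alleles (0, 1 or 2) shared identical-by-state at one SNP."""
    alleles_1 = Counter(genotype_1.split("/"))
    alleles_2 = Counter(genotype_2.split("/"))
    # Multiset intersection counts each shared allele copy once.
    return sum((alleles_1 & alleles_2).values())


def mean_ibs(individual_1, individual_2):
    """Average IBS state across all SNPs for a pair of individuals."""
    states = [ibs_state(a, b) for a, b in zip(individual_1, individual_2)]
    return sum(states) / len(states)


individual_1 = ["A/C", "G/T", "A/G", "A/A", "G/G"]
individual_2 = ["C/C", "T/T", "A/G", "C/C", "G/G"]
states = [ibs_state(a, b) for a, b in zip(individual_1, individual_2)]
print(states)  # [1, 1, 2, 0, 2], matching the IBS row on the slide
```

Pairs whose mean IBS across many SNPs is unusually high are candidates for the cryptic relatives or duplicate samples that the slide says to exclude. The cutoff used in the original analysis is not given.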
33. Identical twin
[Figure: IBS plot; labeled clusters: cryptic first-degree relatives; identical twin or redundant samples]
autosomal 60,959 SNPs (n=608, unrelated individuals + 5 families)
34. IBS value in Korean large family
[Figure labels: uncle-nephew; grandparent-grandchild; siblings]
36. Twins CNV (Copy Number Variation)
24 families (24 monozygotic twins and their parents or brothers)
Agilent Human CNV Microarray 244K × 2 array
[Figure: CNV gain and loss in twin and parent samples; Region: chr1]