Perspectives of identifying Korean genetic
variations
Chang Bum Hong
Center for Genome Science KNIH, KCDC
Dec 1, 2009
Contents

• Population Structure Based on SNP Genotypes
• SNP Based Kinship
• Identify Monozygotic Twins using CNV
• SNP Based Physical Traits
Population Structure Based on SNP Genotypes

                      Hello
Objectives
  • Basic study of Korean population stratification
  • Evidence of gene flow between Korea and neighboring countries
  • Informative markers for East Asians

  (Figure: map of East Asia)
0.1% difference between individuals
More than 10 million SNPs in the population

  Traits shown:
  • Body mass index
  • Waist-hip ratio
  • Height
  • Blood pressure
  • Pulse rate
  • Bone density
Confounding in genetic studies
East Asia - Public genotype data
  Dataset            SNPs        Individuals   Populations
  PASNP              54,794      1,928         75
  HGDP               2,834~      1,056         52
  HapMap             1,481,135   1,397         11
  SGVP               268,667     292           3
  Korean             58,625      159           10
  China(Yanbian)     58,625      16            1
  Japan(Kobe)        58,625      5             1
  Korea-Japan        58,625      6             1
  Vietnam            58,625      16            1
  Korean-Vietnam     58,625      8             1
  Cambodia           58,625      16            1
  Mongol             58,625      16            1

  PASNP: Pan-Asian SNP Consortium (http://www4a.biotec.or.th/PASNP)
  SGVP: Singapore Genome Variation Project (http://www.nus-cme.org.sg/SGVP)
HGDP(Human Genome Diversity Project)
PASNP(Pan-Asian SNP Consortium)
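Korean Data
  (Figure: sampling map of Korea with sample counts per region)
  • Korean regions: YeonCheon, PyeongChang, JeCheon, Cheonan, GyeongJu, GimJe, Goryeong, UlSan, NaJu, Jeju
  • 16 individuals per region, except one region with 15 (the 15 is printed next to Goryeong)
  • Map labels: MW, SW, SE
  • Sample notes: average age >70 years, long settlement, Affymetrix 50K Xba
  • Other populations: China(Yanbian), Japan(Kobe), Korea-Japan, Vietnam, Korean-Vietnam, Cambodia, Mongol
  • 58,960 SNPs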
Quality Control

  Step 1: QC of the Asian dataset
    Start: 58,960 SNPs, n = 242 (Korean n = 159)
    Keep autosomal SNPs: 54,794 SNPs
    Remove individuals with a high missing genotype rate (>3%, mind 0.03)
    Remove SNPs with a high missing genotype rate (>4%, geno 0.04)
    Remove SNPs with low MAF (<0.01, maf 0.01)
    Remove SNPs failing the Hardy-Weinberg test (p < 1x10^-6, hwe 0.000001)
    Result: n = 230 (Korean n = 153), 46,559 SNPs

  Step 2: merge with HapMap
    Join HapMap CHB: n = 367 (230 + 137), 26,189 SNPs
    Join HapMap JPT: n = 480 (367 + 113), 25,796 SNPs
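
The missingness and MAF filters above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the slides: the function name, the 0/1/2 genotype coding with None for a missing call, and the order of the filters are all assumptions, and the Hardy-Weinberg test is left out for brevity.

```python
def call_rate_and_maf_filter(genotypes, mind=0.03, geno=0.04, maf=0.01):
    """Return (kept individual indices, kept SNP indices).

    genotypes: one list per individual; each entry is 0, 1 or 2
    (count of one allele) or None for a missing call (assumed coding).
    """
    n_snps = len(genotypes[0])

    # 1) Drop individuals whose missing-call rate exceeds `mind`.
    kept_ind = [
        i for i, row in enumerate(genotypes)
        if sum(g is None for g in row) / n_snps <= mind
    ]
    if not kept_ind:
        return [], []

    # 2) Among the remaining individuals, drop SNPs with a missing-call
    #    rate above `geno` or a minor allele frequency below `maf`.
    kept_snps = []
    for j in range(n_snps):
        calls = [genotypes[i][j] for i in kept_ind]
        observed = [g for g in calls if g is not None]
        missing_rate = 1 - len(observed) / len(calls)
        if not observed or missing_rate > geno:
            continue
        freq = sum(observed) / (2 * len(observed))
        if min(freq, 1 - freq) < maf:
            continue
        kept_snps.append(j)
    return kept_ind, kept_snps


# Toy data: 4 individuals x 3 SNPs, with loose thresholds for illustration.
toy = [
    [0, 1, 2],
    [0, 1, None],
    [0, 2, 2],
    [None, None, None],
]
kept_individuals, kept_markers = call_rate_and_maf_filter(
    toy, mind=0.5, geno=0.5, maf=0.01
)
```

In the toy data the last individual is dropped for missingness, and the first and third SNPs are dropped because they are monomorphic among the remaining calls.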
Missing genotype individuals
  (Figure: two panels, both before QC with 58,960 SNPs: All Asian and Korean. Labels in the figure: GimJe, GoRyeong, GyeongJu)
Quality Control All Asian
  Population         SNPs      Individuals   After QC
  Korean             46,559    159           153
  China(Yanbian)     46,559    16            16
  Japan(Kobe)        46,559    5             2
  Korea-Japan        46,559    6             4
  Vietnam            46,559    16            16
  Korean-Vietnam     46,559    8             8
  Cambodia           46,559    16            16
  Mongol             46,559    16            15
  Total                        242           230
Relatedness between the 153 Korean (10 regions) Individuals
  (Figure: PCA analysis using autosomal 46,559 SNP markers (n=153, Korean). Region labels: YeonCheon, PyeongChang, JeCheon, CheonAn, GyeongJu, UlSan, GimJe, GoRyeong, NaJu, JeJu)
LD-based SNP Pruning
  Generate a subset of SNPs that are in approximate linkage equilibrium
  Slide a window of 50 SNPs along the genome and calculate LD within it
  Select representative SNPs that have low LD (R² ≤ 0.2)

  (Figure: 50-SNP windows shifted by 5 SNPs between the first step and the second step)
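
The sliding-window idea can be sketched as follows. This is a simplified greedy version under stated assumptions: genotypes are coded 0/1/2, LD is measured as the squared Pearson correlation between genotype vectors, and when a pair exceeds the threshold the later SNP is dropped. The function names are hypothetical, and dedicated tools choose which SNP of a pair to remove differently.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, window=50, step=5, threshold=0.2):
    """Return indices of SNPs kept after greedy window-based pruning.

    snps: list of per-SNP genotype vectors in chromosomal order.
    """
    removed = set()
    start = 0
    while start < len(snps):
        stop = min(start + window, len(snps))
        idx = [i for i in range(start, stop) if i not in removed]
        for pos, a in enumerate(idx):
            if a in removed:
                continue
            for b in idx[pos + 1:]:
                if b not in removed and r_squared(snps[a], snps[b]) > threshold:
                    removed.add(b)  # keep the earlier SNP of the pair
        start += step
    return [i for i in range(len(snps)) if i not in removed]


# Toy data: SNP 1 duplicates SNP 0, SNP 2 is uncorrelated with SNP 0.
toy_snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 0, 2, 1],
]
kept = ld_prune(toy_snps, window=3, step=1, threshold=0.2)
```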
PCA using Pruned SNPs
  (Figure, two panels: PCA analysis using 46,559 SNP markers (n=153); PCA analysis using pruned 23,290 SNP markers (n=153))
Fst of population
 Fst (fixation index): a measure of genetic differentiation (in allele
 frequency) among subpopulations

 Tishkoff SA and Kidd KK. (2004). Nature Genetics Supplement 36:S21-S27.

 0 ≤ Fst ≤ 0.05: negligible
 Fst ≥ 0.25: large degree of genetic differentiation
 Fst = 1: completely isolated
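
As a worked illustration of the scale above, Fst for a single biallelic SNP can be computed from subpopulation allele frequencies as (HT - HS) / HT, where HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. This is a minimal sketch assuming equally weighted subpopulations; the slides do not state which Fst estimator was used, and the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """Fst = (HT - HS) / HT for one biallelic SNP.

    freqs: frequency of one allele in each subpopulation
    (subpopulations are weighted equally, which is an assumption).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)          # pooled expected heterozygosity
    if h_t == 0:
        return 0.0                         # SNP is fixed everywhere
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-group value
    return (h_t - h_s) / h_t


identical = fst_biallelic([0.5, 0.5])   # same frequency in both groups
slight = fst_biallelic([0.4, 0.6])      # small frequency difference
fixed = fst_biallelic([0.0, 1.0])       # alternate alleles fixed
```

Identical frequencies give 0, alternately fixed alleles give 1, and the small difference in the middle example falls within the "negligible" band.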
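Paired Fst values for Korean Population Groups

   (Table of pairwise Fst values not captured in the extracted text)

   0 ≤ Fst ≤ 0.05: negligible
   Fst ≥ 0.25: large degree of genetic differentiation
   Fst = 1: completely isolated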
Differences between Korea (9 Regions) and Jeju
   SNPs Showing Significant Differences in Genotype Frequencies between Korea and Jeju

   (Table not captured in the extracted text; two of its columns carry footnotes a and b)

   SNPs with P values less than 10^-3 are listed
   a. P values for the Cochran-Armitage trend test of genotype frequencies
   b. The KARE are indicated
Substructure of East Asian descent
  (Figure: PCA analysis using 46,559 SNP markers (n=230). Legend: YanBian, Mongol, Korea, Kobe, Vietnam, Korea-Vietnam, Korea-Japan, Cambodia)
International HapMap

HapMap 3 Release 3
POP   Num_samples   Num_SNPs_QC   Num_SNPs_QC_poly
--------------------------------------------------
ASW    87           1623986       1543115
CEU   165           1623122       1397814
CHB   137           1626122       1341772
CHD   109           1620198       1311767
GIH   101           1630857       1408904
JPT   113           1634041       1294406
LWK   110           1625159       1526783
MEX    86           1604948       1453054
MKK   184           1611733       1532002
TSI   102           1632607       1419970
YRI   203           1625669       1493761
Substructure with HapMap
  (Figure, two panels: PCA analysis using 25,796 SNPs (n = 480); PCA analysis using pruned 8,347 SNPs (n = 480). Legend: YanBian, Mongol, Jeju, Kobe, JPT-HapMap, CHB-HapMap, Vietnam, Korea-Vietnam, Korea-Japan, Cambodia)
PCA analysis of East Asian descent
  Illustration of geographic correspondence of ethnic group locations
  (Figure: PCA plot with cluster labels Mongol, Yanbian, Kobe, JPT-HapMap, Jeju, CHB-HapMap, Vietnam, Cambodia, Korea-Vietnam, Korea-Japan)
Relationship between Eigenvector values and Latitude
  (Figure: scatter plot with fitted line; latitude tick values 47.81, 39.98, 37.53, 14.72)
  R² = 0.8621
  y = 36.65 + 166.33x
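
The fitted line can be evaluated directly. The sketch below assumes that y is latitude in degrees and x is the eigenvector value of a sample, which the extracted slide text does not state explicitly; the function name and the example input are hypothetical.

```python
INTERCEPT = 36.65   # from the slide: y = 36.65 + 166.33x
SLOPE = 166.33


def predicted_latitude(eigenvector_value):
    """Evaluate the regression line reported on the slide.

    Assumes x = eigenvector value and y = latitude in degrees.
    """
    return INTERCEPT + SLOPE * eigenvector_value


at_origin = predicted_latitude(0.0)
example = predicted_latitude(0.05)  # 0.05 is an arbitrary illustrative input
```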
EAS-AIMs (East Asian Ancestry Informative Markers)

  Calculate the In value using infocalc
   1) All populations (KOR, CHB, JPT, MON, CAM): top 300 SNPs
   2) Korean and Japanese: top 900 SNPs
   3) Korean and Chinese: top 900 SNPs
   4) Korean and Vietnamese: top 900 SNPs
  Total: 3,000 East Asian Ancestry Informative Markers

  Best performance with 1,500 SNPs, assessed using PCA
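
To make the ranking criterion concrete, the sketch below computes In (informativeness for assignment) for one marker from per-population allele frequencies, using the definition of Rosenberg et al. (2003): the entropy of the average allele frequencies minus the average within-population entropy. It is an illustration only, not the infocalc program, and the function name and the example frequencies are hypothetical.

```python
import math


def informativeness_in(freqs_by_pop):
    """In statistic for one marker.

    freqs_by_pop: one list of allele frequencies per population;
    each list covers the same alleles and sums to 1.
    Terms with frequency 0 contribute 0 (0 * log 0 is taken as 0).
    """
    k = len(freqs_by_pop)
    n_alleles = len(freqs_by_pop[0])
    total = 0.0
    for j in range(n_alleles):
        p_j = sum(pop[j] for pop in freqs_by_pop) / k  # average frequency
        if p_j > 0:
            total -= p_j * math.log(p_j)
        for pop in freqs_by_pop:
            if pop[j] > 0:
                total += (pop[j] / k) * math.log(pop[j])
    return total


# Two populations, biallelic SNP.
fully_informative = informativeness_in([[1.0, 0.0], [0.0, 1.0]])
uninformative = informativeness_in([[0.3, 0.7], [0.3, 0.7]])
```

A SNP with alternate alleles fixed in two populations reaches the maximum of log 2, while identical frequencies give 0. Ranking SNPs by this value and taking the top entries corresponds to the "top 300" and "top 900" selections listed above.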
3,000 East Asian AIM
List of East Asian Ancestry Informative Markers

  (Table not captured in the extracted text; one column carries footnote a)

a. All Asian (Korea, China, Vietnam, Cambodia, Mongol)
In, informativeness for assignment
Ia, informativeness for ancestry coefficients
ORCA, optimal rate of correct assignment
AIM Sets for determining East Asia
  (Figure, two panels: PCA analysis using 1500 AIMs; PCA analysis using 1500 random SNPs)
KDGV(Korean Database of Genomic Variants)




               http://ksnp.cdc.go.kr
Wiki-Based SNP Annotation
  (Figure, two panels: A, Human Genome Diversity Project. B, SNP information with allele frequency)
SNP Based Kinship
Identity-by-state (IBS) sharing
 Exclude individuals from pairs of samples identified as
 cryptic first-degree relatives (parent-offspring, twins, or
 siblings concordant for phenotype) or more distant
 relationships if clusters were linked by a first-degree
 relative (Science, 2007)

   Pair from the same population:

                    SNP 1   SNP 2   SNP 3   SNP 4   SNP 5
   Individual 1     A/C     G/T     A/G     A/A     G/G
   Individual 2     C/C     T/T     A/G     C/C     G/G
   IBS              1       1       2       0       2
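
The IBS row in the example can be reproduced by counting, at each SNP, how many alleles the two genotypes share. This is a minimal sketch: the function name and the "A/C" string format are taken from the slide example for illustration, and no particular tool's implementation is implied.

```python
from collections import Counter


def ibs_state(genotype_1, genotype_2):
    """Number of alleles (0, 1 or 2) shared identical-by-state.

    Genotypes are unphased strings such as "A/C".
    """
    alleles_1 = Counter(genotype_1.split("/"))
    alleles_2 = Counter(genotype_2.split("/"))
    # Multiset intersection keeps the minimum count of each shared allele.
    return sum((alleles_1 & alleles_2).values())


individual_1 = ["A/C", "G/T", "A/G", "A/A", "G/G"]
individual_2 = ["C/C", "T/T", "A/G", "C/C", "G/G"]

ibs_per_snp = [ibs_state(a, b) for a, b in zip(individual_1, individual_2)]
mean_ibs = sum(ibs_per_snp) / len(ibs_per_snp)
```

Averaging the per-SNP values over many markers gives a pairwise similarity score; pairs with unusually high scores are candidates for twins, duplicated samples, or close relatives.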
  (Figure: IBS plot, autosomal 60,959 SNPs (n=608, unrelated individuals + 5 families). Annotated groups: identical twins or redundant samples; cryptic first-degree relatives)
IBS value in Korean large family
  (Figure: pedigree-based IBS values. Labeled relationships: uncle-nephew, grandparent-grandchild, siblings)
Identify Monozygotic Twins using CNV
Twins CNV (Copy Number Variation)
   24 families (24 monozygotic twins and their parents or brothers)
   Agilent Human CNV Microarray, 2 x 244K array

  (Figures: CNV profiles for twin and parent, with gain and loss marked. Regions shown: chr1, chr2, chrX)
SNP Based Physical Traits
SNPedia & Promethease
  SNPedia: http://www.snpedia.com
  (Photo caption: April 16, 2009 in Seoul)
Promethease Report
Pictures of Lilly: 23andMe Contest
Thank you
Questions?

        Hong ChangBum
        Center for Genome Science
        KNIH, KCDC
        http://cgs.cdc.go.kr
        http://ksnp.cdc.go.kr
