When Bayes meets Darwin: a journey in population genomics

michael.blum@imag.fr
Laboratoire TIMC-IMAG, Grenoble
In “The Descent of Man”, Darwin concluded that the visual differences between human populations were not adaptive to any significant degree […]

“Natural selection has almost become irrelevant in human evolution. There's been no biological change in humans in 40,000 or 50,000 years”
Stephen J. Gould
But here is a counter-example
•  Tibetan populations have adapted to their high-altitude, low-oxygen environment thanks to an increased respiratory rate and increased blood flow.
•  These traits are transmitted from generation to generation.
•  The Tibetan plateau has been inhabited for ~20,000 years.
Local adaptation
•  Human adaptation to high altitude is an instance of local adaptation.
•  Understanding how individuals adapt to their local environment is central in biology. Plants adapt to their environment, bacteria adapt to antibiotics…
•  Definition of local adaptation: greater fitness (a measure of reproductive success) of individuals in their local habitats due to natural selection.

How to find genomic regions involved in local adaptation?
Data description
Single Nucleotide Polymorphism (SNP)

Indiv 1             ....ACCCG……….
                    ....AACCG……….
Number of copies        1    0
Indiv 2             ….ACCCT……….
                    ….ACCCT……….
Number of copies        0    2
----------------------------------------
•  3 billion base pairs in the human genome
•  Commercial SNP chips, 100€ for 500,000 SNPs
•  dbSNP: >10^6 SNPs
Single Nucleotide Polymorphism (SNP)
Data matrix Y

           Locus 1   Locus 2   Locus 3
Indiv 1       1         0         2
Indiv 2       0         2         0
Indiv 3       0         0         0
Indiv 4       0         1         1
Indiv 5       1         1         1
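As a minimal sketch (not part of the slides), the data matrix Y above can be held as one row per individual, each entry counting the copies (0, 1 or 2) of a given allele. The helper name `allele_frequencies` is hypothetical, and diploid individuals are assumed.

```python
# Genotype matrix Y from the slide: rows are individuals, columns are loci.
# Each entry is the number of copies (0, 1 or 2) of a given allele.
Y = [
    [1, 0, 2],  # Indiv 1
    [0, 2, 0],  # Indiv 2
    [0, 0, 0],  # Indiv 3
    [0, 1, 1],  # Indiv 4
    [1, 1, 1],  # Indiv 5
]


def allele_frequencies(genotypes):
    """Return the per-locus allele frequency, assuming diploid individuals."""
    n = len(genotypes)      # number of individuals
    p = len(genotypes[0])   # number of loci
    # Each diploid individual carries 2 allele copies per locus.
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]


freqs = allele_frequencies(Y)
print(freqs)
```

Per-locus frequencies of this kind are the usual starting point for centring or scaling genotypes before a PCA.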
  
Main principle of population genomics

•  Genome-wide patterns are influenced by neutral processes: migration, admixture, expansion.
•  Genes involved in local adaptation are outliers.
Adaptation to altitude
Manhattan plot
Xu et al. MBE 2011
Human HGDP data
Genome-wide patterns

Principal component analysis
[Figure: scatter plot of PC1 (x-axis) versus PC2 (y-axis); groups: Africa, America, Oceania, Asia, Middle-East, Europe, East Asia]
Principal component analysis
Novembre et al. Nature 2008
Genome scan for local adaptation: a Bayesian PCA approach
 
Singular Value Decomposition (SVD) viewpoint of PCA

In matrix notation, we have

$$Y = UV,$$

where Y is the genotype (n,p) matrix, U is the (n,K) score matrix and V is the (K,p) loadings matrix.

Variations around SVD in machine learning: matrix factorization, low-rank approximation, probabilistic PCA, factor analysis,…
 
Singular Value Decomposition (SVD) viewpoint of PCA

An optimal approximation of rank K for the matrix of genotypes Y:

$$Y_i = \sum_{k=1}^{K} u_i^k V_k$$

$Y_i$: genotype of the $i$th individual, e.g. (0,1,1,2,0,0,…)
$V_k$: vector of loadings $(v_{k,1}, v_{k,2}, v_{k,3}, \dots)$, of the same length as $Y_i$
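The rank-K approximation can be sketched with NumPy's `numpy.linalg.svd`. This is an illustration under stated assumptions rather than the method used in the talk: the toy matrix is the five-individual example from the data-matrix slide, K = 2 is arbitrary, and centring or scaling of the genotypes (usual in PCA) is omitted because the slides write Y = UV directly.

```python
import numpy as np

# Toy genotype matrix (n = 5 individuals, p = 3 loci) from the data-matrix slide.
Y = np.array([
    [1, 0, 2],
    [0, 2, 0],
    [0, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
], dtype=float)

K = 2  # number of retained components (arbitrary choice for this sketch)

# Thin SVD: Y = A @ diag(s) @ Vt, singular values sorted in decreasing order.
A, s, Vt = np.linalg.svd(Y, full_matrices=False)

U = A[:, :K] * s[:K]  # (n, K) score matrix: row i holds u_i^1, ..., u_i^K
V = Vt[:K, :]         # (K, p) loadings matrix: row k is the vector V_k

Y_hat = U @ V         # rank-K approximation: row i equals sum_k u_i^k V_k

# Squared Frobenius error equals the sum of the discarded squared singular values.
error = np.sum((Y - Y_hat) ** 2)
print(U.shape, V.shape, error)
```

Folding the singular values into U is one convention among several; putting them in V, or splitting them between the two factors, gives the same product UV.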
  
Bayesian principal component analysis

•  A probabilistic version of PCA (Tipping and Bishop 1999)

$$Y_i = \sum_{k=1}^{K} u_i^k V_k + \varepsilon_i.$$

•  The variance-inflation model for outlier detection (Box and Tiao 1968)

$$p(v_j) = (1-\pi)\, N(0, \sigma^2) + \pi\, N(0, c^2 \sigma^2),$$

where $\pi$ is the genome-wide outlier probability, and the prior for $c^2$ is uniform$(1, c^2_{\max})$.
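To make the variance-inflation mixture concrete, here is a small sketch of the probability that a single loading v belongs to the inflated component. It holds π, σ and c fixed, whereas the model in the slides places priors on them, so this is a simplification; the function names and numerical values are illustrative assumptions.

```python
import math


def normal_pdf(x, sd):
    """Density of a centred normal distribution N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def outlier_probability(v, pi, sigma, c):
    """P(outlier | v) under the two-component mixture, with pi, sigma, c held fixed."""
    inflated = pi * normal_pdf(v, c * sigma)      # outlier component N(0, c^2 sigma^2)
    regular = (1.0 - pi) * normal_pdf(v, sigma)   # non-outlier component N(0, sigma^2)
    return inflated / (inflated + regular)


# Illustrative values only (not taken from the slides).
pi, sigma, c = 0.05, 1.0, 5.0
for v in (0.0, 2.0, 4.0):
    print(v, round(outlier_probability(v, pi, sigma, c), 3))
```

Loadings near zero are attributed to the regular component, while large loadings are far more likely under the inflated variance, which is what flags a SNP as an outlier.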
  
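Accounting for local correlation in the genome
Local correlation because of recombination

Ising model (outlier $Z_j = 1$, non-outlier $Z_j = 0$)

$$P(Z_j = 1) \propto \pi \exp\Big(\beta \sum_{k \sim j} Z_k\Big),$$

where $\beta > 0$ is a hyperparameter and the sum runs over the SNPs $k$ that are neighbours of SNP $j$.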
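The Ising prior can be illustrated by the conditional probability that SNP j is an outlier given its neighbours. Two assumptions are made here that the slide does not state: neighbours are the adjacent SNPs j-1 and j+1 along a chain, and the unnormalised weight of Z_j = 0 is taken to be (1 - π). The function name is hypothetical.

```python
import math


def conditional_outlier_prob(z, j, pi, beta):
    """P(Z_j = 1 | neighbours) on a chain where SNP j has neighbours j-1 and j+1.

    Assumption of this sketch: the unnormalised weight of Z_j = 0 is (1 - pi).
    """
    neighbours = [k for k in (j - 1, j + 1) if 0 <= k < len(z)]
    s = sum(z[k] for k in neighbours)   # number of outlier neighbours
    w1 = pi * math.exp(beta * s)        # weight of Z_j = 1, as on the slide
    w0 = 1.0 - pi                       # assumed weight of Z_j = 0
    return w1 / (w1 + w0)


z = [0, 0, 1, 0, 1, 0, 0]  # current outlier indicators along the genome
pi, beta = 0.05, 1.5       # illustrative values only

print(conditional_outlier_prob(z, 3, pi, beta))  # both neighbours are outliers
print(conditional_outlier_prob(z, 0, pi, beta))  # no outlier neighbour
```

With β > 0, a SNP surrounded by outliers is more likely to be called an outlier itself, which encodes the local correlation along the genome.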
A hierarchical Bayesian model
Gibbs sampler for sampling the posterior
[Figure: graphical model with nodes π, β, σ, Z, K, U, V, Y, c, σ0, cmax]
Low-rank approximation for outlier detection in video sequences
Bayesian scores for detecting outliers
•  Bayes factors: a Bayesian alternative to P-values

$$BF = P(Y_j \mid \text{outlier}) \,/\, P(Y_j \mid \text{non-outlier})$$

•  Posterior odds

$$P(\text{outlier} \mid Y_j) \,/\, P(\text{non-outlier} \mid Y_j) = \text{prior odds} \times BF$$

•  For any list of outlier SNPs, a false discovery rate can be estimated based on posterior odds.
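The chain from Bayes factor to posterior odds to an estimated false discovery rate can be sketched as follows. The FDR estimate used here, the mean of one minus the posterior outlier probability over the listed SNPs, is a common Bayesian choice and an assumption of this sketch; the slides do not spell out the estimator. Bayes factors and the prior probability are made-up values.

```python
def posterior_prob(bayes_factor, prior_prob):
    """Posterior outlier probability from a Bayes factor and a prior probability."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * bayes_factor   # posterior odds = prior odds * BF
    return post_odds / (1.0 + post_odds)


def estimated_fdr(posterior_probs):
    """Expected fraction of false discoveries: mean of (1 - posterior probability)."""
    return sum(1.0 - q for q in posterior_probs) / len(posterior_probs)


# Hypothetical Bayes factors for four candidate SNPs, prior outlier probability 0.05.
bfs = [1000.0, 200.0, 50.0, 19.0]
probs = [posterior_prob(bf, 0.05) for bf in bfs]
print([round(q, 3) for q in probs], round(estimated_fdr(probs), 3))
```

Sorting SNPs by posterior probability and extending the list until the estimated FDR reaches a chosen level gives a simple way to pick a cut-off.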
  
Ex 1: a simulation study in a divergence model

Neutral divergence (ms)
Divergence with selection (SimuPOP)
4% out of 10,000 SNPs under selection
Other methods for genome scan of local adaptation
•  Fst: a measure of differentiation between populations
•  BayeScan (Foll and Gaggiotti 2008)
•  Both methods assume (implicitly or explicitly) a mechanistic model of instantaneous divergence
Population structure
[Figure: PC2 axis, with panels labelled Neutral and Adaptive]

Selection scan
[Figure: log10(BF) (y-axis) along SNP index (x-axis, 0 to 10000) for PC 1, PC 2 and PC 3]
Comparing methods of selection scan
Advantage of non-parametric methods in data-rich situations
[Figure: false discovery rate (y-axis) versus divergence time T (x-axis) for BayeScan, Fst and PCAdapt]
Ex 2: a spatially-explicit simulation with a gradient of selection
[Figure: simulated landscape with a gradient of selection]

Population structure
[Figure: panels PC 1, PC 2 and PC 3]

Selection scan
[Figure: log10(BF) (y-axis) along SNP index (x-axis, 0 to 2000) for PC 1, PC 2 and PC 3]
Application to the human HGDP data
[Figure: scatter plot of PC1 (x-axis) versus PC2 (y-axis); groups: Africa, Americas, Oceania, Asia, Middle-East, Europe, East Asia]
Manhattan plot
Top hit is on chromosome 16
[Figure: scores along physical position (x-axis) for PC2, PC3 and PC4; the ABCC11 gene is labelled]
Geographic distribution of the top SNP

Involved in earwax type (cerumen) and perspiration
Enrichment analysis
Are PC2 outliers enriched for genes involved in immunity?
[Figure: scatter plot of PC1 (x-axis) versus PC2 (y-axis); groups: Africa, Americas, Oceania, Asia, Middle-East, Europe, East Asia]
Big data
What can you do with millions of SNPs?
Scalable Bayesian computation?
Standard PCA and permutation tests.
A George Box (1919-2013) story to conclude

•  Box wanted to write a paper with Cox because having a Box and Cox paper would be fun.
•  They decided to write a paper on transformations.
•  One author wrote the Bayesian version and the other one wrote the maximum likelihood version. We do not know who wrote what.
•  In the end, it did not make much practical difference.
Nicolas Duforet-Frebourg
Spatial autocorrelation explains the PCA pattern
Choice of K
[Figure: mean squared error (y-axis) versus K (x-axis)]
Robustness w.r.t. the choice of K
[Figure: false discovery rate (y-axis) versus divergence time (x-axis) for K=1, K=2 and K>2]
