Big Data in Biomedicine:
Translating 300 trillion points of data
into new drugs and diagnostics
Atul Butte, MD, PhD
Chief, Division of Systems Medicine,
Departments of Pediatrics, Genetics,
and, by courtesy, Computer Science,
Pathology, and Medicine
Center for Pediatric Bioinformatics, LPCH
Stanford University
abutte@stanford.edu
@atulbutte
@ImmPortDB

Disclosures
•  Scientific founder and advisory board membership
–  Genstruct
–  NuMedii
–  Personalis
–  Carmenta
•  Honoraria for talks
–  Lilly
–  Pfizer
–  Siemens
–  Bristol Myers Squibb
–  AstraZeneca
–  Roche
–  Genentech
•  Past or present consultancy
–  Lilly
–  Johnson and Johnson
–  Roche
–  NuMedii
–  Genstruct
–  Tercica
–  Ecoeos
–  Ansh Labs
–  Prevendia
–  Samsung
–  Assay Depot
–  Regeneron
–  Verinata
–  Geisinger
–  Covance
•  Corporate Relationships
–  Northrop Grumman
–  Aptalis
–  Thomson Reuters
•  Speakers’ bureau
–  None
•  Companies started by students
–  Carmenta
–  Serendipity
–  NuMedii
–  Stimulomics
–  NunaHealth
–  Praedicat
–  MyTime
–  Flipora

Big Data in Biomedicine
Nearly 1.4 million microarrays available
Doubles every 2-3 years
Butte AJ. Translational Bioinformatics: coming of age. JAMIA, 2008.
127 million substances x 740,000 assays
1.2 billion points of data within a grid of 100 trillion cells
~250 million active substances
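The grid size quoted above can be checked with a few lines of arithmetic. The sketch below takes the slide's figures at face value and multiplies substances by assays. The variable names are illustrative only, and the "fraction measured" is a derived quantity that the slide does not state.

```python
# Figures quoted on the slide, taken at face value.
substances = 127_000_000
assays = 740_000
measured_points = 1_200_000_000

# Every substance crossed with every assay gives the size of the full grid.
grid_cells = substances * assays

# Share of that grid that actually holds a measurement (derived, not on the slide).
fill_fraction = measured_points / grid_cells

print(f"grid cells: {grid_cells:.3e}")
print(f"fraction of grid measured: {fill_fraction:.2e}")
```

The product is about 94 trillion cells, which the slide rounds to 100 trillion. On these numbers, only about one cell in 78,000 holds a measurement.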
  
5,178 compounds
•  1,300 off-patent FDA-approved drugs
•  700 bioactive tool compounds
•  2,000+ screening hits (MLPCN and others)
3,712 genes (shRNA + cDNA)
•  targets/pathways of FDA-approved drugs (n=900)
•  candidate disease genes (n=600)
•  community nominations (n=500+)
15 cell types
•  Banked primary cell types
•  Cancer cell lines
•  Primary hTERT immortalized
•  Patient derived iPS cells
•  5 community nominated

Protein
Cancer markers
Transplant Rejection markers

Preeclampsia: large cause of maternal and fetal death
•  Incidence
–  5-8% of all pregnancies in the U.S. and worldwide
–  4.1 million births in the U.S. in 2009
–  Up to 300K cases of preeclampsia annually in the U.S.
•  Mortality
–  Responsible for 18% of all maternal deaths in the U.S.
–  Maternal death in 56 out of every 100,000 live births in US
–  Neonatal death in 71 out of every 100,000 live births in US
•  Cost
–  $20 billion in direct costs in the U.S. annually
–  Average hospital stay of 3.5 days
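The "up to 300K cases" figure can be checked against the incidence range. The sketch below applies the slide's 5-8% range to the 4.1 million births it quotes. Treating births as a stand-in for pregnancies is an assumption made here only for the check.

```python
# Figures quoted on the slide.
births_2009 = 4_100_000
incidence_low_pct, incidence_high_pct = 5, 8

# Integer arithmetic keeps the result exact.
cases_low = births_2009 * incidence_low_pct // 100
cases_high = births_2009 * incidence_high_pct // 100

print(cases_low, cases_high)  # → 205000 328000
```

The quoted 300K falls inside this 205K to 328K range.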
  
Linda Liu
Matt Cooper
Bruce Ling

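New markers for preeclampsia
[Figure: marker concentrations (ng/ml) in Control versus Preeclampsia groups, plotted against gestational age (weeks). Group labels as shown: Control N=16, Preeclampsia N=15, Control N=16, Preeclampsia N=17. Panels: GA 23-34 weeks and GA > 34 weeks. p values shown: 3.49 × 10^-4, 1.79 × 10^-5, and 1.92 × 10^-8.]
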
March of Dimes Prematurity Research Center at Stanford University School of Medicine
Linda Liu
Bruce Ling

Sequencing Excitement
•  454/Roche, Life Technologies
•  Helicos: $30k genome
•  Pacific Biosciences: sequence human genome in 15 minutes
•  Run times in minutes at a cost of hundreds of dollars
•  Complete Genomics: 80 genomes/day
•  Ion Torrent and Illumina: ~$1500 per genome
•  Oxford: USB stick
Lancet, 375:1525, May 1, 2010.
Credit: Euan Ashley, Russ Altman, Steve Quake, Lancet

•  Study published in 2008 in Inflammatory Bowel Disease
•  Crohn’s Disease and Ulcerative Colitis
•  Investigated 9 loci in 700 Finnish IBD patients
•  We record 100+ items
–  GWAS, non-GWAS papers
–  Disease, Phenotype
–  Population, Gender
–  Alleles and Genotypes
–  p-value (and confidence)
–  Odds ratio (and confidence)
–  Technology, Study design
–  Genetic model
•  Mapped to UMLS concepts
Rong Chen
Optra Systems
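The curated items listed above could be held in a record like the one sketched here. This is a hypothetical schema for illustration, not the actual VARIMED layout. Every field name is an assumption, and only a handful of the 100+ recorded items are shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AssociationRecord:
    """One curated variant-disease association (hypothetical schema)."""
    disease: str
    snp_id: Optional[str] = None                 # e.g. an rs identifier
    is_gwas: Optional[bool] = None               # GWAS vs. non-GWAS paper
    phenotype: Optional[str] = None
    population: Optional[str] = None
    gender: Optional[str] = None
    alleles: Tuple[str, ...] = ()
    genotype: Optional[str] = None
    p_value: Optional[float] = None
    odds_ratio: Optional[float] = None
    odds_ratio_ci: Optional[Tuple[float, float]] = None
    technology: Optional[str] = None
    study_design: Optional[str] = None
    genetic_model: Optional[str] = None
    umls_concept: Optional[str] = None           # mapped UMLS concept identifier


# Only details stated on the slide are filled in; everything else stays empty.
record = AssociationRecord(
    disease="Inflammatory bowel disease",
    population="Finnish",
)
print(record.disease, record.population, record.p_value)
```

Leaving unreported fields as `None` keeps "not stated in the paper" separate from a real value, which matters when records from thousands of papers are compared.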
  
•  Study published in 2009 in Rheumatology
•  Ankylosing spondylitis
•  Investigated 8 SNPs in IL23R in 2000 UK case-control patients
•  Tables can be rotated
•  NLP is hard

What are the alleles for rs1004819?
Alleles for rs1004819 are C and T
~11% of records reported genotypes in the negative strand
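Records reported on the opposite strand have to be complemented before they can be compared. The sketch below shows one simple way to do that, assuming the reference alleles for a SNP (here C and T for rs1004819, as on the slide) are already known. The function name and the return convention are illustrative, not the curation pipeline's actual code.

```python
# Watson-Crick complements for single-base alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_reference_strand(alleles, reference_alleles):
    """Return the alleles expressed on the reference strand.

    Returns None when the alleles match neither the reference strand
    nor its complement, which flags the record for manual review.
    """
    observed = set(alleles)
    reference = set(reference_alleles)
    if observed <= reference:
        return sorted(observed)
    flipped = {COMPLEMENT[a] for a in observed}
    if flipped <= reference:
        return sorted(flipped)
    return None


reference = ["C", "T"]  # alleles for rs1004819 as stated on the slide
print(to_reference_strand(["G", "A"], reference))  # → ['C', 'T']
print(to_reference_strand(["C", "T"], reference))  # → ['C', 'T']
print(to_reference_strand(["A", "C"], reference))  # → None
```

This check cannot resolve A/T or C/G SNPs, because their alleles read the same on both strands. Those need extra information such as allele frequencies or flanking sequence.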
  
Number of papers curated: ~19,000
Number of records: ~1.6 million
Distinct SNPs: ~473,000
Diseases and phenotypes: ~7,400
Rong Chen
Anil Patwardhan
Michael Clark
Optra Systems
Personalis
VARIMED: Variants Informing Medicine
Chen R, Davydov EV, Sirota M, Butte AJ. PLoS One. 2010 October; 5(10): e13574.

Diseases and Traits
•  Risk factors are associated with an increased likelihood of developing a given disease
–  Smoking → chronic obstructive pulmonary disease
•  Risk factors are identified for diseases through large-scale epidemiological studies, which are resource intensive
•  GWAS have identified genetic variants for thousands of diseases and traits
•  If traits and diseases share the same associated genetic variants, could the trait be used to suggest risk factors for disease?
Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
Li Li

Identify significant disease-trait genetic associations and clinically validate using EMR data
[Figure: workflow in two modules. Genetics Module: Varimed; Disease (n=201), Trait (n=249); gene counts > 3; Disease (n=69), Trait (n=85); TF-IDF weighting, cosine distance, random shuffling; q ≤ 0.01; Disease-Trait Pair (n=120); Published findings (n=94), Novel predictions (n=26); p < 1e-8; Disease modules (n=8), Trait modules (n=7). Clinical Validation: EMR cohort; risk factors (before dx), diagnostic tests (1st dx), complications (after dx).]
Li Li

Assessing significance of disease-trait (D-T) pair
•  Each gene within an individual disease or trait is weighted by taking into account the frequency of the gene: Term Frequency–Inverse Document Frequency
•  $\text{tf-idf}(i, j) = \text{tf}(i, j) \times \text{idf}_i = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log\left(\frac{D}{D_i}\right)$, which adjusts the score of $\text{tf}(i, j)$ by taking into account the popularity level of gene $i$.
•  e.g., with 154 diseases and traits (D+T), 28 genes in Alzheimer's disease (AD) and 5 genes in erythrocyte sedimentation rate (ESR), CR1 was in common
–  tf-idf (AD) = $\frac{1}{28} \times \log_{10}\left(\frac{154}{2}\right) = 0.067$
–  tf-idf (ESR) = $\frac{1}{5} \times \log_{10}\left(\frac{154}{2}\right) = 0.377$
•  The D-T distance score was calculated using cosine distance to evaluate similarity between all pairs.
•  Genes were randomly sampled across all the traits and the D-T similarity was recalculated; this was repeated 1,000 times, and the q value was generated based on the number of samplings.
$\text{cosine similarity}(D, T) = \frac{D \cdot T}{\lVert D \rVert \times \lVert T \rVert} = \frac{\sum_{i=1}^{n} D_i \times T_i}{\sqrt{\sum_{i=1}^{n} D_i^2} \times \sqrt{\sum_{i=1}^{n} T_i^2}} = 0.9274524$
Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
Li Li
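The scoring steps on this slide can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are made up here. The base-10 logarithm follows the slide's worked example. The shuffling scheme is one plausible reading of "randomly sampling all the genes", and it returns an empirical p-value rather than a multiple-testing-corrected q value.

```python
import math
import random


def tf_idf(gene_count, genes_in_entity, n_entities, n_entities_with_gene):
    """tf-idf weight of one gene within one disease or trait."""
    tf = gene_count / genes_in_entity
    idf = math.log10(n_entities / n_entities_with_gene)
    return tf * idf


def cosine_similarity(d, t):
    """Cosine similarity of two {gene: weight} dictionaries."""
    dot = sum(w * t.get(g, 0.0) for g, w in d.items())
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_t = math.sqrt(sum(w * w for w in t.values()))
    if norm_d == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_d * norm_t)


def permutation_p_value(d, t, gene_pool, n_perm=1000, seed=0):
    """Empirical p-value from reassigning the trait's weights to random genes.

    gene_pool must be a list with at least as many genes as the trait has.
    """
    rng = random.Random(seed)
    observed = cosine_similarity(d, t)
    weights = list(t.values())
    hits = 0
    for _ in range(n_perm):
        shuffled = dict(zip(rng.sample(gene_pool, len(weights)), weights))
        if cosine_similarity(d, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Worked example from the slide: CR1 is shared by AD (28 genes) and ESR (5 genes),
# and appears in 2 of the 154 diseases and traits.
ad_score = tf_idf(1, 28, 154, 2)
esr_score = tf_idf(1, 5, 154, 2)
print(round(ad_score, 3), round(esr_score, 3))  # → 0.067 0.377
```

The two printed weights reproduce the slide's values. The same gene counts for more in the trait with fewer genes, which is the adjustment tf-idf is meant to make.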
  
Categorizing known D-T pairs and discovering potential confounders in GWAS
[Figure: 93 known pairs of traits (T) and diseases (D) linked by shared genetic variants, grouped by timing of disease progression into risk factor, diagnostic test, and consequence. Group sizes shown: 38 pairs, 27 pairs, 28 pairs.]
Li Li

Diagnostic tests where traits occur at the same time as disease onset
Antibody titer
Hepatitis B vaccine response
Png et al, Hum Mol Genet, 2011
Even though this GWAS did not explicitly include participants with the autoimmune diseases above, our approach inferred known relationships between diseases and traits based on their shared genetic architecture
[Figure: trait (T) and disease (D) linked by genetic variants; category: diagnostic test.]
Li Li

Significant genes shared between antibody titer and 16 autoimmune diseases
Disease | Common Genes | Genes Shared | q-value
Alopecia areata | 4 | BTNL2; C6orf10; RDBP; TNXB | <0.001
Ankylosing spondylitis | 2 | BTNL2; LOC100507436 | 0.001
Asthma | 4 | BTNL2; C6orf10; HLA-DPA1; NOTCH4 | <0.001
Biliary liver cirrhosis | 3 | BTNL2; C6orf10; HLA-DPB1 | 0.003
Chronic hepatitis B | 2 | HLA-DPA1; HLA-DPB1 | <0.001
HIV infection | 7 | C6orf10; HLA-C; LOC100507436; NOTCH4; PRRC2A; RDBP; TNXB | <0.001
Membranous nephropathy | 15 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; GPANK1; LY6G5B; LY6G6C; NOTCH4; PRRC2A; RDBP; RNF5; SLC44A4; TNXB; ZBTB12 | <0.001
Multiple sclerosis | 7 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; NOTCH4; TNXB | <0.001
Neonatal lupus | 3 | BAG6; C6orf10; ZBTB12 | <0.001
Primary biliary cirrhosis | 3 | BTNL2; C6orf10; HLA-DPB1 | 0.005
Rheumatoid arthritis | 20 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; GPANK1; HLA-C; HLA-DPA1; HLA-DPB1; LOC100507436; LY6G5B; LY6G6C; LY6G6F; NOTCH4; PRRC2A; RDBP; RNF5; SLC44A4; TNXB; ZBTB12 | <0.001
Systemic lupus erythematosus | 9 | BAG6; BTNL2; C6orf10; GPANK1; HLA-DPB1; NOTCH4; PRRC2A; TNXB; ZBTB12 | <0.001
Systemic sclerosis | 3 | HLA-DPA1; HLA-DPB1; NOTCH4 | <0.001
Type 1 diabetes | 5 | BAG6; BTNL2; C6orf10; HLA-C; HLA-DPB1 | 0.001
Vitiligo | 6 | AGPAT1; BTNL2; NOTCH4; RNF5; SLC44A4; TNXB | <0.001
Wegener's granulomatosis | 2 | HLA-DPA1; HLA-DPB1 | <0.001
Li Li

Risk factors where traits occur prior to the disease onset and may accompany disease
Trait | Disease | Common Genes | Genes Shared | q-value
Smoking | Chronic obstructive pulmonary disease | 3 | AGPHD1; CHRNA3; RAB4B | <0.001
Known clinical study: Smoking is the primary risk factor for COPD, although little was known about the pathogenesis linking smoking and COPD. Pauwels et al, 2001; Vestbo et al, 2012
In GWAS study: Six GWAS studies are related to COPD in VARIMED, and their COPD cohorts are all from smoking patients. Cho et al, 2012; Pillai SG, 2010; Wang et al, 2010; Cho et al, 2010; Lambrechts et al, 2010; Pillai SG, 2009
As COPD occurs after smoking, the variants associated with COPD could be influenced by smoking, and the genetic variants for COPD could be unmasked if the smoking confounder is excluded in GWAS.
[Figure: Smoking and COPD linked by shared genetic variants.]
Li Li

Consequence where traits occur after the disease onset
Alcohol dependence syndrome (ADS)
Trait | Common Genes | Genes Shared | q-value
Alanine aminotransferase levels | 1 | C12orf51 | 0.001
Cholesterol levels | 3 | ALDH2; BRAP; C12orf51 | 0.001
HDL cholesterol levels | 2 | C12orf51; OAS3 | <0.001
Known clinical study: The high HDL criterion was observed with triple frequency in the ADS group, a high cholesterol diet was associated with ADS patients, and ALT levels have been seen to increase with daily alcohol intake in patients who developed ADS. Kahl et al, 2010; Imhof et al, 2001; Gross GA, 1994
In GWAS study: 3 genes for cholesterol levels reported by Kato et al. and 2 genes for ALT and HDL-C reported by Young et al. could be biased by the effect of alcohol, as the authors did not adjust for alcohol intake or control for drinking habits in their GWAS studies. Kato et al, 2011; Kamatani et al, 2010
The GWAS to identify concrete genetic variants for these three clinical measurements should be performed in patients without ADS as a confounder
[Figure: ADS linked to ALT and HDL-C through shared genetic variants.]
Li Li

27 novel pairs
Trait | Disease | Common Genes | Genes Shared | q-value
Mean corpuscular volume | Acute lymphoblastic leukemia | 1 | IKZF1 | 0.001
Mean cell hemoglobin concentration | Alcohol dependence | 1 | ALDH2 | 0.005
Platelet count | Alcohol dependence | 1 | C12orf51 | 0.007
Lung function | Alopecia areata | 1 | AGER | 0.008
Erythrocyte sedimentation rate | Alzheimer's disease | 1 | CR1 | 0.004
Prostate-specific antigen levels | Basal cell carcinoma | 1 | CLPTM1L | 0.004
Eye color | Chronic lymphocytic leukemia | 1 | IRF4 | 0.006
Freckles | Chronic lymphocytic leukemia | 1 | IRF4 | 0.008
Blood pressure | Esophageal cancer | 3 | ALDH2; C12orf51; PLCE1 | 0.009
Factor VII coagulant activity | Esophageal cancer | 1 | ADH4 | 0.008
Serum magnesium levels | Gastric cancer | 3 | MUC1; THBS3; TRIM46 | <0.001
Prostate-specific antigen levels | Glioma | 1 | TERT | 0.005
Alpha linolenic acid levels | Glucose intolerance | 1 | FADS1 | 0.01
Alanine aminotransferase levels | Hypertension | 1 | C12orf51 | 0.003
Serum transferrin levels | Hypertension | 1 | HFE | 0.005
Smoking | Kawasaki disease | 1 | RAB4B | 0.003
Prostate-specific antigen levels | Lung cancer | 2 | CLPTM1L; TERT | 0.001
Homocysteine levels | Melanoma | 1 | C16orf55 | 0.01
Protein C levels | Melanoma | 2 | NCOA6; PIGU | <0.001
Transferrin receptor levels | Metabolic syndrome | 3 | APOA5; BUD13; ZNF259 | <0.001
PR interval | Open-angle glaucoma | 1 | CAV1 | 0.002
PR interval | Restless legs syndrome | 1 | MEIS1 | 0.003
Bone mineral density | Sudden cardiac arrest | 1 | ESR1 | 0.006
Acenocoumarol maintenance dosage | Systemic lupus erythematosus | 2 | ITGAM; ITGAX | 0.004
Platelet count | Testicular cancer | 1 | BAK1 | 0.003
Prostate-specific antigen levels | Testicular cancer | 2 | CLPTM1L; TERT | <0.001
Alkaline phosphatase levels | Venous thromboembolism | 1 | ABO | 0.008
Li Li

Independent patient cohort validation: clinical data warehouses
•  STRIDE: clinical data warehouse, has ICD9 diagnosis codes, CPT procedure codes, and lab results on over 1.7 million pediatric and adult patients at Stanford Hospital and Clinic, independent cohort 1/1/2005 to 7/15/2012
•  Collaborations also with Columbia University and Mount Sinai School of Medicine to validate findings
•  Time frame for analysis: within one year before the 1st disease diagnosis or within one year after the 1st disease diagnosis
[Figure: timeline around the 1st Dx, with lab values taken within 1 year before and 1 year after, for target disease (case) and non-target disease (control) patients.]
Li Li
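The one-year analysis window can be expressed as a small date filter. The sketch below is illustrative only. The function name, the 365-day definition of "one year", and the sample lab values are assumptions and do not come from STRIDE.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=365)  # "one year", approximated as 365 days


def labs_in_window(first_dx, labs, when):
    """Select lab values within one year before or after the first diagnosis.

    labs is a list of (date, value) pairs; when is "before" or "after".
    """
    if when == "before":
        return [v for d, v in labs if first_dx - WINDOW <= d < first_dx]
    if when == "after":
        return [v for d, v in labs if first_dx < d <= first_dx + WINDOW]
    raise ValueError("when must be 'before' or 'after'")


# Made-up values for illustration only.
first_dx = date(2010, 6, 1)
labs = [
    (date(2009, 1, 15), 1.9),   # more than a year before: excluded
    (date(2009, 12, 1), 2.1),   # within the year before
    (date(2010, 9, 1), 1.7),    # within the year after
    (date(2012, 1, 1), 2.0),    # more than a year after: excluded
]
print(labs_in_window(first_dx, labs, "before"))  # → [2.1]
print(labs_in_window(first_dx, labs, "after"))   # → [1.7]
```

Running the same filter for target-disease (case) and non-target-disease (control) patients gives the two groups of lab values that the validation compares.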
  
Serum magnesium levels and gastric cancer
Li Li
immport.niaid.nih.gov
Digital comparative effectiveness
Find precision subsets
If entry criteria are the same, outcome measures are the same, and the studies are comparable, one can perform a “meta-trial”

Take Home Points
•  Personalized medicine ≥ DNA. Will include other clinical, molecular, and environment measures.
•  We need new investigators who can imagine basic questions to ask of these repositories of clinical and genomic measurements.
•  Bioinformatics is not just about building tools. We know our tools; we should use them first. Don’t be afraid to test your ideas.

Funded post-doctoral positions in Translational Bioinformatics
Contact Atul Butte
abutte@stanford.edu

Collaborators
•  Jeff Wiser, Patrick Dunn, Mike Atassi / Northrop Grumman
•  Ashley Xia and Quan Chen / NIAID
•  Takashi Kadowaki, Momoko Horikoshi, Kazuo Hara, Hiroshi Ohtsu / U Tokyo
•  Kyoko Toda, Satoru Yamada, Junichiro Irie / Kitasato Univ and Hospital
•  Shiro Maeda / RIKEN
•  Alejandro Sweet-Cordero, Julien Sage / Pediatric Oncology
•  Mark Davis, C. Garrison Fathman / Immunology
•  Russ Altman, Steve Quake / Bioengineering
•  Euan Ashley, Joseph Wu, Tom Quertermous / Cardiology
•  Mike Snyder, Carlos Bustamante, Anne Brunet / Genetics
•  Jay Pasricha / Gastroenterology
•  Rob Tibshirani, Brad Efron / Statistics
•  Hannah Valantine, Kiran Khush / Cardiology
•  Ken Weinberg / Pediatric Stem Cell Therapeutics
•  Mark Musen, Nigam Shah / National Center for Biomedical Ontology
•  Minnie Sarwal / Nephrology
•  David Miklos / Oncology

Support
•  Lucile Packard Foundation for Children's Health
•  NIH: NIAID, NLM, NIGMS, NCI; NIDDK, NHGRI, NIA, NHLBI, NCATS
•  March of Dimes
•  Hewlett Packard
•  Howard Hughes Medical Institute
•  California Institute for Regenerative Medicine
•  Luke Evnin and Deann Wright (Scleroderma Research Foundation)
•  Clayville Research Fund
•  PhRMA Foundation
•  Stanford Cancer Center, Bio-X, SPARK
•  Tarangini Deshpande
•  Alan Krensky, Harvey Cohen
•  Hugh O’Brodovich
•  Isaac Kohane

Admin and Tech Staff
•  Susan Aptekar
•  Jen Cory
•  Boris Oskotsky

More Related Content

What's hot

Atul Butte's presentation to the Association of Medical School Pediatric Depa...
Atul Butte's presentation to the Association of Medical School Pediatric Depa...Atul Butte's presentation to the Association of Medical School Pediatric Depa...
Atul Butte's presentation to the Association of Medical School Pediatric Depa...University of California, San Francisco
 
Atul Butte's presentation for the FDA 5th Annual Scientific Computing Days
Atul Butte's presentation for the FDA 5th Annual Scientific Computing DaysAtul Butte's presentation for the FDA 5th Annual Scientific Computing Days
Atul Butte's presentation for the FDA 5th Annual Scientific Computing DaysUniversity of California, San Francisco
 
Presentation on Research Reproducibility at Friends of the National Library o...
Presentation on Research Reproducibility at Friends of the National Library o...Presentation on Research Reproducibility at Friends of the National Library o...
Presentation on Research Reproducibility at Friends of the National Library o...University of California, San Francisco
 
The Uneven Future of Evidence-Based Medicine
The Uneven Future of Evidence-Based MedicineThe Uneven Future of Evidence-Based Medicine
The Uneven Future of Evidence-Based MedicineIda Sim
 
2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...
2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...
2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...University of California, San Francisco
 
Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...
Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...
Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...University of California, San Francisco
 
Atul Butte's presentation at the From Data to Discovery symposium at Westat
Atul Butte's presentation at the From Data to Discovery symposium at WestatAtul Butte's presentation at the From Data to Discovery symposium at Westat
Atul Butte's presentation at the From Data to Discovery symposium at WestatUniversity of California, San Francisco
 
Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...
Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...
Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...University of California, San Francisco
 
Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...
Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...
Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...University of California, San Francisco
 

What's hot (20)

Atul Butte's presentation at LINCS 2013
Atul Butte's presentation at LINCS 2013Atul Butte's presentation at LINCS 2013
Atul Butte's presentation at LINCS 2013
 
Precision Medicine World Conference 2017
Precision Medicine World Conference 2017Precision Medicine World Conference 2017
Precision Medicine World Conference 2017
 
Atul Butte's presentation at the Milken Institute Public Health Summit
Atul Butte's presentation at the Milken Institute Public Health SummitAtul Butte's presentation at the Milken Institute Public Health Summit
Atul Butte's presentation at the Milken Institute Public Health Summit
 
Atul Butte's presentation to the Association of Medical School Pediatric Depa...
Atul Butte's presentation to the Association of Medical School Pediatric Depa...Atul Butte's presentation to the Association of Medical School Pediatric Depa...
Atul Butte's presentation to the Association of Medical School Pediatric Depa...
 
2015-11 Atul Butte's Presentation at Exponential Medicine
2015-11 Atul Butte's Presentation at Exponential Medicine2015-11 Atul Butte's Presentation at Exponential Medicine
2015-11 Atul Butte's Presentation at Exponential Medicine
 
Atul Butte's presentation for the FDA 5th Annual Scientific Computing Days
Atul Butte's presentation for the FDA 5th Annual Scientific Computing DaysAtul Butte's presentation for the FDA 5th Annual Scientific Computing Days
Atul Butte's presentation for the FDA 5th Annual Scientific Computing Days
 
Intro: California Initiative to Advance Precision Medicine Workshop
Intro: California Initiative to Advance Precision Medicine WorkshopIntro: California Initiative to Advance Precision Medicine Workshop
Intro: California Initiative to Advance Precision Medicine Workshop
 
Presentation on Research Reproducibility at Friends of the National Library o...
Presentation on Research Reproducibility at Friends of the National Library o...Presentation on Research Reproducibility at Friends of the National Library o...
Presentation on Research Reproducibility at Friends of the National Library o...
 
Atul Butte's AAPS big data workshop presentation 6/2015
Atul Butte's AAPS big data workshop presentation 6/2015Atul Butte's AAPS big data workshop presentation 6/2015
Atul Butte's AAPS big data workshop presentation 6/2015
 
2013 09 atul butte mahajani symposium
2013 09 atul butte mahajani symposium2013 09 atul butte mahajani symposium
2013 09 atul butte mahajani symposium
 
The Uneven Future of Evidence-Based Medicine
The Uneven Future of Evidence-Based MedicineThe Uneven Future of Evidence-Based Medicine
The Uneven Future of Evidence-Based Medicine
 
2013 01 pmwc atul butte scrubbed
2013 01 pmwc atul butte scrubbed2013 01 pmwc atul butte scrubbed
2013 01 pmwc atul butte scrubbed
 
2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...
2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...
2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative...
 
Atul Butte NIPS 2017 ML4H
Atul Butte NIPS 2017 ML4HAtul Butte NIPS 2017 ML4H
Atul Butte NIPS 2017 ML4H
 
Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...
Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...
Atul Butte presentation on 2019-02-05 for Accelerating biology 2019: Towards ...
 
Atul Butte's presentation at the From Data to Discovery symposium at Westat
Atul Butte's presentation at the From Data to Discovery symposium at WestatAtul Butte's presentation at the From Data to Discovery symposium at Westat
Atul Butte's presentation at the From Data to Discovery symposium at Westat
 
Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...
Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...
Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data M...
 
Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...
Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...
Presentation for the CSIR Fourth Paradigm Institute Silver Jubilee (Bangalore...
 
Presentation at ISMB NIH Office of Data Science Strategy Panel
Presentation at ISMB NIH Office of Data Science Strategy PanelPresentation at ISMB NIH Office of Data Science Strategy Panel
Presentation at ISMB NIH Office of Data Science Strategy Panel
 
Atul Butte's presentation at CTIC 2020
Atul Butte's presentation at CTIC 2020Atul Butte's presentation at CTIC 2020
Atul Butte's presentation at CTIC 2020
 

Big Data in Biomedicine: Translating 300 Trillion Points of Data

  • 1. Big Data in Biomedicine: Translating 300 trillion points of data into new drugs and diagnostics
      Atul Butte, MD, PhD
      Chief, Division of Systems Medicine, Departments of Pediatrics, Genetics, and, by courtesy, Computer Science, Pathology, and Medicine
      Center for Pediatric Bioinformatics, LPCH, Stanford University
      abutte@stanford.edu @atulbutte @ImmPortDB
  • 2. Disclosures
      • Scientific founder and advisory board membership: Genstruct, NuMedii, Personalis, Carmenta
      • Honoraria for talks: Lilly, Pfizer, Siemens, Bristol Myers Squibb, AstraZeneca, Roche, Genentech
      • Past or present consultancy: Lilly, Johnson and Johnson, Roche, NuMedii, Genstruct, Tercica, Ecoeos, Ansh Labs, Prevendia, Samsung, Assay Depot, Regeneron, Verinata, Geisinger, Covance
      • Corporate Relationships: Northrop Grumman, Aptalis, Thomson Reuters
      • Speakers’ bureau: None
      • Companies started by students: Carmenta, Serendipity, NuMedii, Stimulomics, NunaHealth, Praedicat, MyTime, Flipora
  • 3. Big Data in Biomedicine
  • 4.
  • 5. Nearly 1.4 million microarrays available. Doubles every 2-3 years. Butte AJ. Translational Bioinformatics: coming of age. JAMIA, 2008.
  • 6.
  • 7.
  • 8. 127 million substances x 740,000 assays. 1.2 billion points of data within a grid of 100 trillion cells. ~250 million active substances
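The figures on this slide can be checked with a short calculation. This is a minimal sketch, assuming the "grid" is the full substance-by-assay matrix and using the slide's rounded counts; the variable names are illustrative and not from the talk.

```python
# Rounded counts taken from the slide; the names are illustrative.
substances = 127_000_000
assays = 740_000
measured_points = 1_200_000_000

# Assumption: the "grid" is every possible substance-assay combination.
grid_cells = substances * assays

# Fraction of the grid that actually holds a measurement.
fill_fraction = measured_points / grid_cells

print(f"grid cells: {grid_cells:.3e}")  # about 9.4e13, i.e. roughly 100 trillion
print(f"filled: {fill_fraction:.4%} of the grid")
```

Under these assumptions the grid is very sparse: on the order of one cell in 80,000 holds a data point.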
  • 9. 5,178 compounds
      · 1,300 off-patent FDA-approved drugs
      · 700 bioactive tool compounds
      · 2,000+ screening hits (MLPCN and others)
      3,712 genes (shRNA + cDNA)
      · targets/pathways of FDA-approved drugs (n=900)
      · candidate disease genes (n=600)
      · community nominations (n=500+)
      15 cell types
      · Banked primary cell types
      · Cancer cell lines
      · Primary hTERT immortalized
      · Patient derived iPS cells
      · 5 community nominated
  • 10.
  • 12. Protein; Cancer markers; Transplant Rejection markers
  • 13. Preeclampsia: large cause of maternal and fetal death
      • Incidence
          • 5-8% of all pregnancies in the U.S. and worldwide
          • 4.1 million births in the U.S. in 2009
          • Up to 300K cases of preeclampsia annually in the U.S.
      • Mortality
          • Responsible for 18% of all maternal deaths in the U.S.
          • Maternal death in 56 out of every 100,000 live births in the U.S.
          • Neonatal death in 71 out of every 100,000 live births in the U.S.
      • Cost
          • $20 billion in direct costs in the U.S. annually
          • Average hospital stay of 3.5 days
      Linda Liu, Matt Cooper, Bruce Ling
  • 14.
  • 15. New markers for preeclampsia
      [Figure: marker concentration (ng/ml) against gestational age (weeks), control vs. preeclampsia. Panels: GA 23-34 weeks; GA > 34 weeks. Groups: Control N=16 vs. Preeclampsia N=15; Control N=16 vs. Preeclampsia N=17. Reported p values: 3.49 × 10^-4, 1.79 × 10^-5, and 1.92 × 10^-8.]
      March of Dimes Prematurity Research Center at Stanford University School of Medicine
      Linda Liu, Bruce Ling
  • 16.
  • 17. Sequencing Excitement
      • 454/Roche, Life Technologies
      • Helicos: $30k genome
      • Pacific Biosciences: sequence human genome in 15 minutes
      • Run times in minutes at a cost of hundreds of dollars
      • Complete Genomics: 80 genomes/day
      • Ion Torrent and Illumina: ~$1500 per genome
      • Oxford: USB stick
  • 18. Lancet, 375:1525, May 1, 2010.
  • 19. Credit: Euan Ashley, Russ Altman, Steve Quake, Lancet
  • 20. • Study published in 2008 in Inflammatory Bowel Disease
      • Crohn’s Disease and Ulcerative Colitis
      • Investigated 9 loci in 700 Finnish IBD patients
      • We record 100+ items
          – GWAS, non-GWAS papers
          – Disease, Phenotype
          – Population, Gender
          – Alleles and Genotypes
          – p-value (and confidence)
          – Odds ratio (and confidence)
          – Technology, Study design
          – Genetic model
      • Mapped to UMLS concepts
      Rong Chen, Optra Systems
  • 22. • Study published in 2009 in Rheumatology
      • Ankylosing spondylitis
      • Investigated 8 SNPs in IL23R in 2000 UK case-control patients
      • Tables can be rotated
      • NLP is hard
  • 25.
  • 26. What are the alleles for rs1004819?
  • 27. Alleles for rs1004819 are C and T. ~11% of records reported genotypes in the negative strand
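Records reported on the negative strand have to be complemented before they can be compared with the reference alleles. The following is a minimal sketch of that idea, not the curation procedure used for VARIMED; the function name `to_forward_strand` and its return conventions are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_forward_strand(reported, reference):
    """Express reported alleles on the same strand as the reference alleles.

    reported, reference: iterables of single-letter alleles.
    Returns a sorted tuple of alleles, or None if the record cannot be resolved.
    """
    reported_set = set(reported)
    reference_set = set(reference)
    flipped = {COMPLEMENT[a] for a in reported_set}

    # A/T and C/G SNPs are their own complement, so the strand cannot be
    # inferred from the allele letters alone.
    if reference_set == {COMPLEMENT[a] for a in reference_set}:
        return None
    if reported_set <= reference_set:
        return tuple(sorted(reported_set))  # already on the reference strand
    if flipped <= reference_set:
        return tuple(sorted(flipped))       # reported on the opposite strand
    return None                             # alleles do not match either strand


# rs1004819 has alleles C and T; a paper reporting G and A used the other strand.
print(to_forward_strand(["G", "A"], ["C", "T"]))  # ('C', 'T')
```

Returning None for self-complementary SNPs is a deliberate choice here: those records need extra information, such as allele frequencies, before the strand can be assigned.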
  • 28.
  • 29.
  • 30.
  • 31. VARIMED: Variants Informing Medicine
      Number of papers curated | Number of records | Distinct SNPs | Diseases and phenotypes
      ~19,000 | ~1.6 million | ~473,000 | ~7,400
      Rong Chen, Anil Patwardhan, Michael Clark, Optra Systems, Personalis
      Chen R, Davydov EV, Sirota M, Butte AJ. PLoS One. 2010 October: 5(10): e13574.
  • 32.
  • 33. Diseases and Traits
      • Risk factors are associated with an increased likelihood of developing a given disease
          • Smoking → chronic obstructive pulmonary disease
      • Risk factors are identified for diseases through large-scale epidemiological studies, which are resource intensive
      • GWAS have identified genetic variants for thousands of diseases and traits
      • If traits and diseases share the same associated genetic variants, could the trait be used to suggest risk factors for disease?
      Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
      Li Li
  • 34. Identify significant disease-trait genetic associations and clinically validate using EMR data
      [Workflow diagram. Genetics module: VARIMED; p < 1e-8; gene counts > 3; TF-IDF weighting; cosine distance; random shuffling; q ≤ 0.01. Counts shown: Disease (n=201), Trait (n=249), Disease (n=69), Trait (n=85), disease-trait pairs (n=120), disease modules (n=8), trait modules (n=7), published findings (n=94), novel predictions (n=26). Clinical validation: EMR cohort, with traits placed relative to the 1st diagnosis as risk factors (before dx), diagnostic tests (1st dx), or complications (after dx).]
      Li Li
  • 35. Assessing significance of disease-trait (D-T) pair
      • Each gene within an individual disease or trait is weighted by taking into account the frequency of the gene: Term Frequency–Inverse Document Frequency (TF-IDF)
          • $\text{tf-idf}(i, j) = \text{tf}(i, j) \times \text{idf}_i = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log\frac{D}{D_i}$, which adjusts the score tf(i, j) by taking into account the popularity of gene i.
          • e.g., with 154 diseases and traits (D+T), 28 genes in Alzheimer's disease (AD) and 5 genes in ESR, CR1 was in common
          • tf-idf(AD) = 1/28 × log10(154/2) = 0.067
          • tf-idf(ESR) = 1/5 × log10(154/2) = 0.377
      • The D-T distance score was calculated using cosine distance to evaluate similarity between all pairs.
          $\text{cosine-similarity}(D, T) = \frac{D \cdot T}{\|D\| \, \|T\|} = \frac{\sum_{i=1}^{n} D_i \times T_i}{\sqrt{\sum_{i=1}^{n} (D_i)^2} \times \sqrt{\sum_{i=1}^{n} (T_i)^2}} = 0.9274524$
      • Genes were randomly sampled across all the traits and the D-T similarity was recalculated; this was repeated 1,000 times, and the q value was generated from the number of samplings.
      Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
      Li Li
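The weighting and similarity steps on this slide can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the logarithm is taken as base 10 because that reproduces the slide's worked numbers, and the last function returns a raw empirical fraction rather than the multiple-testing-adjusted q value.

```python
import math


def tf_idf(gene_count, total_gene_count, n_entities, n_entities_with_gene):
    """Weight of one gene within one disease or trait.

    tf  = share of the entity's genes accounted for by this gene
    idf = log10(all diseases and traits / those that contain the gene)
    """
    tf = gene_count / total_gene_count
    idf = math.log10(n_entities / n_entities_with_gene)
    return tf * idf


def cosine_similarity(d, t):
    """Cosine similarity between two {gene: weight} profiles."""
    shared = set(d) & set(t)
    dot = sum(d[g] * t[g] for g in shared)
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_t = math.sqrt(sum(w * w for w in t.values()))
    if norm_d == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_d * norm_t)


def empirical_fraction(observed, null_scores):
    """Share of shuffled-gene similarities at least as large as the observed one."""
    return sum(s >= observed for s in null_scores) / len(null_scores)


# Worked example from the slide: CR1 is shared by Alzheimer's disease
# (28 genes) and ESR (5 genes), out of 154 diseases and traits.
cr1_in_ad = tf_idf(1, 28, 154, 2)
cr1_in_esr = tf_idf(1, 5, 154, 2)
print(round(cr1_in_ad, 3), round(cr1_in_esr, 3))  # 0.067 0.377
```

Storing each profile as a dictionary keeps the vectors sparse, which suits data where most diseases and traits are linked to only a handful of genes.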
  • 38. Categorizations for known D-T pairs and discover potential confounders in GWAS studies
      [Diagram: 93 known pairs arranged by timing of disease progression into three categories (risk factor, diagnostic test, consequence), each linking a trait (T) and a disease (D) through shared genetic variants. Group sizes shown: 38, 27, and 28 pairs.]
      Li Li
  • 39. Diagnostic tests where traits occur at the same time as disease onset
      Antibody titer: Hepatitis B vaccine response. Png et al, Hum Mol Genet, 2011
      Even though this GWAS did not explicitly include participants with the autoimmune diseases above, our approach inferred known relationships between diseases and traits based on their shared genetic architecture
      [Diagram: trait (T) and disease (D) linked by genetic variants; category: diagnostic test]
      Li Li
  • 40. Significant genes shared between antibody titer and 16 autoimmune diseases
      Disease | Common Genes | Genes Shared | q-value
      Alopecia areata | 4 | BTNL2; C6orf10; RDBP; TNXB | <0.001
      Ankylosing spondylitis | 2 | BTNL2; LOC100507436 | 0.001
      Asthma | 4 | BTNL2; C6orf10; HLA-DPA1; NOTCH4 | <0.001
      Biliary liver cirrhosis | 3 | BTNL2; C6orf10; HLA-DPB1 | 0.003
      Chronic hepatitis B | 2 | HLA-DPA1; HLA-DPB1 | <0.001
      HIV infection | 7 | C6orf10; HLA-C; LOC100507436; NOTCH4; PRRC2A; RDBP; TNXB | <0.001
      Membranous nephropathy | 15 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; GPANK1; LY6G5B; LY6G6C; NOTCH4; PRRC2A; RDBP; RNF5; SLC44A4; TNXB; ZBTB12 | <0.001
      Multiple sclerosis | 7 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; NOTCH4; TNXB | <0.001
      Neonatal lupus | 3 | BAG6; C6orf10; ZBTB12 | <0.001
      Primary biliary cirrhosis | 3 | BTNL2; C6orf10; HLA-DPB1 | 0.005
      Rheumatoid arthritis | 20 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; GPANK1; HLA-C; HLA-DPA1; HLA-DPB1; LOC100507436; LY6G5B; LY6G6C; LY6G6F; NOTCH4; PRRC2A; RDBP; RNF5; SLC44A4; TNXB; ZBTB12 | <0.001
      Systemic lupus erythematosus | 9 | BAG6; BTNL2; C6orf10; GPANK1; HLA-DPB1; NOTCH4; PRRC2A; TNXB; ZBTB12 | <0.001
      Systemic sclerosis | 3 | HLA-DPA1; HLA-DPB1; NOTCH4 | <0.001
      Type 1 diabetes | 5 | BAG6; BTNL2; C6orf10; HLA-C; HLA-DPB1 | 0.001
      Vitiligo | 6 | AGPAT1; BTNL2; NOTCH4; RNF5; SLC44A4; TNXB | <0.001
      Wegener's granulomatosis | 2 | HLA-DPA1; HLA-DPB1 | <0.001
      Li Li
  • 41. Risk factors where traits occur prior to the disease onset and may accompany disease
      Trait | Disease | Common Genes | Genes Shared | q-value
      Smoking | Chronic obstructive pulmonary disease | 3 | AGPHD1; CHRNA3; RAB4B | <0.001
      Known clinical study: Smoking is the primary risk factor for COPD, although little was known about the pathogenesis linking smoking and COPD. Pauwels et al, 2001; Vestbo et al, 2012
      In GWAS study: Six GWAS studies are related to COPD in VARIMED, and their COPD cohorts are all from smoking patients. Cho et al, 2012; Pillai SG, 2010; Wang et al, 2010; Cho et al, 2010; Lambrechts et al, 2010; Pillai SG, 2009
      As COPD occurs after smoking, the variants associated with COPD could be influenced by smoking, and the genetic variants for COPD could be unmasked if the smoking confounder is excluded in GWAS.
      [Diagram: Smoking and COPD linked by genetic variants]
      Li Li
  • 42. Consequence where traits occur after the disease onset
      Disease: Alcohol dependence syndrome (ADS)
      Trait | Common Genes | Genes Shared | q-value
      Alanine aminotransferase levels | 1 | C12orf51 | 0.001
      Cholesterol levels | 3 | ALDH2; BRAP; C12orf51 | 0.001
      HDL cholesterol levels | 2 | C12orf51; OAS3 | <0.001
      Known clinical study: High HDL criterion was observed with triple frequency in the ADS group, a high cholesterol diet was associated with ADS patients, and ALT levels have been seen to increase with daily alcohol intake in patients who developed ADS. Kahl et al, 2010; Imhof et al, 2001; Gross GA, 1994
      In GWAS study: 3 genes for cholesterol levels reported by Kato et al. and 2 genes for ALT and HDL-C reported by Young et al. could be biased by alcohol effect, as the authors did not perform alcohol intake adjustment or control for drinking habits on these genes in their GWAS studies. Kato et al, 2011; Kamatani et al, 2010
      The GWAS to identify concrete genetic variants for these three clinical measurements should be performed in patients without ADS as a confounder
      [Diagram: ADS linked to ALT and HDL-C through genetic variants]
      Li Li
  • 43. 27 novel pairs
      Trait | Disease | Common Genes | Genes Shared | q-value
      Mean corpuscular volume | Acute lymphoblastic leukemia | 1 | IKZF1 | 0.001
      Mean cell hemoglobin concentration | Alcohol dependence | 1 | ALDH2 | 0.005
      Platelet count | Alcohol dependence | 1 | C12orf51 | 0.007
      Lung function | Alopecia areata | 1 | AGER | 0.008
      Erythrocyte sedimentation rate | Alzheimer's disease | 1 | CR1 | 0.004
      Prostate-specific antigen levels | Basal cell carcinoma | 1 | CLPTM1L | 0.004
      Eye color | Chronic lymphocytic leukemia | 1 | IRF4 | 0.006
      Freckles | Chronic lymphocytic leukemia | 1 | IRF4 | 0.008
      Blood pressure | Esophageal cancer | 3 | ALDH2; C12orf51; PLCE1 | 0.009
      Factor VII coagulant activity | Esophageal cancer | 1 | ADH4 | 0.008
      Serum magnesium levels | Gastric cancer | 3 | MUC1; THBS3; TRIM46 | <0.001
      Prostate-specific antigen levels | Glioma | 1 | TERT | 0.005
      Alpha linolenic acid levels | Glucose intolerance | 1 | FADS1 | 0.01
      Alanine aminotransferase levels | Hypertension | 1 | C12orf51 | 0.003
      Serum transferrin levels | Hypertension | 1 | HFE | 0.005
      Smoking | Kawasaki disease | 1 | RAB4B | 0.003
      Prostate-specific antigen levels | Lung cancer | 2 | CLPTM1L; TERT | 0.001
      Homocysteine levels | Melanoma | 1 | C16orf55 | 0.01
      Protein C levels | Melanoma | 2 | NCOA6; PIGU | <0.001
      Transferrin receptor levels | Metabolic syndrome | 3 | APOA5; BUD13; ZNF259 | <0.001
      PR interval | Open-angle glaucoma | 1 | CAV1 | 0.002
      PR interval | Restless legs syndrome | 1 | MEIS1 | 0.003
      Bone mineral density | Sudden cardiac arrest | 1 | ESR1 | 0.006
      Acenocoumarol maintenance dosage | Systemic lupus erythematosus | 2 | ITGAM; ITGAX | 0.004
      Platelet count | Testicular cancer | 1 | BAK1 | 0.003
      Prostate-specific antigen levels | Testicular cancer | 2 | CLPTM1L; TERT | <0.001
      Alkaline phosphatase levels | Venous thromboembolism | 1 | ABO | 0.008
      Li Li
  • 44. Independent patient cohort validation: clinical data warehouses • STRIDE: clinical data warehouse with ICD-9 diagnosis codes, CPT procedure codes, and lab results on over 1.7 million pediatric and adult patients at Stanford Hospital and Clinic; independent cohort, 1/1/2005 to 7/15/2012 • Collaborations also with Columbia University and Mount Sinai School of Medicine to validate findings • Time frame for analysis: within one year before or within one year after the 1st disease diagnosis [Diagram: lab values within 1 year before and 1 year after the 1st Dx, for target disease (case) and non-target disease (control)] Li Li
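The time-frame rule on this slide can be illustrated with a small sketch. It is not the STRIDE pipeline: the record layout, the function names, the sample values, and the 365-day approximation of "one year" are all assumptions made for illustration. It covers only the windowing step (keeping lab results taken within one year before or after a patient's first diagnosis), not the case/control comparison.

```python
from datetime import date, timedelta

# Assumption: "one year" is approximated as 365 days.
WINDOW = timedelta(days=365)

def first_diagnosis_dates(diagnoses):
    """Map patient_id -> earliest diagnosis date from (patient_id, date) pairs."""
    first = {}
    for patient_id, dx_date in diagnoses:
        if patient_id not in first or dx_date < first[patient_id]:
            first[patient_id] = dx_date
    return first

def labs_in_window(labs, first_dx, window=WINDOW):
    """Keep lab records taken within `window` before or after the first diagnosis."""
    kept = []
    for patient_id, lab_date, value in labs:
        dx_date = first_dx.get(patient_id)
        # Patients without a recorded diagnosis are skipped.
        if dx_date is not None and abs(lab_date - dx_date) <= window:
            kept.append((patient_id, lab_date, value))
    return kept

# Hypothetical records: (patient_id, diagnosis date) and (patient_id, lab date, value).
diagnoses = [
    ("p1", date(2010, 6, 1)),
    ("p1", date(2011, 1, 15)),   # later diagnosis, ignored in favor of the first
    ("p2", date(2008, 3, 10)),
]
labs = [
    ("p1", date(2009, 12, 1), 1.9),  # about 6 months before first Dx: kept
    ("p1", date(2012, 1, 1), 2.1),   # more than a year after first Dx: dropped
    ("p2", date(2008, 9, 1), 1.7),   # about 6 months after first Dx: kept
    ("p3", date(2009, 1, 1), 2.0),   # no diagnosis on record: dropped
]

first_dx = first_diagnosis_dates(diagnoses)
kept = labs_in_window(labs, first_dx)
```

Under these assumptions, the same filter would be applied to both the target-disease (case) and non-target-disease (control) groups before their lab values are compared.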
  • 45. Serum  magnesium  levels  and  gastric  cancer   Li  Li  
  • 46.
  • 48. Digital comparative effectiveness: find precision subsets. If entry criteria are the same, outcome measures are the same, and the studies are comparable, can perform a “meta-trial”
  • 49.
  • 50. Take Home Points • Personalized medicine ≥ DNA. Will include other clinical, molecular, and environmental measures. • We need new investigators who can imagine basic questions to ask of these repositories of clinical and genomic measurements. • Bioinformatics is not just about building tools. We know our tools; we should use them first. Don’t be afraid to test your ideas.
  • 51. Funded post-doctoral positions in Translational Bioinformatics. Contact Atul Butte, abutte@stanford.edu
  • 52. Collaborators • Jeff Wiser, Patrick Dunn, Mike Atassi / Northrop Grumman • Ashley Xia and Quan Chen / NIAID • Takashi Kadowaki, Momoko Horikoshi, Kazuo Hara, Hiroshi Ohtsu / U Tokyo • Kyoko Toda, Satoru Yamada, Junichiro Irie / Kitasato Univ and Hospital • Shiro Maeda / RIKEN • Alejandro Sweet-Cordero, Julien Sage / Pediatric Oncology • Mark Davis, C. Garrison Fathman / Immunology • Russ Altman, Steve Quake / Bioengineering • Euan Ashley, Joseph Wu, Tom Quertermous / Cardiology • Mike Snyder, Carlos Bustamante, Anne Brunet / Genetics • Jay Pasricha / Gastroenterology • Rob Tibshirani, Brad Efron / Statistics • Hannah Valantine, Kiran Khush / Cardiology • Ken Weinberg / Pediatric Stem Cell Therapeutics • Mark Musen, Nigam Shah / National Center for Biomedical Ontology • Minnie Sarwal / Nephrology • David Miklos / Oncology
  • 53. Support • Lucile Packard Foundation for Children's Health • NIH: NIAID, NLM, NIGMS, NCI; NIDDK, NHGRI, NIA, NHLBI, NCATS • March of Dimes • Hewlett Packard • Howard Hughes Medical Institute • California Institute for Regenerative Medicine • Luke Evnin and Deann Wright (Scleroderma Research Foundation) • Clayville Research Fund • PhRMA Foundation • Stanford Cancer Center, Bio-X, SPARK • Tarangini Deshpande • Alan Krensky, Harvey Cohen • Hugh O’Brodovich • Isaac Kohane Admin and Tech Staff • Susan Aptekar • Jen Cory • Boris Oskotsky