Big Data in Biomedicine: Translating 300 Trillion Points of Data
1. Big Data in Biomedicine: Translating 300 trillion points of data into new drugs and diagnostics
Atul Butte, MD, PhD
Chief, Division of Systems Medicine, Departments of Pediatrics, Genetics, and, by courtesy, Computer Science, Pathology, and Medicine
Center for Pediatric Bioinformatics, LPCH
Stanford University
abutte@stanford.edu
@atulbutte
@ImmPortDB
13. Preeclampsia: large cause of maternal and fetal death
• Incidence
– 5-8% of all pregnancies in the U.S. and worldwide
– 4.1 million births in the U.S. in 2009
– Up to 300K cases of preeclampsia annually in the U.S.
• Mortality
– Responsible for 18% of all maternal deaths in the U.S.
– Maternal death in 56 out of every 100,000 live births in the U.S.
– Neonatal death in 71 out of every 100,000 live births in the U.S.
• Cost
– $20 billion in direct costs in the U.S. annually
– Average hospital stay of 3.5 days
Linda Liu, Matt Cooper, Bruce Ling
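The incidence figures on this slide can be cross-checked with simple arithmetic. The sketch below applies the 5-8% incidence range to the 4.1 million U.S. births in 2009; treating births as a stand-in for pregnancies is an assumption made here for illustration, and the names are hypothetical.

```python
# Rough consistency check of the slide's incidence figures.
# Assumption: the 5-8% rate (stated per pregnancy) is applied to live births.
BIRTHS_US_2009 = 4_100_000


def expected_cases(births: int, incidence_percent: int) -> int:
    """Expected number of cases for a whole-number incidence percentage."""
    return births * incidence_percent // 100


low = expected_cases(BIRTHS_US_2009, 5)
high = expected_cases(BIRTHS_US_2009, 8)
print(f"{low:,} to {high:,} expected cases per year")
```

The slide's "up to 300K cases" falls inside this range.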
14.
15. New markers for preeclampsia
[Figure: marker concentration (ng/ml) versus gestational age (weeks), control versus preeclampsia. Panels: GA 23-34 weeks and GA > 34 weeks. Groups shown: control N=16 and preeclampsia N=15; control N=16 and preeclampsia N=17. Reported p values: 3.49 × 10^-4, 1.79 × 10^-5, and 1.92 × 10^-8.]
March of Dimes Prematurity Research Center at Stanford University School of Medicine
Linda Liu, Bruce Ling
16.
17. Sequencing Excitement
• 454/Roche, Life Technologies
• Helicos: $30k genome
• Pacific Biosciences: sequence human genome in 15 minutes
• Run times in minutes at a cost of hundreds of dollars
• Complete Genomics: 80 genomes/day
• Ion Torrent and Illumina: ~$1500 per genome
• Oxford: USB stick
20.
• Study published in 2008 in Inflammatory Bowel Disease
• Crohn's Disease and Ulcerative Colitis
• Investigated 9 loci in 700 Finnish IBD patients
• We record 100+ items
– GWAS, non-GWAS papers
– Disease, Phenotype
– Population, Gender
– Alleles and Genotypes
– p-value (and confidence)
– Odds ratio (and confidence)
– Technology, Study design
– Genetic model
• Mapped to UMLS concepts
Rong Chen, Optra Systems
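The items recorded per curated finding suggest a simple record structure. The sketch below is one hypothetical way to hold them; every field name and type is an assumption made for illustration, not the actual curation schema, and the example record uses placeholder values rather than data from any paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AssociationRecord:
    """One curated variant-disease finding (hypothetical field names)."""
    paper_type: str                                   # "GWAS" or "non-GWAS"
    disease: str
    phenotype: Optional[str] = None
    population: Optional[str] = None
    gender: Optional[str] = None
    alleles: Tuple[str, ...] = ()
    genotypes: Tuple[str, ...] = ()
    p_value: Optional[float] = None
    p_value_confidence: Optional[str] = None          # free text; format is an assumption
    odds_ratio: Optional[float] = None
    odds_ratio_ci: Optional[Tuple[float, float]] = None
    technology: Optional[str] = None
    study_design: Optional[str] = None
    genetic_model: Optional[str] = None
    umls_concept: Optional[str] = None                # concept the disease is mapped to


# Placeholder record: values are illustrative only.
record = AssociationRecord(
    paper_type="non-GWAS",
    disease="example disease",
    alleles=("C", "T"),
)
print(record.disease, record.alleles)
```

Optional fields default to empty so that a partially reported finding can still be stored.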
22.
• Study published in 2009 in Rheumatology
• Ankylosing spondylitis
• Investigated 8 SNPs in IL23R in 2000 UK case-control patients
• Tables can be rotated
• NLP is hard
27. Alleles for rs1004819 are C and T
~11% of records reported genotypes in the negative strand
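Records reported on the negative strand have to be flipped before they can be compared with the reference alleles. The sketch below shows one way this could be done by complementing the reported alleles; the function name and error handling are assumptions, not the curation pipeline's actual code.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_reference_strand(reported, reference):
    """Return the reported alleles expressed on the reference strand.

    If the reported alleles already match the reference alleles they are
    returned unchanged; if their complements match, they are flipped.
    Note: A/T and C/G SNPs are ambiguous and cannot be resolved this way.
    """
    reported_set = {allele.upper() for allele in reported}
    reference_set = {allele.upper() for allele in reference}
    if reported_set <= reference_set:
        return sorted(reported_set)
    flipped = {COMPLEMENT[allele] for allele in reported_set}
    if flipped <= reference_set:
        return sorted(flipped)
    raise ValueError(f"alleles {sorted(reported_set)} fit neither strand")


# rs1004819 has alleles C and T; a record reporting G and A is on the negative strand.
print(to_reference_strand(["G", "A"], ["C", "T"]))
```

For a C/T SNP the complement pair G/A is unambiguous, which is why the flip can be detected from the alleles alone.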
28.
29.
30.
31. VARIMED: Variants Informing Medicine
Number of papers curated: ~19,000
Number of records: ~1.6 million
Distinct SNPs: ~473,000
Diseases and phenotypes: ~7,400
Rong Chen, Anil Patwardhan, Michael Clark
Optra Systems, Personalis
Chen R, Davydov EV, Sirota M, Butte AJ. PLoS One. 2010 October; 5(10): e13574.
32.
33. Diseases and Traits
• Risk factors are associated with an increased likelihood of developing a given disease
• Smoking → chronic obstructive pulmonary disease
• Risk factors are identified for diseases through large-scale epidemiological studies, which are resource intensive
• GWAS have identified genetic variants for thousands of diseases and traits
• If traits and diseases share the same associated genetic variants, could the trait be used to suggest risk factors for disease?
Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
Li Li
34. Identify significant disease-trait genetic associations and clinically validate using EMR data
[Figure: workflow diagram with a Genetics Module and a Clinical Validation module. Genetics Module labels: Varimed; p < 1e-8; gene counts > 3; Disease (n=201); Trait (n=249); TF-IDF weighting; cosine distance; random shuffling; q ≤ 0.01; Disease-Trait Pair (n=120); Disease (n=69); Trait (n=85); disease modules (n=8); trait modules (n=7); published findings (n=94); novel predictions (n=26). Clinical Validation labels: EMR cohort; risk factors (before dx); diagnostic tests (1st dx); complications (after dx).]
Li Li
35. Assessing significance of disease-trait (D-T) pair
• Each gene within an individual disease or trait is weighted by taking into account the frequency of the gene: Term Frequency–Inverse Document Frequency (tf-idf)
• $\text{tf-idf}(i, j) = \text{tf}(i, j) \times \text{idf}_i = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log\left(\frac{D}{D_i}\right)$, which adjusts the score of $\text{tf}(i, j)$ by taking into account the popularity level of gene $i$.
• e.g., 154 D+T, 28 genes in Alzheimer's disease (AD) and 5 genes in ESR; CR1 was in common
• tf-idf (AD) = 1/28 × log(154/2, 10) = 0.067
• tf-idf (ESR) = 1/5 × log(154/2, 10) = 0.377
• The D-T distance score was calculated using cosine distance to evaluate similarity between all pairs:
$\text{cosine similarity}(D, T) = \frac{D \cdot T}{\lVert D \rVert \times \lVert T \rVert} = \frac{\sum_{i=1}^{n} D_i \times T_i}{\sqrt{\sum_{i=1}^{n} (D_i)^2} \times \sqrt{\sum_{i=1}^{n} (T_i)^2}}$
(value shown on the slide: 0.9274524)
• All the genes were randomly sampled across all the traits and the D-T similarity was recalculated; this was repeated 1,000 times, and the q value was generated based on the number of samplings.
Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
Li Li
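The tf-idf weighting, cosine similarity, and random sampling described on this slide can be sketched in a few lines. This is a minimal illustration, not the published implementation: the function names, the sparse dictionary representation, and the way the empirical p-value is computed from the random samplings are all assumptions, and the step from p-values to q values is not shown.

```python
import math
import random


def tf_idf(n_gene, n_total, n_entities, n_entities_with_gene):
    """tf-idf(i, j) = n_ij / sum_k n_kj * log10(D / D_i)."""
    return (n_gene / n_total) * math.log10(n_entities / n_entities_with_gene)


def cosine_similarity(d, t):
    """Cosine similarity of two sparse gene -> weight dictionaries."""
    dot = sum(weight * t[gene] for gene, weight in d.items() if gene in t)
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_t = math.sqrt(sum(w * w for w in t.values()))
    return dot / (norm_d * norm_t) if norm_d and norm_t else 0.0


def empirical_p(d, t, gene_pool, n_iter=1000, seed=0):
    """Share of random gene relabelings of t that score at least as high as observed.

    Assumption: the null is built by drawing len(t) genes from gene_pool
    and keeping the weights of t.
    """
    rng = random.Random(seed)
    observed = cosine_similarity(d, t)
    hits = 0
    for _ in range(n_iter):
        genes = rng.sample(gene_pool, len(t))
        shuffled = dict(zip(genes, t.values()))
        if cosine_similarity(d, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


# Worked example from the slide: CR1 is one of 28 AD genes and one of 5 ESR genes,
# and it appears in 2 of the 154 diseases and traits.
ad_cr1 = tf_idf(1, 28, 154, 2)
esr_cr1 = tf_idf(1, 5, 154, 2)
print(round(ad_cr1, 3), round(esr_cr1, 3))
```

The two printed weights reproduce the slide's 0.067 and 0.377, which confirms that log(154/2, 10) is a base-10 logarithm.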
38. Categorizations for known D-T pairs and discover potential confounders in GWAS studies
[Figure: 93 pairs categorized by timing of disease progression into risk factor, diagnostic test, and consequence. Category counts shown: 38 pairs, 27 pairs, 28 pairs. Each panel relates trait (T), disease (D), and genetic variants.]
Li Li
39. Diagnostic tests where traits occur at the same time as disease onset
Antibody titer
Hepatitis B vaccine response
Png et al, Hum Mol Genet, 2011
Even though this GWAS did not explicitly include participants with the autoimmune diseases above, our approach inferred known relationships between diseases and traits based on their shared genetic architecture
[Figure: trait (T), disease (D), genetic variants; diagnostic test]
Li Li
41. Risk factors where traits occur prior to the disease onset and may accompany disease
Trait | Disease | Common Genes | Genes Shared | q-value
Smoking | Chronic obstructive pulmonary disease | 3 | AGPHD1; CHRNA3; RAB4B | <0.001
Known clinical study: Smoking is the primary risk factor for COPD, although little was known about the pathogenesis linking smoking and COPD. Pauwels et al, 2001; Vestbo et al, 2012
In GWAS study: Six GWAS studies are related to COPD in VARIMED, and their COPD cohorts are all from smoking patients. Cho et al, 2012; Pillai SG, 2010; Wang et al, 2010; Cho et al, 2010; Lambrechts et al, 2010; Pillai SG, 2009
As COPD occurs after smoking, the variants associated with COPD could be influenced by smoking, and the genetic variants for COPD could be unmasked if the smoking confounder is excluded in GWAS.
[Figure: Smoking, COPD, genetic variants]
Li Li
42. Consequence where traits occur after the disease onset
Disease: Alcohol dependence syndrome (ADS)
Trait | Common Genes | Genes Shared | q-value
Alanine aminotransferase levels | 1 | C12orf51 | 0.001
Cholesterol levels | 3 | ALDH2; BRAP; C12orf51 | 0.001
HDL cholesterol levels | 2 | C12orf51; OAS3 | <0.001
Known clinical study: High HDL criterion was observed with triple frequency in the ADS group, high cholesterol diet was associated with ADS patients, and ALT levels have been seen to increase with daily alcohol intake in patients who developed ADS. Kahl et al, 2010; Imhof et al, 2001; Gross GA, 1994
In GWAS study: 3 genes for cholesterol levels reported by Kato et al. and 2 genes for ALT and HDL-C reported by Young et al. could be biased by alcohol effect, as the authors did not adjust for alcohol intake or control for drinking habits on these genes in their GWAS studies. Kato et al, 2011; Kamatani et al, 2010
The GWAS to identify concrete genetic variants for these three clinical measurements should be performed in patients without ADS as a confounder
[Figure: ADS, ALT, HDL-C, genetic variants]
Li Li
44. Independent patient cohort validation: clinical data warehouses
• STRIDE: clinical data warehouse, has ICD9 diagnosis codes, CPT procedure codes, and lab results on over 1.7 million pediatric and adult patients at Stanford Hospital and Clinic, independent cohort 1/1/2005 to 7/15/2012
• Collaborations also with Columbia University and Mount Sinai School of Medicine to validate findings
• Time frame for analysis: within one year before the 1st disease diagnosis or within one year after the 1st disease diagnosis
[Figure: timeline centered on 1st Dx with lab measurements 1 year before and 1 year after; target disease (case) versus non-target disease (control)]
Li Li
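The analysis window around the first diagnosis can be illustrated with a small date filter. This is a sketch under stated assumptions: the function name, the tuple layout of a lab result, the 365-day definition of "one year", and the choice to count a lab drawn on the diagnosis date as "after" are all hypothetical, and the dates and values below are made up for illustration only.

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # assumption: one year taken as 365 days


def split_labs_by_window(first_dx, labs, window=ONE_YEAR):
    """Split (date, value) lab results into those within one window
    before the first diagnosis and those within one window after it.

    Assumption: a lab drawn on the diagnosis date counts as "after".
    """
    before = [(d, v) for d, v in labs if first_dx - window <= d < first_dx]
    after = [(d, v) for d, v in labs if first_dx <= d <= first_dx + window]
    return before, after


# Illustrative values only, not patient data.
first_dx = date(2010, 6, 1)
labs = [
    (date(2009, 1, 1), 1.0),   # more than a year before: excluded
    (date(2010, 1, 1), 2.0),   # within the year before
    (date(2010, 6, 1), 3.0),   # on the diagnosis date
    (date(2011, 5, 1), 4.0),   # within the year after
    (date(2012, 1, 1), 5.0),   # more than a year after: excluded
]
before, after = split_labs_by_window(first_dx, labs)
print(len(before), len(after))
```

The same split can be applied to cases (target disease) and controls (non-target disease) so that trait values are compared over matching time frames.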
48. Digital comparative effectiveness
Find precision subsets
If entry criteria are the same, outcome measures are the same, and the studies are comparable, one can perform a "meta-trial"
49.
50. Take Home Points
• Personalized medicine ≥ DNA. Will include other clinical, molecular, and environmental measures.
• We need new investigators who can imagine basic questions to ask of these repositories of clinical and genomic measurements.
• Bioinformatics is not just about building tools. We know our tools; we should use them first. Don't be afraid to test your ideas.
52. Collaborators
• Jeff Wiser, Patrick Dunn, Mike Atassi / Northrop Grumman
• Ashley Xia and Quan Chen / NIAID
• Takashi Kadowaki, Momoko Horikoshi, Kazuo Hara, Hiroshi Ohtsu / U Tokyo
• Kyoko Toda, Satoru Yamada, Junichiro Irie / Kitasato Univ and Hospital
• Shiro Maeda / RIKEN
• Alejandro Sweet-Cordero, Julien Sage / Pediatric Oncology
• Mark Davis, C. Garrison Fathman / Immunology
• Russ Altman, Steve Quake / Bioengineering
• Euan Ashley, Joseph Wu, Tom Quertermous / Cardiology
• Mike Snyder, Carlos Bustamante, Anne Brunet / Genetics
• Jay Pasricha / Gastroenterology
• Rob Tibshirani, Brad Efron / Statistics
• Hannah Valantine, Kiran Khush / Cardiology
• Ken Weinberg / Pediatric Stem Cell Therapeutics
• Mark Musen, Nigam Shah / National Center for Biomedical Ontology
• Minnie Sarwal / Nephrology
• David Miklos / Oncology
53. Support
• Lucile Packard Foundation for Children's Health
• NIH: NIAID, NLM, NIGMS, NCI; NIDDK, NHGRI, NIA, NHLBI, NCATS
• March of Dimes
• Hewlett Packard
• Howard Hughes Medical Institute
• California Institute for Regenerative Medicine
• Luke Evnin and Deann Wright (Scleroderma Research Foundation)
• Clayville Research Fund
• PhRMA Foundation
• Stanford Cancer Center, Bio-X, SPARK
• Tarangini Deshpande
• Alan Krensky, Harvey Cohen
• Hugh O'Brodovich
• Isaac Kohane
Admin and Tech Staff
• Susan Aptekar
• Jen Cory
• Boris Oskotsky