NetBioSIG2014-Talk by Salvatore Loguercio
1. Network-augmented Genomic Analysis (NAGA)
Applied to Cystic Fibrosis studies
Salvatore Loguercio, Ph.D.
loguerci@scripps.edu
@sal99k
http://sulab.org
July 11, 2014
Network Biology SIG – ISMB 2014
2. Cystic fibrosis overview
• Inherited recessive chronic disease causing
chest infections, lung damage, and
bowel obstruction.
• 30,000 children and adults in the US
(70,000 worldwide); 1,000 new cases
diagnosed each year.
• Predicted median age of survival for a
person with CF: late 30s.
• Primary therapy: airway clearance
techniques (ACT)
Source: Cystic Fibrosis Foundation
3. CFTR and mucous flow
Source: http://www.flickr.com/photos/ajc1/3737955649
• Mutations cause the body to
produce unusually thick,
sticky mucus
• Clogs the lungs and leads to
life-threatening lung
infections
• Obstructs the pancreas and
stops natural enzymes from
helping the body break
down and absorb food
8. Connect siRNA hits to a target through the Human Interactome
I) Compute all shortest paths from siRNA hits to the
target through a weighted protein interaction network
(Dijkstra algorithm)
II) Prioritize connecting proteins specific to the set
of high-scoring siRNA hits considered.
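The shortest-path step can be sketched as follows. This is a minimal illustration, not the NAGA implementation: the toy edge list, the placeholder node names (`hit1`, `hit2`, `A`, `B`) and the helper names are all assumptions, and the sketch returns one cheapest path per hit rather than every tied shortest path. Because the network is treated as undirected, a single Dijkstra run from the target yields the path from every hit.

```python
import heapq

def build_graph(edges):
    """Build an undirected adjacency map from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def dijkstra(graph, source):
    """Return (dist, prev) for cheapest paths from source to every reachable node."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def path_to_source(prev, source, node):
    """Follow predecessor links from node back to source."""
    path = [node]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path

# Hypothetical toy network; weights are illustrative (lower = more confident edge).
edges = [
    ("hit1", "A", 0.5), ("A", "CFTR", 0.4),
    ("hit1", "B", 1.5), ("B", "CFTR", 1.0),
    ("hit2", "A", 0.7), ("hit2", "CFTR", 1.8),
]
graph = build_graph(edges)
dist, prev = dijkstra(graph, "CFTR")
paths = {hit: path_to_source(prev, "CFTR", hit) for hit in ("hit1", "hit2")}
```

In this toy network both hits reach the target through `A`, which is the kind of shared connecting protein that step II then prioritizes.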
9. I. Build integrated PPI network
II. Run Shortest Path analysis
III. Control for unrelated protein hubs
10. Publicly available interaction data:
From 10 source databases and 11 studies
14796 proteins
169625 interactions
Quality score [0:1] for each interaction, based on
experimental evidence*
*Source: Human Integrated Protein-
Protein Interaction reference (HIPPIE)
d = 9
Average path length: 3.6
I. Build a weighted protein interaction network – include MS data
+ Experimental interactome (nodes + edges)
Updated scores, based on databases and experimental interactome:
$$S(u,v) = 2 - S_{exp} - S_{db}, \qquad S_{exp} = \begin{cases} 1 & \text{if } e(u,v) \in \text{exp} \\ 0 & \text{otherwise} \end{cases}$$
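A minimal sketch of the edge score, assuming that Sdb is a database quality score in [0, 1] and that an edge seen only in the experimental interactome gets Sdb = 0 (both are assumptions; the slide does not spell them out). The function name `edge_score` is hypothetical.

```python
def edge_score(s_db, in_experiment):
    """S(u,v) = 2 - Sexp - Sdb, with Sexp = 1 if the edge is in the experimental set, else 0."""
    s_exp = 1.0 if in_experiment else 0.0
    return 2.0 - s_exp - s_db

# Illustrative values only (not taken from the talk).
strong_edge = edge_score(0.8, True)    # well-supported in databases and seen experimentally
weak_edge = edge_score(0.2, False)     # weak database support, not seen experimentally
exp_only_edge = edge_score(0.0, True)  # assumed handling of an experiment-only edge
```

Under these assumptions the score ranges from 0 to 2, and better-supported edges get lower scores, so they are favored when the score is used as an edge length in a shortest-path search.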
13. Randomization – select proteins specific for the set of siRNA hits
siRNA library → randomly select a subset of the same size as the target set → shortest path analysis to the target → repeat n times → randomized “hubness” for each connecting node
For each protein connecting siRNA hits to the target, compute:
Nsp: number of distinct siRNA hits that use the protein on their shortest path to the target
Nrnd: randomized Nsp
$$p\text{-value} = \frac{\mathrm{sum}(N_{rnd} \ge N_{sp})}{\mathrm{length}(N_{rnd})}$$
Nsp, Nrnd and the associated p-value are used to prioritize connecting proteins specific to
the set of siRNA hits considered
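The two quantities above can be sketched as follows, assuming each shortest path is available as a list of nodes running from the hit to the target and that Nrnd is stored as one randomized count per repetition. The function names and the toy inputs are hypothetical.

```python
from collections import Counter

def count_nsp(paths):
    """Count, per connecting protein, the distinct hits whose path passes through it.

    paths maps each siRNA hit to its node list [hit, ..., target];
    the endpoints themselves are not counted as connectors.
    """
    counts = Counter()
    for hit, path in paths.items():
        for protein in set(path[1:-1]):
            counts[protein] += 1
    return counts

def empirical_p_value(n_sp, n_rnd):
    """Fraction of randomized counts that are at least as large as the observed count."""
    return sum(1 for x in n_rnd if x >= n_sp) / len(n_rnd)

# Toy paths and randomized counts, for illustration only.
toy_paths = {
    "hit1": ["hit1", "A", "T"],
    "hit2": ["hit2", "A", "B", "T"],
    "hit3": ["hit3", "B", "T"],
}
nsp = count_nsp(toy_paths)
p_value = empirical_p_value(3, [0, 1, 3, 2, 4, 1, 0, 1, 5, 1])
```

A small p-value means the protein is rarely used this often by random hit sets, so its role is specific to the real hits rather than a consequence of being a network hub.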
14. CFTR – PN connectors – first degree – real vs. randomized
Select: Nsp ≥ 3 and Nsp/Nrnd ≥ 2
(12 proteins)
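The selection rule might be applied as in the sketch below. The slide does not say how the randomized counts are summarized in the ratio, so using their mean is an assumption, as are the function name and the toy values. The ratio test is written as a multiplication so that a randomized mean of zero does not cause a division error.

```python
from statistics import mean

def select_connectors(nsp, nrnd, min_nsp=3, min_ratio=2.0):
    """Keep proteins with Nsp >= min_nsp and Nsp / mean(Nrnd) >= min_ratio."""
    selected = []
    for protein, n in nsp.items():
        rnd_mean = mean(nrnd[protein])  # assumed summary of the randomized counts
        if n >= min_nsp and n >= min_ratio * rnd_mean:
            selected.append(protein)
    return sorted(selected)

# Placeholder proteins and counts, for illustration only.
toy_nsp = {"P1": 5, "P2": 3, "P3": 2, "P4": 6}
toy_nrnd = {"P1": [1, 2, 3], "P2": [2, 2, 2], "P3": [0, 0, 1], "P4": [4, 5, 6]}
selected = select_connectors(toy_nsp, toy_nrnd)
```

Here `P4` has the highest Nsp but is excluded because random hit sets use it almost as often, which is the hub behavior the filter is meant to remove.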
15. Assessing candidate regulators
42 candidate regulators
31 previously screened: 22 (71%) previously identified as hits
11 novel genes: 8 (73%) validate in de novo experiments
16. Validation of predicted protein targets
siRNA screen
CFTR rescue of function
8/11 (73%) novel candidate regulators validate
17. Validation of predicted targets - Specificity
siRNA screen
CFTR rescue of function
New condition: Vx-809 drug
X: predicted
: validated
Columns: Gene Symbol; Solo vs. MudPIT; Vx809 vs. MudPIT
SRRM1 x
CDC5L x
NDKB x
TPR x
AIFM1 x
2ABB x
KPCD2 x
PLSCR1 x
MAP3K14 x
TFG x x
XRCC5 x x
CTNB1 x
XPO1 x
MCM7 x
WDR61 x
PP2AB x
H2AFX x
MYC x
18. Validation of predicted targets - Coverage
siRNA screen
CFTR rescue of function
Restrain flow through a subset of direct interactors
X: predicted
: validated
Columns: Gene Symbol; Solo vs. MudPIT (partial); Solo vs. MudPIT (full); Vx809 vs. MudPIT (full)
SRRM1 x x
EIF3L x
STAU1 x
CAN2 x
SNRPA x
AUP1 x
Good specificity
Sub-optimal coverage
19. Summary
• NAGA is a network-based method to integrate functional
genomics data (e.g. siRNA screens) with interactomics
datasets (e.g. AP-MS, MudPIT)
• Useful for prioritizing novel functional targets and for
identifying relevant network modules
• It leverages publicly available information on protein-protein
interactions and thus is readily applicable to many scenarios
where a connection between functional and biochemical
data is sought
• Good specificity, coverage to be improved
20. Contact
loguerci@scripps.edu
@sal99k
http://sulab.org
Andrew Su
Su Lab
William Balch
Darren Hutt
Daniela Roth
Chao Wang
Anita Pottekat
Sumit Chanda
Stephen Soon
Dieter Wolf
Trey Ideker
Anne Carvunis
Jean Wang
Daniel Quan
Travel funding to ISMB 2014 was generously
provided by NSF and the NetBio SIG committee