3. Our aims within the CHDI-UCLA JSC
● Apply our systems-biology analysis methods to -omic data from HD patients and animal models (both in-house and publicly available data)
● Identify gene networks involved in early HD pathogenesis, and enable refinement of these networks by integrating further data sets and functional studies
● Cooperate with the Yang lab in refining our gene networks based on prior biological knowledge as well as on direct validation
● Cooperate with the Coppola group to store and disseminate our results in a user-friendly manner
4. Talk roadmap
● Brief overview of Weighted Gene Co-expression Network Analysis (WGCNA)
● WGCNA analysis of 6-month Allelic Series striatum
● Preservation of striatum modules in other data sets
● Consensus modules: modules present across multiple data sets
● Consensus modules across publicly available human HD data sets: insights into region-specific as well as common responses to HD pathology
● Preservation of human HD-related modules in model organism data
5. What is WGCNA?
• A compendium of methods to analyze “high-dimensional” data: thousands of variables (gene expression, methylation, proteomics, metabolomics, …) measured across multiple samples (at least 20, but more is better)
• Aims: find disease-related biomarkers, identify candidate genes, and gain biological insight into which pathways may be associated with the condition
6. How does WGCNA differ from other methods?
• “See the forest for the trees”
• Standard methods (differential gene expression etc.) analyze each gene in isolation (the trees)
• WGCNA: study how the expression profiles of genes behave together (the forest)...
• ...then use this information to make (more) sense of standard results for individual genes
• Changes in networks across conditions can also provide important biological insights
7. Constructing a weighted gene network
Bin Zhang and Steve Horvath (2005), A General Framework for Weighted Gene Co-Expression Network Analysis, SAGMB 4 (1), Article 17
Peter Langfelder and Steve Horvath (2008), WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559
13. Turning correlation into a network
• A network can be represented by an adjacency matrix, $A = [A_{ij}]$, that encodes whether and how strongly each pair of nodes $i, j$ is connected.
15. Turning correlation into a network
• Suppress low correlations that may be due to noise
• Construct a “signed” network: negatively correlated genes should be unconnected
19. Modules in 6-month striatum
Blue (red): genes under- (over-)expressed at higher Q lengths
20. ● Several modules (e.g., M.2 and M.20) group together genes that change consistently with Q, and the magnitude of the change appears to increase with Q
21. Relating modules to genotypes or other sample information
• Genes in each module are correlated (similar) and hence can be summarized by a single expression profile
• Use a synthetic expression profile called the eigengene: the first principal component of the module's standardized expression data, obtained from singular value decomposition (SVD)
• Relate each eigengene to genotype using standard statistical methods (t-test, regression significance, etc.)
• Calculating p-values is straightforward
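As a rough illustration of the two steps above, the sketch below computes an eigengene as the first left singular vector of the standardized module expression matrix and then correlates it with a trait such as Q. It is a NumPy sketch under stated assumptions, not the WGCNA package's own implementation: the function names are hypothetical, the sign alignment and unit-variance scaling are conventions chosen here, and the Q values and simulated expression in the usage part are illustrative only.

```python
import numpy as np


def module_eigengene(expr):
    """Eigengene of one module (hypothetical helper).

    expr: array of shape (n_samples, n_genes) with the expression of the
    module's genes across samples.
    """
    # Standardize each gene (column) to mean 0 and variance 1.
    x = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # Thin SVD x = U diag(s) V^T; the first left singular vector is the
    # sample profile explaining the most variance among the module's genes.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    eigengene = u[:, 0]
    # Singular vectors have arbitrary sign: align with average expression.
    if np.dot(eigengene, x.mean(axis=1)) < 0:
        eigengene = -eigengene
    # Scale to unit variance (a convention chosen for this sketch).
    return eigengene / eigengene.std(ddof=1)


def correlate_with_trait(eigengene, trait):
    """Pearson correlation r and its t statistic (n - 2 degrees of freedom)."""
    r = np.corrcoef(eigengene, trait)[0, 1]
    n = len(trait)
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return r, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative Q lengths, 8 samples each (simulated, not real data).
    q = np.repeat([20, 80, 92, 111, 140, 175], 8).astype(float)
    signal = (q - q.mean()) / q.std()
    # Simulated module: 30 genes that increase with Q, plus noise.
    expr = signal[:, None] * rng.uniform(0.5, 1.5, 30) + rng.normal(size=(48, 30))
    me = module_eigengene(expr)
    r, t = correlate_with_trait(me, q)
    print(f"cor = {r:.2f}, t = {t:.1f}")
```

A two-sided p-value then follows from the t distribution with n - 2 degrees of freedom (for example via SciPy's `scipy.stats.t.sf`), which is why computing p-values for eigengenes is straightforward.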
22. Relating modules to status via eigengenes
Red (blue): positive (negative) correlation of module eigengene with genotype
Format: (100*correlation)|(negative exponent of p-value)
23. ● M.2 (neuronal): down with increasing Q
● M.20 (also neuronal): up with increasing Q (two separate populations of neurons?)
● M.7 (oligodendrocyte), M.39 (mitochondria): up with increasing Q
● M.11 (oligodendrocyte): up in moderate Q (80, 92), then down in high Q (140, 175)
● Some modules with no strong enrichment also change strongly (M.10, M.41)
24. Eigengene "expression" as a function of Q for selected modules
25. How to select the most important genes?
• Network-based Target Prioritization Scheme (NTPS); it should eventually become part of the Data To Target (D2T) online tool
• Combine multiple ways of selecting genes
– Individual association with genotype (consistent change
with increasing Q)
– Hub gene status in a biologically plausible module
(association with Q, enrichment, preservation in other data)
– Consistent association and hub gene status in related
tissues (e.g., cortex) and/or at other time points
– Additional statistical tests such as causal testing
– Prior knowledge (HD Target DB)
26. Network-based Target Prioritization Scheme (NTPS)
• The current version combines gene-genotype and module information from 6-month striatum and cortex
• The result is a ranked list of genes for further follow-up
• One can also sort genes by a purely striatum- or cortex-based ranking
Top 10 genes by overall rank:
Gene    Overall rank  Striatum rank  Cortex rank  Striatum module  Cortex module
Arpp19  1             3              8            2                4
Arpp21  2             1              22           2                4
Ppp3ca  3             32             11           2                4
Rgs4    4             139            1            2                4
Chn1    5             9              38           2                4
Plcb1   6             144            4            2                4
Scn4b   7             20             50           2                2
Atp2b1  8             63             15           2                4
Slmap   9             30             68           2                38
Prkcb   10            18             77           2                4
29. Network analysis of miRNA-seq data from Allelic Series 6-month striatum
But wait... there's more!
Network analysis can be applied to any high-dimensional
data set
30. Network analysis of miRNA data from 6-month striatum
● miRNA expression patterns show strong modularity
● Modules 1, 3, and 10 group together miRNAs strongly associated with Q
31. Association of miRNA modules with genotype
● Modules 1, 3, and 10 are strongly and consistently associated with Q; the association gets stronger with higher Q
32. Association of miRNA modules with genotype
● Modules 1, 3, and 10 show striking patterns of Q-dependence; other modules do not change consistently
34. Integrating miRNA and mRNA data: association of expression
● miRNA modules 1, 3, and 10 correlate strongly with several Q-associated mRNA modules
● Can similar associations be observed when gene modules are compared with the predicted targets of each miRNA module?
35. Overlap of gene modules and predicted targets of miRNA modules
● Predicted targets of miRNA modules 1 and 4 are enriched in neuronal module 2
● miRNA module 1 is positively correlated with gene module 2; since miRNAs typically repress their targets, miRNA-based regulation would be expected to produce a negative correlation, so the enrichment probably does not reflect such regulation
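The slide does not say which enrichment test was used; a common choice, assumed here, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below computes that p-value and a fold enrichment for the overlap between a gene module and a set of predicted miRNA targets. The function names are hypothetical and the numbers in the usage part are made up for illustration.

```python
from math import comb


def enrichment_p_value(n_universe, n_module, n_targets, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X is the number of predicted targets among n_module genes drawn at random,
    without replacement, from n_universe genes of which n_targets are targets.
    """
    total = comb(n_universe, n_module)
    upper = min(n_module, n_targets)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def fold_enrichment(n_universe, n_module, n_targets, n_overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = n_module * n_targets / n_universe
    return n_overlap / expected


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 genes, a 500-gene module, 800 predicted
    # targets, 70 of which fall in the module (40 expected by chance).
    print(fold_enrichment(10000, 500, 800, 70))
    print(enrichment_p_value(10000, 500, 800, 70))
```

Note that a significant overlap says nothing about the direction of regulation, which is why the sign of the module-level correlation matters for interpretation.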
36. Network analysis can help answer many research questions
• Find genes and modules related to a clinical trait, for example disease status or severity; find candidate intervention targets, biomarkers or (with additional data) causal genes
• Integrate different data types when measured for the same set of
samples
• Study commonalities and differences in the organization of gene
expression between conditions or species
– Study consensus modules: modules that are present in all
compared expression data sets
– Quantify preservation of modules between a reference and a test
condition
37. Limitations
• Co-expression network analysis assumes that co-expression is biologically meaningful and important for the studied trait
• For selecting biomarkers: standard screening methods are often
better; network analysis provides better insight into possible function
• Many factors influence expression and co-expression, not all of
which are biological/desirable, and it is often difficult to disentangle
the individual factors
38. Limitations
• Co-expression networks do not carry functional information
• ...nor information about causality
• Modules tend to be rather large because statisticians like robust
results (much to biologists' displeasure)
• Functional annotation for modules tends to be rather general
• For best results, analysis requires a skilled human operator
41. Preservation of modules identified in Allelic Series striatum in other data sets
Langfelder P, Luo R, Oldham MC, Horvath S. Is My Network Module Preserved and Reproducible? PLoS Comput Biol 2011, 7(1): e1001057
42. Rationale for studying module preservation
One often has a "reference" data set: a data set with an interesting finding; for example, module 2 is strongly associated with Q in Allelic Series striatum.
Hypothesis: the expression (and co-expression) of genes in module 2 reflects in some way the pathological processes connected to HD in a mouse model.
If the genes in module 2 were also co-expressed in human data, this would provide indirect evidence that the same or a similar process is also active in human HD.
Question: is module 2 "preserved" in human data, i.e., are the genes in module 2 co-expressed in human data as well?
General formulation: given a module identified in a "reference" data set, is it preserved in an independent "test" data set?
43. Quantifying module preservation: the Zsummary statistic
Networks in general, and WGCNA in particular, provide several module preservation statistics that measure different aspects of module preservation.
For convenience, we summarize the statistics in a single measure called Zsummary, which works much like a Z statistic:
– Zsummary < 2: no evidence of module preservation
– 2 < Zsummary < 7: weak to moderate evidence of preservation
– Zsummary > 7: strong evidence of preservation
44. How strongly are 6-month striatum modules preserved in other data sets?
Zsummary < 2: no evidence of preservation
2 < Zsummary < 7: weak to moderate preservation
Zsummary > 7: strong preservation
45. • Most modules are preserved in CTX and other mouse brain data sets
• Oligodendrocyte M.7, M.11 appear to be preserved across most other data sets
• Neuronal M.2, M.20 are preserved in some human data sets but not in others
46. Integrating multiple expression data sets: consensus module analysis
Langfelder P, Horvath S, Eigengene networks for studying the relationships between
co-expression modules. BMC Systems Biology 2007, 1:54
• Aim: find modules that are present in each of multiple independent
input data sets
47. Finding consensus modules
[Figure: Network 1 and Network 2]
Modules group together densely interconnected genes
Consensus modules group together genes densely connected in all input data sets
Our solution: find a consensus gene-gene similarity and use it with clustering to find modules
48. Finding consensus modules
$$\text{Consensus} = \min(\text{Calibrated Network 1}, \text{Calibrated Network 2})$$
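The consensus step can be sketched as an element-wise minimum over calibrated networks, so a gene pair is strongly connected in the consensus only if it is strongly connected in every input network. The calibration used below is an assumption made for illustration: each network (entries in (0, 1], e.g. TOM matrices) is raised to a power chosen so that its 95th percentile of off-diagonal values matches that of the first network. The function names are hypothetical, and this is not presented as the exact calibration used in the analyses above.

```python
import numpy as np


def off_diagonal(net):
    """Off-diagonal entries of a square matrix as a flat array."""
    return net[~np.eye(net.shape[0], dtype=bool)]


def calibrate(networks, quantile=0.95):
    """Power-scale each network so its chosen quantile matches network 0.

    Assumes entries lie in (0, 1], e.g. TOM matrices, and that the chosen
    quantile of each network's off-diagonal entries is strictly below 1.
    """
    ref = np.quantile(off_diagonal(networks[0]), quantile)
    calibrated = [networks[0]]
    for net in networks[1:]:
        q_k = np.quantile(off_diagonal(net), quantile)
        # Raising to this power maps the value q_k exactly onto ref.
        power = np.log(ref) / np.log(q_k)
        calibrated.append(net ** power)
    return calibrated


def consensus_network(networks, quantile=0.95):
    """Element-wise minimum of the calibrated networks."""
    return np.minimum.reduce(calibrate(networks, quantile))
```

The consensus matrix can then be turned into a dissimilarity (for example 1 minus the consensus) and clustered exactly as a single network would be.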
51. 7 data sets
• Hodges et al. (2006): CN, BA4 (motor cortex), CB
• Durrenberger et al. (2011), Common neuroinflammatory pathways in neurodegenerative diseases. GEO series GSE26927, data from CN of 10 controls and 9 patients
• Zhang et al. (2012): several hundred HD and control samples from prefrontal cortex, visual cortex and cerebellum
52. Are individual gene changes in response to HD concordant across the data sets?
• There is relatively good concordance (especially given the difference
in how strongly HD affects CN vs. Ctx and CB)
• Data sets from the same tissue tend to cluster together
55. • Neuronal module M.10 down across all brain regions
• Astrocyte M.11, microglial M.12, HSP M.15 up across all brain regions
• Oligodendrocyte M.7 and microglial M.43 up in CN but not in CTX or CB
• Ribosomal M.17 shows no appreciable change
• M.14 with chaperone protein genes is down in CN but up in other areas
56. Are human consensus modules preserved in mouse data?
57. • Most modules are preserved in BA4 postmortem human data
• Neuronal M.10, M.23 are preserved in most mouse data sets
• Oligodendrocyte M.7 is preserved in some data sets (Allelic Series, Strand data)
• Astrocyte M.11, microglial M.12 mostly not preserved
58. Lessons from the consensus module analysis of human data sets
Response to HD shows interesting regional commonalities as well as
differences
Consensus module analysis leads to robust modules that are unlikely
to reflect technical artifacts
One can pick interesting genes by high intramodular connectivity in
status-related consensus modules
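Intramodular connectivity of a gene is commonly defined as the sum of its adjacencies to the other genes in the same module. The sketch below computes it and returns the top hub genes of one module; in the consensus setting the adjacency passed in could be the consensus network, which is an assumption of this sketch. Function names are hypothetical.

```python
import numpy as np


def intramodular_connectivity(adj, labels):
    """kIM_i: sum of gene i's adjacencies to the other genes in its module."""
    labels = np.asarray(labels)
    same_module = labels[:, None] == labels[None, :]
    within = np.where(same_module, adj, 0.0)
    np.fill_diagonal(within, 0.0)  # exclude self-adjacency
    return within.sum(axis=1)


def top_hub_genes(adj, labels, module, n=10):
    """Indices of the n genes with the highest kIM in the given module."""
    k_im = intramodular_connectivity(adj, labels)
    members = np.flatnonzero(np.asarray(labels) == module)
    return members[np.argsort(-k_im[members])][:n]


if __name__ == "__main__":
    adj = np.array([[1.0, 0.9, 0.5, 0.8],
                    [0.9, 1.0, 0.2, 0.1],
                    [0.5, 0.2, 1.0, 0.3],
                    [0.8, 0.1, 0.3, 1.0]])
    labels = [1, 1, 1, 2]
    print(intramodular_connectivity(adj, labels))
    print(top_hub_genes(adj, labels, module=1, n=2))
```

In the toy matrix, gene 3 is strongly connected to gene 0 but belongs to a different module, so that link does not count toward either gene's intramodular connectivity.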
59. Summary
Network analysis organizes genes into a smaller number of modules
that can be assigned functional labels
Network-based Target Prioritization Scheme ranks genes based on
multiple criteria and provides a ranked list of genes for further follow-up
Advanced network-based methods allow integration of independent data sets and a multi-layer analysis (e.g., brain-region differences in response to HD, differences in response to varying Q length at different time points, etc.)
60. Acknowledgments
Jeff Aaronson, Jim Rosinki and entire CHDI
Steve Horvath
William Yang and his group, esp. Jeff Cantle and Xiaohong Lu
Jim Wang and Mike Palazzolo
Giovanni Coppola and his group, esp. Doxa Chatzopoulou and
Charles Blum
61. Interested?
More about Weighted Gene Co-expression Network Analysis
(WGCNA):
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/
WGCNA implementation for R: package WGCNA
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/
Module preservation project main publication and page with
examples and tutorials:
Peter Langfelder, Luo Rui, Michael C. Oldham, and Steve Horvath
Is My Network Module Preserved and Reproducible? PLoS Comput Biol 7(1): e1001057
http://genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation/
Consensus modules and consensus eigengene networks:
Peter Langfelder and Steve Horvath, Eigengene networks for studying the relationships
between co-expression modules. BMC Systems Biology 2007, 1:54
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork
63. Step 5: Finding network modules
• Modules: groups (clusters) of strongly connected genes
• How to define “strong connection”? The simple answer “strong connection = high adjacency” is susceptible to noise
• A more complicated but also more robust answer: the topological overlap measure (TOM), which combines the direct connection with shared neighbours
[Figure: no shared neighbours: low TOM; many shared neighbours: high TOM]
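A sketch of the topological overlap for a weighted adjacency matrix, using the standard formula $TOM_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$ from the weighted network framework cited earlier (Zhang and Horvath 2005), where $l_{ij}$ sums the products of adjacencies through shared neighbours and $k_i$ is the connectivity of gene $i$. The function name is hypothetical, and this NumPy version is for illustration rather than a drop-in replacement for the R package.

```python
import numpy as np


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij = sum over u != i, j of a_iu * a_uj (shared neighbours),
    k_i  = sum over u != i of a_iu (connectivity).
    adj: symmetric adjacency matrix with entries in [0, 1].
    """
    a = np.array(adj, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(a, 0.0)  # with a zero diagonal, a @ a sums over u != i, j
    k = a.sum(axis=1)
    shared = a @ a
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


if __name__ == "__main__":
    # Path 0 - 1 - 2: genes 0 and 2 are not directly connected.
    path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    print(topological_overlap(path))
```

In the path example, genes 0 and 2 have zero adjacency but a TOM of 0.5, because they share neighbour 1; this is the "shared neighbours" effect that makes TOM more robust than raw adjacency.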
64. Eigengenes are useful in many contexts
• Biological data reduction: instead of, say, 20,000 genes, one may have ~30 eigengenes that can be related to a clinical trait
– Alleviate the multiple testing problem
• Eigengenes give rise to a continuous (“fuzzy”) measure of module membership for all genes in all modules:
$$MM_i = kME_i = \mathrm{cor}(x_i, E)$$
where $x_i$ is the expression profile of gene $i$ and $E$ is the module eigengene
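The module membership formula above is simply a Pearson correlation between each gene and the eigengene, so it can be computed for all genes at once. A minimal vectorized sketch, assuming an expression matrix with genes as columns; the function name is hypothetical. Repeating it for each eigengene gives the membership of every gene in every module.

```python
import numpy as np


def module_membership(expr, eigengene):
    """kME_i = cor(x_i, E) for every gene (column) of expr.

    expr: (n_samples, n_genes) expression matrix; eigengene: (n_samples,).
    """
    x = expr - expr.mean(axis=0)
    e = eigengene - eigengene.mean()
    # Pearson correlation of each centered column with the centered eigengene.
    return (x.T @ e) / (np.sqrt((x**2).sum(axis=0)) * np.sqrt((e**2).sum()))


if __name__ == "__main__":
    e = np.array([1.0, 2.0, 3.0, 5.0])
    expr = np.column_stack([e, -e, 2 * e + 1])
    print(module_membership(expr, e))  # approximately [1, -1, 1]
```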
65. Judging modules by their Z scores
Each Z score answers the question “Is the module significantly better than a random sample of genes?”
Pool the Z scores into a summary (median) Z measure.
Z < 2 indicates no preservation, 2 < Z < 10 weak to moderate evidence of preservation, and Z > 10 strong evidence.
A low Z means the “module” is no better than a random selection of genes.
$$Z = \frac{\text{observed} - \text{mean}_{\text{permuted}}}{\text{sd}_{\text{permuted}}}$$
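To make the Z formula concrete, the sketch below computes one permutation Z score in a test data set: the observed statistic for the module's genes is compared with the same statistic on random gene sets of equal size. As a simplification, it uses a single density-type statistic (mean absolute correlation among the module's genes) as a stand-in; the pooled summary described above combines several statistics, and the function names here are hypothetical.

```python
import numpy as np


def mean_abs_correlation(expr, genes):
    """Density-like statistic: mean |cor| over all pairs of the given genes."""
    c = np.corrcoef(expr[:, genes], rowvar=False)
    upper = np.triu_indices(len(genes), k=1)
    return np.abs(c[upper]).mean()


def permutation_z(expr, module_genes, n_perm=200, seed=0):
    """Z = (observed - mean_permuted) / sd_permuted using random gene sets."""
    rng = np.random.default_rng(seed)
    observed = mean_abs_correlation(expr, module_genes)
    n_genes, size = expr.shape[1], len(module_genes)
    permuted = np.array([
        mean_abs_correlation(expr, rng.choice(n_genes, size=size, replace=False))
        for _ in range(n_perm)
    ])
    return (observed - permuted.mean()) / permuted.std(ddof=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated test data: 50 samples, 200 genes; genes 0-19 share a factor.
    expr = rng.normal(size=(50, 200))
    expr[:, :20] += rng.normal(size=(50, 1))
    print(permutation_z(expr, np.arange(20)))       # co-expressed "module"
    print(permutation_z(expr, np.arange(100, 120)))  # random genes
```

The co-expressed gene set yields a large Z, while an arbitrary set of noise genes stays near zero, matching the reading that a low Z means the "module" is no better than a random selection of genes.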
66. Turning correlation into a network: adjacency function
$$A_{ij} = \begin{cases} \mathrm{cor}(x_i, x_j)^{\beta} & \text{if } \mathrm{cor}(x_i, x_j) > 0 \\ 0 & \text{if } \mathrm{cor}(x_i, x_j) \le 0 \end{cases}$$
• Increasing powers suppress low correlations more strongly
• β = 6 often works well
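The adjacency function above translates directly into a few lines of NumPy. This is a minimal sketch, assuming an expression matrix with genes as columns; the function name is hypothetical and the default β = 6 follows the rule of thumb on the slide.

```python
import numpy as np


def signed_adjacency(expr, beta=6):
    """A_ij = cor(x_i, x_j)**beta if cor > 0, else 0 (genes are columns)."""
    c = np.corrcoef(expr, rowvar=False)
    # Clip negatives to 0 first so non-integer beta never sees a negative base.
    return np.clip(c, 0.0, None) ** beta


if __name__ == "__main__":
    v = np.array([1.0, 2.0, 3.0, 4.0])
    expr = np.column_stack([v, 2 * v + 1, -v])
    print(signed_adjacency(expr))
```

With β = 6, a correlation of 0.8 becomes 0.8^6 ≈ 0.26, whereas 0.3 becomes about 0.0007; this is how raising correlations to a power suppresses low correlations far more than high ones, while negative correlations are set to 0.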