Weighted Gene Co-expression Networks: Insights into HD
From a Million (Billion) Correlations
Peter Langfelder
Dept. of Human Genetics, UCLA

Our aims within the CHDI-UCLA JSC
● Apply our systems-biological analysis methods to -omic data
from HD patients and animal models (both in-house and publicly
available data)
● Identify gene networks involved in early HD pathogenesis, and
enable refining the gene networks by integrating further data
sets and functional studies
● Cooperate with Yang lab in refining our gene networks based on
prior biological knowledge as well as on direct validation
● Cooperate with Coppola group to store and disseminate our
results in a user-friendly manner

Talk roadmap
● Brief overview of Weighted Gene Co-expression Network
Analysis, WGCNA
● WGCNA analysis of 6-month Allelic Series striatum
● Preservation of striatum modules in other data sets
● Consensus modules: modules present across multiple data sets
● Consensus modules across publicly available human HD data
sets: insights into region-specific as well as common response
to HD pathology
● Preservation of human HD-related modules in model organism
data

What is WGCNA?
• A compendium of methods to analyze “high-dimensional” data:
thousands of variables (gene expression, methylation,
proteomics, metabolomics, …) measured across multiple
samples (at least 20 but more is better)
• Aims: find disease-related biomarkers, identify candidate genes,
gain biological insights into which pathways may be associated
with the condition

What is different from other methods?
• “See the forest for the trees”
• Standard methods (differential gene expression etc) analyze each
gene in isolation (the trees)
• WGCNA: study how gene expressions behave together (the forest)...
• ...then use this information to make (more) sense of standard results
for individual genes
• Changes in networks across conditions can also provide important
biological insights

Constructing a weighted gene network
Bin Zhang and Steve Horvath (2005), A General Framework for Weighted Gene Co-
Expression Network Analysis, SAGMB 4 (1), Article 17
Peter Langfelder and Steve Horvath (2008), WGCNA: an R package for weighted
correlation network analysis. BMC Bioinformatics. 9:559

Turning correlation into a network
• A network can be represented by an adjacency matrix, A = [Aij], that
encodes whether/how a pair of nodes is connected.
• Suppress low correlations that may be due to noise
• Construct a “signed” network: negatively-correlated genes should be
unconnected

Network analysis of mRNA-seq data
from Allelic Series 6-month striatum

Modules in 6-month striatum
Blue (red): genes under- (over-)expressed at higher Q
● Several modules (2, 20) appear to group together genes that change
consistently with Q, with the change increasing with Q

Relating modules to genotypes or other sample information
• Genes in each module are correlated (similar), hence can be
summarized by a single expression profile
• Use a synthetic expression profile called the eigengene, obtained
from Singular Value Decomposition (SVD)
• Relate each eigengene to genotype
using standard statistical methods (t-
test, regression significance, etc.)
• Calculating p-values is straightforward
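
The eigengene step above can be sketched as follows. This is a minimal illustration with NumPy, not the WGCNA implementation; the function name `module_eigengene`, the toy expression matrix and the choice of orienting the eigengene along the average module expression are assumptions made for the example. The Q lengths are used only as a numeric toy genotype.

```python
import numpy as np

def module_eigengene(expr):
    """Return the eigengene of one module.

    expr: samples x genes array holding only the genes of that module.
    The eigengene is taken as the first left singular vector of the
    standardized expression matrix (one value per sample).
    """
    # Standardize each gene (column) to mean 0 and unit variance.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of a singular vector is arbitrary. Assumption: orient it so
    # that it correlates positively with the average module expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Toy data (made up): 8 samples, 5 genes that follow a shared Q-driven signal.
rng = np.random.default_rng(0)
q_length = np.array([80, 80, 92, 92, 140, 140, 175, 175], dtype=float)
signal = (q_length - q_length.mean()) / q_length.std()
loadings = np.array([1.0, 0.8, 1.2, 0.9, 1.1])
expr = signal[:, None] * loadings + 0.1 * rng.standard_normal((8, 5))

me = module_eigengene(expr)
# Relate the eigengene to genotype with a plain Pearson correlation.
r = np.corrcoef(me, q_length)[0, 1]
```

A p-value for `r` would then come from any standard correlation or regression test.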

Relating modules to status via eigengenes
Red (blue): positive (negative) correlation of module eigengene with genotype
Format: (100*correlation)|(negative exponent of p-value)

● M.2 (neuronal): down with increasing Q
● M.20 (also neuronal): up with increasing Q (two separate populations of neurons?)
● M.7 (oligodendrocyte), M.39 (mitochondria): up with increasing Q
● M.11 (oligodendrocyte): up in moderate Q (80, 92), then down in high Q (140, 175)
● Some modules with no strong enrichment also change strongly (M.10, M.41)

Eigengene "expression" as a function of Q for selected modules

How to select most important genes?
• Network-based Target Prioritization Scheme (NTPS); eventually
should become part of Data To Target (D2T) online tool
• Combine multiple ways of selecting genes
– Individual association with genotype (consistent change
with increasing Q)
– Hub gene status in a biologically plausible module
(association with Q, enrichment, preservation in other data)
– Consistent association and hub gene status in related
tissues (e.g., cortex) and/or at other time points
– Additional statistical tests such as causal testing
– Prior knowledge (HD Target DB)

Network-based Target Prioritization Scheme (NTPS)
• Current version combines gene-genotype and module information
from 6-month striatum and cortex
• The result is a ranked list of genes for further follow-up
• One can also sort genes based on purely striatum- or cortex-based
ranking; Top 10 overall ranked genes:
Gene     Overall rank  Striatum rank  Cortex rank  Striatum module  Cortex module
Arpp19   1             3              8            2                4
Arpp21   2             1              22           2                4
Ppp3ca   3             32             11           2                4
Rgs4     4             139            1            2                4
Chn1     5             9              38           2                4
Plcb1    6             144            4            2                4
Scn4b    7             20             50           2                2
Atp2b1   8             63             15           2                4
Slmap    9             30             68           2                38
Prkcb    10            18             77           2                4

But wait... there's more!
Network analysis can be applied to any high-dimensional
data set

Network analysis of miRNA-seq
from Allelic Series 6-month striatum

Network analysis of miRNA data from 6-month Striatum
● miRNA expression patterns show strong modularity
● Modules 1, 3, 10 appear to group together miRNAs strongly associated with Q

Association of miRNA modules with genotype
● Modules 1, 3, 10 are strongly and consistently associated with Q; association gets
stronger with higher Q

Association of miRNA modules with genotype
● Modules 1, 3, 10 show striking patterns of Q-dependence;
other modules do not change consistently

Integration of mRNA and miRNA
networks at the level of modules

Integrate miRNA and mRNA data: association of expression
● miRNA modules 1, 3, 10 correlate strongly with several Q-associated mRNA modules
● Could similar associations be observed in comparing gene modules to groups of targets
of each miRNA module?

Overlap of gene modules and predicted targets of miRNA modules
● Predicted targets of miRNA modules 1 and 4 are enriched in neuronal module 2
● miRNA module 1 is positively correlated with gene module 2, so the enrichment
probably does not reflect miRNA-based regulation
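
The slides do not say which test was used to call the target sets "enriched". One common choice is a one-sided hypergeometric test on the overlap between a gene module and a set of predicted targets; the sketch below assumes that choice, and the function name and toy counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(overlap, module_size, target_size, universe):
    """P(X >= overlap) for a hypergeometric overlap.

    universe:    number of genes considered in total
    module_size: genes in the gene module
    target_size: predicted targets of the miRNA module (within the universe)
    overlap:     genes that are both in the module and predicted targets
    """
    max_k = min(module_size, target_size)
    total = comb(universe, target_size)
    # Sum the probabilities of seeing `overlap` or more shared genes.
    tail = sum(
        comb(module_size, k) * comb(universe - module_size, target_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total

# Toy counts (made up): 10 genes in total, module of 4, 3 predicted targets,
# 2 of which fall in the module.
p = hypergeom_upper_tail(overlap=2, module_size=4, target_size=3, universe=10)
```

A small `p` says the overlap is larger than expected by chance; as noted above, it says nothing about the direction of regulation.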

Network analysis can help answer many research questions
• Find genes and modules related to a clinical trait, for example
disease status, severity etc; find candidate intervention targets,
biomarkers or (with additional data) causal genes
• Integrate different data types when measured for the same set of
samples
• Study commonalities and differences in the organization of gene
expression between conditions or species
– Study consensus modules: modules that are present in all
compared expression data sets
– Quantify preservation of modules between a reference and a test
condition

Limitations
• Co-expression networks hypothesize that co-expression is
biologically meaningful and important for the studied trait
• For selecting biomarkers: standard screening methods are often
better; network analysis provides better insight into possible function
• Many factors influence expression and co-expression, not all of
which are biological/desirable, and it is often difficult to disentangle
the individual factors

• Co-expression networks do not carry functional information
• ...nor information about causality
• Modules tend to be rather large because statisticians like robust
results (much to biologists' displeasure)
• Functional annotation for modules tends to be rather general
• For best results, analysis requires a skilled human operator


Preservation of modules identified in
Allelic Series striatum in other data
sets
Langfelder P, Luo R, Oldham MC, Horvath S,
Is My Network Module Preserved and Reproducible?
PloS Comp Biol 2011 7(1): e1001057


Rationale for studying module preservation

One often has a "reference" data set: a data set with an
interesting finding, for example, module 2 associates strongly
with Q in Allelic Series striatum

Hypothesis: expression (and co-expression) of genes in module 2
reflects in some way the pathological processes connected to HD
in a mouse model

If the genes in module 2 were co-expressed in human data as
well, it would provide indirect evidence that the same or similar
process is also active in human HD

Question: is module 2 "preserved" in human data, i.e., are genes
in module 2 co-expressed in human data as well?

General formulation: given a module identified in a "reference"
data set, is it preserved in an independent "test" data set?


Quantifying module preservation: the Zsummary statistic

Networks in general and WGCNA in particular provide several module
preservation statistics that measure various aspects of module
preservation

For convenience, we summarize the statistics in a single measure
called Zsummary. This works very much like a Z statistic:
– Zsummary < 2: no evidence of module preservation
– 2 < Zsummary < 7: weak to moderate evidence of preservation
– 7 < Zsummary: strong evidence of preservation

How strongly are 6-month striatum modules preserved
in other data sets?
Zsummary < 2: no evidence of preservation
2 < Zsummary < 7: weak - moderate preservation
7 < Zsummary: strong preservation

• Most modules are preserved in CTX and other mouse brain data sets
• Oligodendrocyte M.7, M.11 appear preserved across most other data
• Neuronal M.2, M.20 preserved in some human data sets but not in others

Integrating multiple expression data
sets: consensus module analysis
Langfelder P, Horvath S, Eigengene networks for studying the relationships between
co-expression modules. BMC Systems Biology 2007, 1:54
• Aim: find modules that are present in each of multiple independent
input data sets

Finding consensus modules
[Figure: Network 1 and Network 2]

Modules group together densely interconnected genes

Consensus modules group together genes densely connected in all
input data sets

Our solution: find the consensus gene-gene similarity and use it with
clustering to find modules

$\text{Consensus} = \min(\text{Calibrated Network 1}, \text{Calibrated Network 2})$
For multiple input data sets: replace minimum by a suitable quantile
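
A minimal sketch of the consensus similarity, assuming the input networks are already calibrated (the calibration step itself is not shown, and the function name and toy matrices are made up for illustration). With `quantile=0.0` the element-wise quantile is the minimum, which matches the two-network case; a higher quantile relaxes the requirement when there are many inputs.

```python
import numpy as np

def consensus_similarity(networks, quantile=0.0):
    """Element-wise quantile across gene-gene similarity matrices.

    networks: list of genes x genes arrays on a comparable scale
              (assumed to be calibrated beforehand).
    quantile: 0.0 gives the minimum; e.g. 0.25 gives the lower quartile.
    """
    stacked = np.stack(networks)            # shape: (n_sets, genes, genes)
    return np.quantile(stacked, quantile, axis=0)

# Toy similarities (made up) for two genes in three data sets.
net1 = np.array([[1.0, 0.8], [0.8, 1.0]])
net2 = np.array([[1.0, 0.3], [0.3, 1.0]])
net3 = np.array([[1.0, 0.5], [0.5, 1.0]])

cons_min = consensus_similarity([net1, net2])             # minimum of two
cons_med = consensus_similarity([net1, net2, net3], 0.5)  # median of three
```

The resulting matrix can be passed to the same clustering step that is used for a single network.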

Example consensus module analysis:
modules in human HD patients and
control expression data

7 data sets
• Hodges et al (2006): CN, BA4 (motor cortex), CB
• Durrenberger et al (2011), Common neuroinflammatory pathways in
neurodegenerative diseases. GEO series GSE26927, data from CN of
10 controls and 9 patients
• Zhang et al (2012): several hundred HD and control samples from
prefrontal cortex, visual cortex and cerebellum

Are individual gene changes in response to HD concordant across the data sets?
• There is relatively good concordance (especially given the difference
in how strongly HD affects CN vs. Ctx and CB)
• Data sets from the same tissue tend to cluster together

Consensus modules across human data

Relating consensus modules to HD status

• Neuronal module M.10 down across all brain regions
• Astrocyte M.11, microglial M.12, HSP M.15 up across all brain regions
• Oligodendrocyte M.7 and microglial M.43 up in CN but not in CTX or CB
• Ribosomal M.17 shows no appreciable change
• M.14 with chaperone protein genes is down in CN but up in other areas

Are human consensus modules preserved in mouse data?

• Most modules are preserved in BA4 postmortem human data
• Neuronal M.10, M.23 are preserved in most mouse data sets
• Oligodendrocyte M.7 is preserved in some sets: allelic series, Strand data
• Astrocyte M.11, microglial M.12 mostly not preserved

Lessons from the consensus module
analysis of human data sets

Response to HD shows interesting regional commonalities as well as
differences

Consensus module analysis leads to robust modules that are unlikely
to reflect technical artifacts

One can pick interesting genes by high intramodular connectivity in
status-related consensus modules

Summary

Network analysis organizes genes into a smaller number of modules
that can be assigned functional labels

Network-based Target Prioritization Scheme ranks genes based on
multiple criteria and provides a ranked list of genes for further follow-up

Advanced network-based methods allow integration of independent data
and a multi-layer analysis (e.g., brain region differences in response to
HD, differences in response to varying Q length at different time points
etc.)

Acknowledgments
Jeff Aaronson, Jim Rosinki and entire CHDI
Steve Horvath
William Yang and his group, esp. Jeff Cantle and Xiaohong Lu
Jim Wang and Mike Palazzolo
Giovanni Coppola and his group, esp. Doxa Chatzopoulou and
Charles Blum

Interested?

More about Weighted Gene Co-expression Network Analysis
(WGCNA):
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/

WGCNA implementation for R: package WGCNA
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/

Module preservation project main publication and page with
examples and tutorials:
Peter Langfelder, Luo Rui, Michael C. Oldham, and Steve Horvath
Is My Network Module Preserved and Reproducible? PloS Comp Biol 7(1): e1001057
http://genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation/

Consensus modules and consensus eigengene networks:
Peter Langfelder and Steve Horvath, Eigengene networks for studying the relationships
between co-expression modules. BMC Systems Biology 2007, 1:54
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork

Step 5: Finding network modules
• Modules: groups (clusters) of strongly connected genes
• How to define “strong connection”? Simple answer “strong
connection = high adjacency” is susceptible to noise
• More complicated but also more robust answer: Topological overlap
(TOM): measures direct connection + shared neighbours
[Figure: no shared neighbours gives low TOM; many shared neighbours give high TOM]
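
The slide gives TOM only in words. The sketch below uses the standard topological overlap formula from Zhang and Horvath (2005) for an unsigned weighted network, TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij); the helper `symmetric` and the two toy networks are made up to contrast "no shared neighbours" with "many shared neighbours".

```python
import numpy as np

def topological_overlap(adj):
    """Topological overlap matrix of a weighted adjacency matrix."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)                 # ignore self-connections
    k = a.sum(axis=1)                        # connectivity of each node
    shared = a @ a                           # sum over u of a_iu * a_uj
    denom = np.minimum.outer(k, k) + 1.0 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

def symmetric(n, edges):
    """Hypothetical helper: build a symmetric adjacency from {(i, j): weight}."""
    a = np.zeros((n, n))
    for (i, j), w in edges.items():
        a[i, j] = a[j, i] = w
    return a

# Nodes 0 and 1 have the same direct connection (0.5) in both toy networks.
no_shared = symmetric(4, {(0, 1): 0.5, (0, 2): 0.9, (1, 3): 0.9})
many_shared = symmetric(4, {(0, 1): 0.5, (0, 2): 0.9, (1, 2): 0.9,
                            (0, 3): 0.9, (1, 3): 0.9})

tom_low = topological_overlap(no_shared)[0, 1]
tom_high = topological_overlap(many_shared)[0, 1]
```

Although the direct adjacency between nodes 0 and 1 is identical, the shared neighbours in the second network raise their topological overlap.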

Eigengenes are useful in many contexts
• Biological data reduction: instead of say 20,000 genes may have ~30
eigengenes that can be related to a clinical trait
– Alleviate the multiple testing problem
• Give rise to a continuous (“fuzzy”) measure of module membership
for all genes in all modules:
$MM_i = kME_i = \operatorname{cor}(x_i, E)$
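
Module membership as defined above is a plain Pearson correlation between each gene's profile and the module eigengene E. A minimal sketch, with a made-up eigengene and three toy genes (the function name is hypothetical):

```python
import numpy as np

def module_membership(expr, eigengene):
    """kME_i = cor(x_i, E) for every gene (column) of expr.

    expr: samples x genes array; eigengene: one value per sample.
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    e = (eigengene - eigengene.mean()) / eigengene.std(ddof=1)
    # Pearson correlation as the scaled dot product of standardized vectors.
    return z.T @ e / (len(e) - 1)

# Toy data (made up): 5 samples, 3 genes.
E = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
expr = np.column_stack([
    2 * E + 1,                  # follows E exactly
    -E,                         # mirror image of E
    [1.0, 3.0, 2.0, 5.0, 4.0],  # follows E loosely
])
kme = module_membership(expr, E)
```

Because the measure is continuous, a gene can have a high membership in more than one module, and a negative value flags genes that run opposite to the eigengene.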

Judging modules by their Z scores

Each Z score provides answer to “Is the module significantly better
than a random sample of genes?”

Pool the Z scores into a summary (median) Z measure.

Z < 2 indicates no preservation, 2 < Z < 10 weak to moderate
evidence of preservation, Z > 10 strong evidence

Low Z means the “module” is no better than a random selection of
genes
$Z = \dfrac{\text{observed} - \text{mean}_{\text{permuted}}}{\text{sd}_{\text{permuted}}}$
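
A sketch of one such Z score using only the Python standard library. The statistic chosen here (mean adjacency among module genes, compared with random gene sets of the same size) and the toy network are assumptions for illustration; the actual preservation statistics and the permutation scheme are described in the module preservation paper.

```python
import random
import statistics

def permutation_z(observed, permuted):
    """Z = (observed - mean of permuted values) / sd of permuted values."""
    return (observed - statistics.mean(permuted)) / statistics.stdev(permuted)

def mean_density(adj, genes):
    """Average adjacency over all pairs of the given genes."""
    pairs = [(i, j) for i in genes for j in genes if i < j]
    return sum(adj[i][j] for i, j in pairs) / len(pairs)

def density_z(adj, module, n_perm=200, seed=1):
    """Compare the module's density with random gene sets of equal size."""
    rng = random.Random(seed)
    observed = mean_density(adj, module)
    permuted = [
        mean_density(adj, rng.sample(range(len(adj)), len(module)))
        for _ in range(n_perm)
    ]
    return permutation_z(observed, permuted)

# Toy test network (made up): genes 0-3 are tightly connected (0.9),
# every other pair is weakly connected (0.1).
n = 12
adj = [[0.9 if (i < 4 and j < 4) else 0.1 for j in range(n)] for i in range(n)]

z_module = density_z(adj, module=[0, 1, 2, 3])
```

A gene set that is no denser than random selections would give a Z near zero.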

Turning correlation into a network: adjacency function
$A_{ij} = \begin{cases} \operatorname{cor}(x_i, x_j)^{\beta} & \text{if } \operatorname{cor}(x_i, x_j) > 0 \\ 0 & \text{if } \operatorname{cor}(x_i, x_j) \le 0 \end{cases}$
• Increasing powers suppress low correlations more strongly
• β = 6 often works well
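
The adjacency function above translates directly into a few lines of NumPy. This is a sketch on a made-up three-gene data set, not the WGCNA R code; the function name is an assumption.

```python
import numpy as np

def signed_power_adjacency(expr, beta=6):
    """Adjacency A_ij = cor(x_i, x_j)^beta if cor > 0, else 0.

    expr: samples x genes array. Returns a genes x genes matrix.
    """
    cor = np.corrcoef(expr, rowvar=False)
    # Non-positive correlations are set to 0, so those genes are unconnected.
    return np.where(cor > 0, cor, 0.0) ** beta

# Toy data (made up): 5 samples, 3 genes.
g1 = [1.0, 2.0, 3.0, 4.0, 5.0]
g2 = [1.0, 3.0, 2.0, 5.0, 4.0]   # correlation 0.8 with g1
g3 = [5.0, 4.0, 3.0, 2.0, 1.0]   # correlation -1 with g1
expr = np.column_stack([g1, g2, g3])

A = signed_power_adjacency(expr, beta=6)
```

With β = 6, the correlation of 0.8 between the first two genes shrinks to an adjacency of about 0.26, while the negatively correlated third gene is disconnected from both.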
