Weighted Gene Co-expression Networks: Insights into HD From a Million (Billion) Correlations
Peter Langfelder
Dept. of Human Genetics, UCLA
Our aims within the CHDI-UCLA JSC
● Apply our systems-biological analysis methods to -omic data
from HD patients and animal models (both in-house and publicly
available data)
● Identify gene networks involved in early HD pathogenesis, and
enable refining the gene networks by integrating further data
sets and functional studies
● Cooperate with Yang lab in refining our gene networks based on
prior biological knowledge as well as on direct validation
● Cooperate with Coppola group to store and disseminate our
results in a user-friendly manner
Talk roadmap
● Brief overview of Weighted Gene Co-expression Network
Analysis, WGCNA
● WGCNA analysis of 6-month Allelic Series striatum
● Preservation of striatum modules in other data sets
● Consensus modules: modules present across multiple data sets
● Consensus modules across publicly available human HD data
sets: insights into region-specific as well as common response
to HD pathology
● Preservation of human HD-related modules in model organism
data
What is WGCNA?
• A compendium of methods to analyze “high-dimensional” data:
thousands of variables (gene expression, methylation,
proteomics, metabolomics, …) measured across multiple
samples (at least 20 but more is better)
• Aims: find disease-related biomarkers, identify candidate genes,
and gain biological insight into which pathways may be associated
with the condition
What is different from other methods?
• “See the forest for the trees”
• Standard methods (differential gene expression etc) analyze each
gene in isolation (the trees)
• WGCNA: study how gene expressions behave together (the forest)...
• ...then use this information to make (more) sense of standard results
for individual genes
• Changes in networks across conditions can also provide important
biological insights
Constructing a weighted gene network
Bin Zhang and Steve Horvath (2005), A General Framework for Weighted Gene Co-
Expression Network Analysis, SAGMB 4 (1), Article 17
Peter Langfelder and Steve Horvath (2008), WGCNA: an R package for weighted
correlation network analysis. BMC Bioinformatics. 9:559
Turning correlation into a network
• A network can be represented by an adjacency matrix, $A = [A_{ij}]$, that
encodes whether/how a pair of nodes is connected.
• Suppress low correlations that may be due to noise
• Construct a “signed” network: negatively-correlated genes should be
unconnected
Network analysis of mRNA-seq data
from Allelic Series 6-month striatum
Modules in 6-month striatum
Blue (red): genes under- (over-)expressed in higher Q's
● Find several modules (2, 20) that appear to group together genes that
change consistently with Q, and the change appears to increase with Q
Relating modules to genotypes or other sample information
• Genes in each module are correlated (similar), hence can be
summarized by a single expression profile
• Use a synthetic expression profile called the eigengene, obtained
from Singular Value Decomposition (SVD)
• Relate each eigengene to genotype using standard statistical methods
(t-test, regression significance, etc.)
• Calculating p-values is straightforward
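The eigengene step described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy, a genes-by-samples matrix for a single module, and made-up toy data; the function name `module_eigengene` and the genotype vector `q_length` are hypothetical and not taken from the talk or from the WGCNA package.

```python
import numpy as np

def module_eigengene(expr):
    """Return the first right singular vector of a standardized module matrix.

    expr: 2-D array, rows = genes of one module, columns = samples.
    The result has one value per sample.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene (row) to mean 0 and unit variance across samples.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, ddof=1, keepdims=True)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # The sign of a singular vector is arbitrary; align it with the average
    # standardized expression so that "up" means the module's genes are up.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Toy data (made up): 4 genes x 6 samples sharing one trend plus small noise.
rng = np.random.default_rng(0)
q_length = np.array([20, 80, 92, 111, 140, 175], dtype=float)  # hypothetical genotype
trend = (q_length - q_length.mean()) / q_length.std()
expr = np.vstack([g * trend + 0.1 * rng.standard_normal(6) for g in (1.0, 0.8, 1.2, 0.9)])

me = module_eigengene(expr)
r = np.corrcoef(me, q_length)[0, 1]  # eigengene-genotype correlation
```

A p-value for `r` would then come from any standard test of a correlation or regression coefficient, as the slide notes.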
Relating modules to status via eigengenes
Red (blue): positive (negative) correlation of module eigengene with genotype
Format: (100*correlation)|(negative exponent of p-value)
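As a small illustration of the cell format above, the sketch below turns a correlation and a p-value into a label. It assumes that "negative exponent" means the power of ten of the p-value in scientific notation; the helper name `heatmap_cell` and the example numbers are hypothetical.

```python
import math

def heatmap_cell(correlation, p_value):
    """Format one cell as '<100*correlation>|<negative exponent of p-value>'."""
    # Assumption: the "negative exponent" is the power of ten in scientific
    # notation, so p = 3e-5 is reported as 5.
    exponent = -math.floor(math.log10(p_value))
    return f"{round(100 * correlation)}|{exponent}"

# Made-up values for two modules.
cells = [heatmap_cell(0.87, 3e-5), heatmap_cell(-0.62, 0.02)]
```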
● M.2 (neuronal): down with increasing Q
● M.20 (also neuronal): up with increasing Q (two separate populations of neurons?)
● M.7 (oligodendrocyte), M.39 (mitochondria): up with increasing Q
● M.11 (oligodendrocyte): up in moderate Q (80, 92), then down in high Q (140, 175)
● Some modules with no strong enrichment also change strongly (M.10, M.41)
Eigengene "expression" as a function of Q for selected modules
How to select the most important genes?
• Network-based Target Prioritization Scheme (NTPS); eventually
should become part of Data To Target (D2T) online tool
• Combine multiple ways of selecting genes
– Individual association with genotype (consistent change
with increasing Q)
– Hub gene status in a biologically plausible module
(association with Q, enrichment, preservation in other data)
– Consistent association and hub gene status in related
tissues (e.g., cortex) and/or at other time points
– Additional statistical tests such as causal testing
– Prior knowledge (HD Target DB)
Network-based Target Prioritization Scheme (NTPS)
• Current version combines gene-genotype and module information
from 6-month striatum and cortex
• The result is a ranked list of genes for further follow-up
• One can also sort genes based on purely striatum- or cortex-based
ranking; Top 10 overall ranked genes:
Gene     Overall rank   Striatum rank   Cortex rank   Striatum module   Cortex module
Arpp19   1              3               8             2                 4
Arpp21   2              1               22            2                 4
Ppp3ca   3              32              11            2                 4
Rgs4     4              139             1             2                 4
Chn1     5              9               38            2                 4
Plcb1    6              144             4             2                 4
Scn4b    7              20              50            2                 2
Atp2b1   8              63              15            2                 4
Slmap    9              30              68            2                 38
Prkcb    10             18              77            2                 4
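To illustrate the point that the same list can be re-sorted by a purely striatum- or cortex-based ranking, here is a minimal sketch using the ten rows of the table above. The `GeneRank` type is a hypothetical container introduced only for this example; it does not reproduce how NTPS computes the overall rank, which the slides do not specify.

```python
from typing import NamedTuple

class GeneRank(NamedTuple):
    gene: str
    overall: int
    striatum: int
    cortex: int

# Ranks copied from the top-10 table above.
TOP10 = [
    GeneRank("Arpp19", 1, 3, 8),
    GeneRank("Arpp21", 2, 1, 22),
    GeneRank("Ppp3ca", 3, 32, 11),
    GeneRank("Rgs4", 4, 139, 1),
    GeneRank("Chn1", 5, 9, 38),
    GeneRank("Plcb1", 6, 144, 4),
    GeneRank("Scn4b", 7, 20, 50),
    GeneRank("Atp2b1", 8, 63, 15),
    GeneRank("Slmap", 9, 30, 68),
    GeneRank("Prkcb", 10, 18, 77),
]

# Re-sort the same genes by tissue-specific rank (lower rank = higher priority).
by_striatum = [g.gene for g in sorted(TOP10, key=lambda g: g.striatum)]
by_cortex = [g.gene for g in sorted(TOP10, key=lambda g: g.cortex)]
```

Within this top-10 list, Arpp21 leads the striatum-only ordering and Rgs4 leads the cortex-only ordering.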
But wait... there's more!
Network analysis can be applied to any high-dimensional
data set
Network analysis of miRNA-seq
from Allelic Series 6-month striatum
Network analysis of miRNA data from 6-month Striatum
● miRNA expression patterns show strong modularity
● Find modules 1, 3, 10 that appear to group together miRNAs strongly associated with Q
Association of miRNA modules with genotype
● Modules 1, 3, 10 are strongly and consistently associated with Q; association gets
stronger with higher Q
● Modules 1, 3, 10 show striking patterns of Q-dependence; other modules do not change
consistently
Integration of mRNA and miRNA
networks at the level of modules
Integrate miRNA and mRNA data: association of expression
● miRNA modules 1, 3, 10 correlate strongly with several Q-associated mRNA modules
● Could similar associations be observed in comparing gene modules to groups of targets
of each miRNA module?
Overlap of gene modules and predicted targets of miRNA modules
● Predicted targets of miRNA modules 1 and 4 are enriched in neuronal module 2
● miRNA module 1 is positively correlated with gene module 2; since miRNAs typically
repress their targets, the enrichment probably does not reflect miRNA-based regulation
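The slides do not say which test was used to call predicted targets "enriched" in a gene module. One common choice for this kind of overlap question is the hypergeometric test; the sketch below shows it using only the standard library. The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_module, n_targets, n_overlap):
    """P(overlap >= n_overlap) if n_targets genes were drawn at random.

    n_universe: number of genes considered overall
    n_module:   number of genes in the gene module
    n_targets:  number of predicted targets of the miRNA module
    n_overlap:  number of predicted targets that fall in the gene module
    """
    upper = min(n_module, n_targets)
    total = comb(n_universe, n_targets)
    tail = sum(
        comb(n_module, k) * comb(n_universe - n_module, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Made-up counts: 1000 genes, a 100-gene module, 50 predicted targets,
# 15 of which fall in the module (5 would be expected by chance).
p = hypergeom_enrichment_p(1000, 100, 50, 15)
```

A small p-value only says the overlap is larger than expected by chance; as the slide points out, the direction of the correlation still has to be checked before reading it as regulation.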
Network analysis can help answer many research questions
• Find genes and modules related to a clinical trait, for example
disease status, severity etc; find candidate intervention targets,
biomarkers or (with additional data) causal genes
• Integrate different data types when measured for the same set of
samples
• Study commonalities and differences in the organization of gene
expression between conditions or species
– Study consensus modules: modules that are present in all
compared expression data sets
– Quantify preservation of modules between a reference and a test
condition
Limitations
• Co-expression networks hypothesize that co-expression is
biologically meaningful and important for the studied trait
• For selecting biomarkers: standard screening methods are often
better; network analysis provides better insight into possible function
• Many factors influence expression and co-expression, not all of
which are biological/desirable, and it is often difficult to disentangle
the individual factors
Limitations
• Co-expression networks do not carry functional information
• ...nor information about causality
• Modules tend to be rather large because statisticians like robust
results (much to biologists' displeasure)
• Functional annotation for modules tends to be rather general
• For best results, analysis requires a skilled human operator
Preservation of modules identified in
Allelic Series striatum in other data
sets
Langfelder P, Luo R, Oldham MC, Horvath S,
Is My Network Module Preserved and Reproducible?
PLoS Comput Biol 2011, 7(1): e1001057

Rationale for studying module preservation
● One often has a "reference" data set: a data set with an
interesting finding, for example, module 2 associates strongly
with Q in Allelic Series striatum
● Hypothesis: expression (and co-expression) of genes in module 2
reflects in some way the pathological processes connected to HD
in a mouse model
● If the genes in module 2 were co-expressed in human data as
well, it would provide indirect evidence that the same or similar
process is also active in human HD
● Question: is module 2 "preserved" in human data, i.e., are genes
in module 2 co-expressed in human data as well?
● General formulation: given a module identified in a "reference"
data set, is it preserved in an independent "test" data set?

Quantifying module preservation: The Zsummary statistic
● Networks in general and WGCNA in particular provide several module
preservation statistics that measure various aspects of module
preservation
● For convenience, we summarize the statistics in a single measure
called Zsummary. This works very much like a Z statistic:
– Zsummary < 2: no evidence of module preservation
– 2 < Zsummary < 7: weak to moderate evidence of preservation
– 7 < Zsummary: strong evidence of preservation
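The thresholds on this slide can be written as a small helper. This is only a sketch: the function name is hypothetical, and the cut-off for "strong" is left as a parameter because the appendix slide on Z scores quotes 10 rather than 7.

```python
def preservation_evidence(z_summary, weak=2.0, strong=7.0):
    """Translate a Zsummary value into the verbal categories used on the slide."""
    if z_summary < weak:
        return "none"
    if z_summary < strong:
        return "weak to moderate"
    return "strong"

# Hypothetical Zsummary values for three modules.
labels = [preservation_evidence(z) for z in (1.5, 4.0, 12.0)]
```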
How strongly are 6-month striatum modules preserved
in other data sets?
Zsummary < 2: no evidence of preservation
2 < Zsummary < 7: weak - moderate preservation
7 < Zsummary: strong preservation
• Most modules are preserved in CTX and other mouse brain data sets
• Oligodendrocyte M.7, M.11 appear preserved across most other data
• Neuronal M.2, M.20 preserved in some human data sets but not in others
Integrating multiple expression data
sets: consensus module analysis
Langfelder P, Horvath S, Eigengene networks for studying the relationships between
co-expression modules. BMC Systems Biology 2007, 1:54
Finding consensus modules
• Aim: find modules that are present in each of multiple independent
input data sets
• Modules group together densely interconnected genes
• Consensus modules group together genes densely connected in all
input data sets
• Our solution: find a consensus gene-gene similarity and use it with
clustering to find modules
Consensus = min(Calibrated Network 1, Calibrated Network 2)

For multiple input data sets: replace minimum by a suitable quantile
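The consensus rule (element-wise minimum, or a quantile when there are many data sets) can be sketched as follows. The example assumes NumPy and assumes the input similarity matrices are already calibrated to be comparable; the function name and the toy matrices are hypothetical.

```python
import numpy as np

def consensus_similarity(networks, quantile=0.0):
    """Element-wise consensus of several gene-gene similarity matrices.

    quantile=0.0 gives the minimum across data sets; a larger value
    (for example 0.25) is a less strict consensus for many data sets.
    """
    stacked = np.stack([np.asarray(n, dtype=float) for n in networks])
    return np.quantile(stacked, quantile, axis=0)

# Two made-up, already calibrated 3-gene similarity matrices.
net1 = np.array([[1.0, 0.8, 0.1],
                 [0.8, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])
net2 = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.7],
                 [0.3, 0.7, 1.0]])

consensus = consensus_similarity([net1, net2])
```

With the minimum, a pair of genes is strongly connected in the consensus only if it is strongly connected in every input network, which is what makes consensus modules conservative.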
Example consensus module analysis:
modules in human HD patients and
control expression data
7 data sets
• Hodges et al (2006): CN, BA4 (motor cortex), CB
• Durrenberger et al (2011), Common neuroinflammatory pathways in
neurodegenerative diseases. GEO series GSE26927, data from CN of
10 controls and 9 patients
• Zhang et al (2012): several hundred HD and control samples from
prefrontal cortex, visual cortex and cerebellum
Are individual gene changes in response to HD concordant across the data sets?
• There is relatively good concordance (especially given the difference
in how strongly HD affects CN vs. Ctx and CB)
• Data sets from the same tissue tend to cluster together
Consensus modules across human data
Relating consensus modules to HD status
• Neuronal module M.10 down across all brain regions
• Astrocyte M.11, microglial M.12, HSP M.15 up across all brain regions
• Oligodendrocyte M.7 and microglial M.43 up in CN but not in CTX or CB
• Ribosomal M.17 shows no appreciable change
• M.14 with chaperone protein genes is down in CN but up in other areas
Are human consensus modules preserved in mouse data?
• Most modules are preserved in BA4 postmortem human data
• Neuronal M.10, M.23 are preserved in most mouse data sets
• Oligodendrocyte M.7 is preserved in some sets: allelic series, Strand data
• Astrocyte M.11, microglial M.12 mostly not preserved
Lessons from the consensus module analysis of human data sets
● Response to HD shows interesting regional commonalities as well as
differences
● Consensus module analysis leads to robust modules that are unlikely
to reflect technical artifacts
● One can pick interesting genes by high intramodular connectivity in
status-related consensus modules
Summary
● Network analysis organizes genes into a smaller number of modules
that can be assigned functional labels
● Network-based Target Prioritization Scheme ranks genes based on
multiple criteria and provides a ranked list of genes for further follow-up
● Advanced network-based methods allow integration of independent data
and a multi-layer analysis (e.g., brain region differences in response to
HD, differences in response to varying Q length at different time points,
etc.)
Acknowledgments
Jeff Aaronson, Jim Rosinki and entire CHDI
Steve Horvath
William Yang and his group, esp. Jeff Cantle and Xiaohong Lu
Jim Wang and Mike Palazzolo
Giovanni Coppola and his group, esp. Doxa Chatzopoulou and
Charles Blum
Interested?
● More about Weighted Gene Co-expression Network Analysis
(WGCNA):
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/
● WGCNA implementation for R: package WGCNA
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/
● Module preservation project main publication and page with
examples and tutorials:
Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath,
Is My Network Module Preserved and Reproducible? PLoS Comput Biol 2011, 7(1): e1001057
http://genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation/
● Consensus modules and consensus eigengene networks:
Peter Langfelder and Steve Horvath, Eigengene networks for studying the relationships
between co-expression modules. BMC Systems Biology 2007, 1:54
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork
Appendix
Step 5: Finding network modules
• Modules: groups (clusters) of strongly connected genes
• How to define “strong connection”? Simple answer “strong
connection = high adjacency” is susceptible to noise
• More complicated but also more robust answer: Topological overlap
(TOM): measures direct connection + shared neighbours
Figure: No shared neighbours: low TOM. Many shared neighbours: high TOM.
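To make "direct connection + shared neighbours" concrete, here is a minimal sketch of the unsigned topological overlap measure as defined in Zhang and Horvath (2005), TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij). It assumes NumPy and a symmetric adjacency matrix with entries in [0, 1]; the function name and the toy matrices are hypothetical.

```python
import numpy as np

def topological_overlap(adjacency):
    """Unsigned topological overlap for a symmetric adjacency matrix."""
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)                   # ignore self-connections
    k = a.sum(axis=1)                          # connectivity of each gene
    shared = a @ a                             # sum over u of a_iu * a_uj
    denom = np.minimum.outer(k, k) + 1.0 - a   # min(k_i, k_j) + 1 - a_ij
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Genes 0 and 1 have the same direct connection (0.5) in both toy networks.
shared_nbr = np.zeros((4, 4))
shared_nbr[0, 1] = 0.5
shared_nbr[0, 2] = shared_nbr[1, 2] = 0.9      # both connect to gene 2
shared_nbr = shared_nbr + shared_nbr.T

no_shared = np.zeros((4, 4))
no_shared[0, 1] = 0.5
no_shared[0, 2] = no_shared[1, 3] = 0.9        # different neighbours
no_shared = no_shared + no_shared.T

tom_shared = topological_overlap(shared_nbr)[0, 1]
tom_unshared = topological_overlap(no_shared)[0, 1]
```

The direct adjacency is identical in the two cases, yet the pair with a common neighbour receives the higher overlap, which is the behaviour the figure caption describes.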
Eigengenes are useful in many contexts
• Biological data reduction: instead of say 20,000 genes may have ~30
eigengenes that can be related to a clinical trait
– Alleviate the multiple testing problem
• Give rise to a continuous (“fuzzy”) measure of module membership
for all genes in all modules:
$$MM_i = kME_i = \mathrm{cor}(x_i, E)$$
Judging modules by their Z scores
● Each Z score provides answer to “Is the module significantly better
than a random sample of genes?”
● Pool the Z scores into a summary (median) Z measure.
● Z < 2 indicates no preservation, 2 < Z < 10 weak to moderate
evidence of preservation, Z > 10 strong evidence
● Low Z means the “module” is no better than a random selection of
genes
$$Z = \frac{\text{observed} - \text{mean}_{\text{permuted}}}{\text{sd}_{\text{permuted}}}$$
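The Z score above is straightforward to compute once a statistic has been evaluated on the real module and on randomly permuted gene labels. The sketch below uses only the standard library; the function name and every number are made up for illustration.

```python
import statistics

def permutation_z(observed, permuted):
    """Z = (observed - mean of permuted values) / sd of permuted values."""
    return (observed - statistics.mean(permuted)) / statistics.stdev(permuted)

# Hypothetical values of one preservation statistic: the real module
# versus five random gene sets of the same size.
observed_density = 0.60
permuted_density = [0.10, 0.12, 0.08, 0.11, 0.09]
z_density = permutation_z(observed_density, permuted_density)

# Pool several Z scores (the other two are made up) into a median summary.
z_summary = statistics.median([z_density, 4.0, 9.0])
```

In practice many more permutations would be used so that the permuted mean and standard deviation are stable.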
Turning correlation into a network: adjacency function
$$A_{ij} = \begin{cases} \mathrm{cor}(x_i, x_j)^{\beta} & \text{if } \mathrm{cor}(x_i, x_j) > 0 \\ 0 & \text{if } \mathrm{cor}(x_i, x_j) \le 0 \end{cases}$$
• Increasing powers β suppress low correlations more strongly
• β = 6 often works well
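The adjacency function on this slide can be sketched directly with NumPy. The function name and the toy expression values are hypothetical; the only inputs taken from the slide are the formula itself and the default power of 6.

```python
import numpy as np

def slide_adjacency(expr, beta=6):
    """A_ij = cor(x_i, x_j)^beta if the correlation is positive, else 0.

    expr: 2-D array, rows = genes, columns = samples.
    """
    cor = np.corrcoef(np.asarray(expr, dtype=float))
    # Negative (and zero) correlations are set to 0 before raising to beta.
    return np.where(cor > 0, cor, 0.0) ** beta

# Made-up expression for 3 genes across 5 samples.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # gene 0
    [2.0, 4.0, 5.0, 4.0, 6.0],   # gene 1: positively correlated with gene 0
    [5.0, 4.0, 3.0, 2.0, 1.0],   # gene 2: perfectly anti-correlated with gene 0
])

adj = slide_adjacency(expr)
cor01 = np.corrcoef(expr)[0, 1]
```

Here genes 0 and 2 end up unconnected, and the moderate correlation between genes 0 and 1 is shrunk considerably by the power, which is the noise-suppressing effect the bullets describe.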

Contenu connexe

Tendances

Mtt Assay for cell viability
Mtt Assay for cell viabilityMtt Assay for cell viability
Mtt Assay for cell viabilitysakeena gilani
 
Monoclonal Antibodies
Monoclonal AntibodiesMonoclonal Antibodies
Monoclonal AntibodiesAmeer Ahmed
 
Applications of microarray
Applications of microarrayApplications of microarray
Applications of microarrayprateek kumar
 
TOPO Cloning Lecture
TOPO Cloning LectureTOPO Cloning Lecture
TOPO Cloning Lectureouopened
 
DNA Analysis
DNA Analysis DNA Analysis
DNA Analysis Yosok Pun
 
Benefits of pharmacogenomics
Benefits of pharmacogenomicsBenefits of pharmacogenomics
Benefits of pharmacogenomicsSaajida Sultaana
 
05 Monoclonal Antibodies
05 Monoclonal Antibodies05 Monoclonal Antibodies
05 Monoclonal AntibodiesJaya Kumar
 
Sequenced taged sites (sts)
Sequenced taged sites (sts)Sequenced taged sites (sts)
Sequenced taged sites (sts)DHANRAJ GIRIMAL
 
Mutagenicity & Carcinogenecity.pptx
Mutagenicity & Carcinogenecity.pptxMutagenicity & Carcinogenecity.pptx
Mutagenicity & Carcinogenecity.pptxDr. Sarita Sharma
 
Antisense oligoneucleotides in therapy
Antisense  oligoneucleotides  in therapyAntisense  oligoneucleotides  in therapy
Antisense oligoneucleotides in therapyDr Sajeena Jose
 
Comparitive genome mapping and model systems
Comparitive genome mapping and model systemsComparitive genome mapping and model systems
Comparitive genome mapping and model systemsHimanshi Chauhan
 
Gene therapy in Cancer treatment
Gene therapy in Cancer treatmentGene therapy in Cancer treatment
Gene therapy in Cancer treatmentAnirban Bora
 
Types of pcr ppt by mala (1)
Types of pcr ppt by mala (1)Types of pcr ppt by mala (1)
Types of pcr ppt by mala (1)christanantony
 
Characterstics of transformed cells
Characterstics of transformed cellsCharacterstics of transformed cells
Characterstics of transformed cellsKAUSHAL SAHU
 
Chi square test final
Chi square test finalChi square test final
Chi square test finalHar Jindal
 
Gene therapy for Parkinson’s disease
Gene therapy for Parkinson’s diseaseGene therapy for Parkinson’s disease
Gene therapy for Parkinson’s diseaseUtkarsh Alok
 

Tendances (20)

Mtt Assay for cell viability
Mtt Assay for cell viabilityMtt Assay for cell viability
Mtt Assay for cell viability
 
Monoclonal Antibodies
Monoclonal AntibodiesMonoclonal Antibodies
Monoclonal Antibodies
 
Applications of microarray
Applications of microarrayApplications of microarray
Applications of microarray
 
Glioblastoma Presentation
Glioblastoma PresentationGlioblastoma Presentation
Glioblastoma Presentation
 
TOPO Cloning Lecture
TOPO Cloning LectureTOPO Cloning Lecture
TOPO Cloning Lecture
 
Sage technology
Sage technologySage technology
Sage technology
 
DNA Analysis
DNA Analysis DNA Analysis
DNA Analysis
 
Benefits of pharmacogenomics
Benefits of pharmacogenomicsBenefits of pharmacogenomics
Benefits of pharmacogenomics
 
05 Monoclonal Antibodies
05 Monoclonal Antibodies05 Monoclonal Antibodies
05 Monoclonal Antibodies
 
Sequenced taged sites (sts)
Sequenced taged sites (sts)Sequenced taged sites (sts)
Sequenced taged sites (sts)
 
Mutagenicity & Carcinogenecity.pptx
Mutagenicity & Carcinogenecity.pptxMutagenicity & Carcinogenecity.pptx
Mutagenicity & Carcinogenecity.pptx
 
Antisense oligoneucleotides in therapy
Antisense  oligoneucleotides  in therapyAntisense  oligoneucleotides  in therapy
Antisense oligoneucleotides in therapy
 
Comparitive genome mapping and model systems
Comparitive genome mapping and model systemsComparitive genome mapping and model systems
Comparitive genome mapping and model systems
 
Gene therapy in Cancer treatment
Gene therapy in Cancer treatmentGene therapy in Cancer treatment
Gene therapy in Cancer treatment
 
Epigenetics
EpigeneticsEpigenetics
Epigenetics
 
Gene mapping
Gene mappingGene mapping
Gene mapping
 
Types of pcr ppt by mala (1)
Types of pcr ppt by mala (1)Types of pcr ppt by mala (1)
Types of pcr ppt by mala (1)
 
Characterstics of transformed cells
Characterstics of transformed cellsCharacterstics of transformed cells
Characterstics of transformed cells
 
Chi square test final
Chi square test finalChi square test final
Chi square test final
 
Gene therapy for Parkinson’s disease
Gene therapy for Parkinson’s diseaseGene therapy for Parkinson’s disease
Gene therapy for Parkinson’s disease
 

Similaire à presentation

Proteomics - Analysis and integration of large-scale data sets
Proteomics - Analysis and integration of large-scale data setsProteomics - Analysis and integration of large-scale data sets
Proteomics - Analysis and integration of large-scale data setsLars Juhl Jensen
 
STRING - Modeling of pathways through cross-species integration of large-scal...
STRING - Modeling of pathways through cross-species integration of large-scal...STRING - Modeling of pathways through cross-species integration of large-scal...
STRING - Modeling of pathways through cross-species integration of large-scal...Lars Juhl Jensen
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...Enrico Glaab
 
Knowing Your NGS Downstream: Functional Predictions
Knowing Your NGS Downstream: Functional PredictionsKnowing Your NGS Downstream: Functional Predictions
Knowing Your NGS Downstream: Functional PredictionsGolden Helix Inc
 
Network Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsNetwork Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsGanesh Bagler
 
Java tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsJava tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsRafael C. Jimenez
 
Systems Biology Approaches to Cancer
Systems Biology Approaches to CancerSystems Biology Approaches to Cancer
Systems Biology Approaches to CancerRaunak Shrestha
 
A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...eSAT Journals
 
A clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofA clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofeSAT Publishing House
 
STRING - Prediction of a functional association network for the yeast mitocho...
STRING - Prediction of a functional association network for the yeast mitocho...STRING - Prediction of a functional association network for the yeast mitocho...
STRING - Prediction of a functional association network for the yeast mitocho...Lars Juhl Jensen
 
overview on Next generation sequencing in breast csncer
overview on Next generation sequencing in breast csnceroverview on Next generation sequencing in breast csncer
overview on Next generation sequencing in breast csncerSeham Al-Shehri
 
Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...Hakky St
 
Systems genetics approaches to understand complex traits
Systems genetics approaches to understand complex traitsSystems genetics approaches to understand complex traits
Systems genetics approaches to understand complex traitsSOYEON KIM
 
Friend harvard 2013-01-30
Friend harvard 2013-01-30Friend harvard 2013-01-30
Friend harvard 2013-01-30Sage Base
 
Prediction of protein function
Prediction of protein functionPrediction of protein function
Prediction of protein functionLars Juhl Jensen
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesGolden Helix Inc
 
NetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizNetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizAlexander Pico
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례mothersafe
 

Similaire à presentation (20)

Proteomics - Analysis and integration of large-scale data sets
Proteomics - Analysis and integration of large-scale data setsProteomics - Analysis and integration of large-scale data sets
Proteomics - Analysis and integration of large-scale data sets
 
STRING - Modeling of pathways through cross-species integration of large-scal...
STRING - Modeling of pathways through cross-species integration of large-scal...STRING - Modeling of pathways through cross-species integration of large-scal...
STRING - Modeling of pathways through cross-species integration of large-scal...
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
 
Knowing Your NGS Downstream: Functional Predictions
Knowing Your NGS Downstream: Functional PredictionsKnowing Your NGS Downstream: Functional Predictions
Knowing Your NGS Downstream: Functional Predictions
 
Network Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsNetwork Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systems
 
Java tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsJava tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular Interactions
 
Systems Biology Approaches to Cancer
Systems Biology Approaches to CancerSystems Biology Approaches to Cancer
Systems Biology Approaches to Cancer
 
A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...
 
A clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofA clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction of
 
STRING - Prediction of a functional association network for the yeast mitocho...
STRING - Prediction of a functional association network for the yeast mitocho...STRING - Prediction of a functional association network for the yeast mitocho...
STRING - Prediction of a functional association network for the yeast mitocho...
 
overview on Next generation sequencing in breast csncer
overview on Next generation sequencing in breast csnceroverview on Next generation sequencing in breast csncer
overview on Next generation sequencing in breast csncer
 
1207.2600
1207.26001207.2600
1207.2600
 
Qi liu 08.08.2014
Qi liu 08.08.2014Qi liu 08.08.2014
Qi liu 08.08.2014
 
Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...Boosting probabilistic graphical model inference by incorporating prior knowl...
Boosting probabilistic graphical model inference by incorporating prior knowl...
 
Systems genetics approaches to understand complex traits
Systems genetics approaches to understand complex traitsSystems genetics approaches to understand complex traits
Systems genetics approaches to understand complex traits
 
Friend harvard 2013-01-30
Friend harvard 2013-01-30Friend harvard 2013-01-30
Friend harvard 2013-01-30
 
Prediction of protein function
Prediction of protein functionPrediction of protein function
Prediction of protein function
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
 
NetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizNetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-viz
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례
 

presentation

  • 1. Peter Langfelder Dept. of Human Genetics, UCLA Weighted Gene Co-expression Networks: Insights into HD From a Million of Correlations
  • 2. Weighted Gene Co-expression Networks: Insights into HD From a Million of Correlations Billion Peter Langfelder Dept. of Human Genetics, UCLA
  • 3. Our aims within the CHDI-UCLA JSCOur aims within the CHDI-UCLA JSC ● Apply our systems-biological analysis methods to -omic data from HD patients and animal models (both in-house and publicly available data) ● Identify gene networks involved in early HD pathogenesis, and enable refining the gene networks by integrating further data sets and functional studies ● Cooperate with Yang lab in refining our gene networks based on prior biological knowledge as well as on direct validation ● Cooperate with Coppola group to store and disseminate our results in a user-friendly manner
  • 4. Talk roadmapTalk roadmap ● Brief overview of Weighted Gene Co-expression Network Analysis, WGCNA ● WGCNA analysis of 6-month Allelic Series striatum ● Preservation of striatum modules in other data sets ● Consensus modules: modules present across multiple data sets ● Consensus modules across publicly available human HD data sets: insights into region-specific as well as common response to HD pathology ● Preservation of human HD-related modules in model organism data
  • 5. What is WGCNA? • A compendium of methods to analyze “high-dimensional” data: thousands of variables (gene expression, methylation, proteomics, metabolomics, …) measured across multiple samples (at least 20 but more is better) • Aims: find disease-related biomarkers, identify candidate genes, gain biological insights into what pathway may be associated with the condition
  • 6. What is different from other methods? • “See the forest for the trees” • Standard methods (differential gene expression etc) analyze each gene in isolation (the trees) • WGCNA: study how gene expressions behave together (the forest)... • ...then use this information to make (more) sense of standard results for individual genes • Changes in networks across conditions can also provide important biological insights
  • 7. Constructing a weighted gene network Bin Zhang and Steve Horvath (2005), A General Framework for Weighted Gene Co- Expression Network Analysis, SAGMB 4 (1), Article 17 Peter Langfelder and Steve Horvath (2008), WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 9:559
  • 8. Constructing a weighted gene network
  • 9. Constructing a weighted gene network
  • 10. Constructing a weighted gene network
  • 11. Constructing a weighted gene network
  • 12. Turning correlation into a network
  • 13. Turning correlation into a network • A network can be represented by an adjacency matrix, A = [A_ij], that encodes whether/how a pair of nodes is connected.
  • 14. Turning correlation into a network
  • 15. Turning correlation into a network • Suppress low correlations that may be due to noise • Construct a “signed” network: negatively-correlated genes should be unconnected
  • 16. Constructing a weighted gene network
  • 17. Constructing a weighted gene network
  • 18. Network analysis of mRNA-seq data from Allelic Series 6-month striatum
  • 19. Modules in 6-month striatum Blue (red): genes under- (over-)expressed in higher Q's
  • 20. ● Find several modules (2, 20) that appear to group together genes that change consistently with Q, and the change appears to increase with Q
  • 21. Relating modules to genotypes or other sample information • Genes in each module are correlated (similar), hence can be summarized by a single expression profile • Use a synthetic expression profile called the eigengene, obtained from Singular Value Decomposition (SVD) • Relate each eigengene to genotype using standard statistical methods (t-test, regression significance, etc.) • Calculating p-values is straightforward
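The eigengene step on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WGCNA R implementation: instead of a full SVD it uses power iteration to approximate the first right singular vector of the standardized module expression matrix. The function names, the toy expression values, and the Q lengths are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(row):
    """Center a gene profile and scale it to unit sample variance."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / (n - 1))
    return [(v - m) / sd for v in row]

def module_eigengene(expr, n_iter=200):
    """Approximate the module eigengene of `expr` (genes x samples).

    Power iteration on X^T X converges to the first right singular
    vector of the standardized matrix X. Starting from the mean profile
    keeps the result oriented with average module expression.
    """
    x = [standardize(g) for g in expr]
    n_samples = len(x[0])
    v = [sum(g[s] for g in x) / len(x) for s in range(n_samples)]
    for _ in range(n_iter):
        scores = [sum(gs * vs for gs, vs in zip(g, v)) for g in x]   # X v
        w = [sum(x[i][s] * scores[i] for i in range(len(x)))         # X^T (X v)
             for s in range(n_samples)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Hypothetical module of three genes that all decrease with Q length.
expr = [
    [9.0, 8.8, 7.1, 7.0, 5.2, 5.0],
    [6.1, 6.0, 5.0, 5.2, 4.1, 3.9],
    [3.0, 3.2, 2.4, 2.5, 2.0, 1.8],
]
q_length = [20, 20, 80, 80, 140, 140]   # made-up genotype values

eigengene = module_eigengene(expr)
r = pearson(eigengene, q_length)        # association of the module with Q
print(round(r, 3))
```

A single correlation per module replaces one test per gene; a p-value for `r` would follow from the usual correlation test, which is omitted here.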
  • 22. Relating modules to status via eigengenes Red (blue): positive (negative) correlation of module eigengene with genotype Format: (100*correlation)|(negative exponent of p-value)
  • 23. ● M.2 (neuronal): down with increasing Q ● M.20 (also neuronal): up with increasing Q (two separate populations of neurons?) ● M.7 (oligodendrocyte), M.39 (mitochondria): up with increasing Q ● M.11 (oligodendrocyte): up in moderate Q (80, 92), then down in high Q (140, 175) ● Some modules with no strong enrichment also change strongly (M.10, M.41)
  • 24. Eigengene "expression" as a function of Q for selected modules
  • 25. How to select most important genes? • Network-based Target Prioritization Scheme (NTPS); eventually should become part of Data To Target (D2T) online tool • Combine multiple ways of selecting genes – Individual association with genotype (consistent change with increasing Q) – Hub gene status in a biologically plausible module (association with Q, enrichment, preservation in other data) – Consistent association and hub gene status in related tissues (e.g., cortex) and/or at other time points – Additional statistical tests such as causal testing – Prior knowledge (HD Target DB)
  • 26. Network-based Target Prioritization Scheme (NTPS) • Current version combines gene-genotype and module information from 6-month striatum and cortex • The result is a ranked list of genes for further follow-up • One can also sort genes based on purely striatum- or cortex-based ranking • Top 10 overall ranked genes:
      Gene    Overall rank  Striatum rank  Cortex rank  Striatum module  Cortex module
      Arpp19  1             3              8            2                4
      Arpp21  2             1              22           2                4
      Ppp3ca  3             32             11           2                4
      Rgs4    4             139            1            2                4
      Chn1    5             9              38           2                4
      Plcb1   6             144            4            2                4
      Scn4b   7             20             50           2                2
      Atp2b1  8             63             15           2                4
      Slmap   9             30             68           2                38
      Prkcb   10            18             77           2                4
  • 27. But wait... there's more! Network analysis can be applied to any high-dimensional data set
  • 28. But wait... there's more! Network analysis can be applied to any high-dimensional data set
  • 29. Network analysis of miRNA-seq from Allelic Series 6-month striatum But wait... there's more! Network analysis can be applied to any high-dimensional data set
  • 30. Network analysis of miRNA data from 6-month Striatum ● miRNA expression patterns show strong modularity ● Find modules 1, 3, 10 that appear to group together miRNAs strongly associated with Q
  • 31. Association of miRNA modules with genotype ● Modules 1, 3, 10 are strongly and consistently associated with Q; association gets stronger with higher Q
  • 32. Association of miRNA modules with genotype ● Modules 1, 3, 10 show striking patterns of Q-dependence; other modules do not change consistently
  • 33. Integration of mRNA and miRNA networks at the level of modules
  • 34. Integrate miRNA and mRNA data: association of expression ● miRNA modules 1, 3, 10 correlate strongly with several Q-associated mRNA modules ● Could similar associations be observed when comparing gene modules to groups of targets of each miRNA module?
  • 35. Overlap of gene modules and predicted targets of miRNA modules ● Predicted targets of miRNA modules 1 and 4 are enriched in neuronal module 2 ● miRNA module 1 is positively correlated with gene module 2, so the enrichment probably does not reflect miRNA-based regulation
  • 36. Network analysis can help answer many research questions • Find genes and modules related to a clinical trait, for example disease status, severity etc; find candidate intervention targets, biomarkers or (with additional data) causal genes • Integrate different data types when measured for the same set of samples • Study commonalities and differences in the organization of gene expression between conditions or species – Study consensus modules: modules that are present in all compared expression data sets – Quantify preservation of modules between a reference and a test condition
  • 37. Limitations • Co-expression networks hypothesize that co-expression is biologically meaningful and important for the studied trait • For selecting biomarkers: standard screening methods are often better; network analysis provides better insight into possible function • Many factors influence expression and co-expression, not all of which are biological/desirable, and it is often difficult to disentangle the individual factors
  • 38. Limitations • Co-expression networks do not carry functional information • ...nor information about causality • Modules tend to be rather large because statisticians like robust results (much to biologists' displeasure) • Functional annotation for modules tends to be rather general • For best results, analysis requires a skilled human operator
  • 41. Preservation of modules identified in Allelic Series striatum in other data sets Langfelder P, Luo R, Oldham MC, Horvath S, Is My Network Module Preserved and Reproducible? PLoS Comp Biol 2011, 7(1): e1001057
  • 42. Rationale for studying module preservation ● One often has a "reference" data set: a data set with an interesting finding, for example, module 2 associates strongly with Q in Allelic Series striatum ● Hypothesis: expression (and co-expression) of genes in module 2 reflects in some way the pathological processes connected to HD in a mouse model ● If the genes in module 2 were co-expressed in human data as well, it would provide indirect evidence that the same or similar process is also active in human HD ● Question: is module 2 "preserved" in human data, i.e., are genes in module 2 co-expressed in human data as well? ● General formulation: given a module identified in a "reference" data set, is it preserved in an independent "test" data set?
  • 43. Quantifying module preservation: The Zsummary statistic ● Networks in general and WGCNA in particular provide several module preservation statistics that measure various aspects of module preservation ● For convenience, we summarize the statistics in a single measure called Zsummary. This works very much like a Z statistic: – Zsummary < 2: no evidence of module preservation – 2 < Zsummary < 7: weak to moderate evidence of preservation – 7 < Zsummary: strong evidence of preservation
  • 44. How strongly are 6-month striatum modules preserved in other data sets? Zsummary < 2: no evidence of preservation 2 < Zsummary < 7: weak - moderate preservation 7 < Zsummary: strong preservation
  • 45. • Most modules are preserved in CTX and other mouse brain data sets • Oligodendrocyte M.7, M.11 appear preserved across most other data • Neuronal M.2, M.20 preserved in some human data sets but not in others
  • 46. Integrating multiple expression data sets: consensus module analysis Langfelder P, Horvath S, Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007, 1:54 • Aim: find modules that are present in each of multiple independent input data sets
  • 47. Finding consensus modules (figure: Network 1, Network 2) ● Modules group together densely interconnected genes ● Consensus modules group together genes densely connected in all input data sets ● Our solution: find a consensus gene-gene similarity and use it with clustering to find modules
  • 48. Finding consensus modules ● Consensus = min(Calibrated Network 1, Calibrated Network 2) ● Modules group together densely interconnected genes ● Consensus modules group together genes densely connected in all input data sets ● Our solution: find the consensus gene-gene similarity and use it with clustering to find modules
  • 49. Finding consensus modules ● Consensus = min(Calibrated Network 1, Calibrated Network 2) ● For multiple input data sets: replace minimum by a suitable quantile
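The consensus rule on these slides (element-wise minimum, or a quantile when there are many data sets) can be sketched as follows. This is an illustrative sketch only: it assumes the input similarity matrices are already calibrated to a comparable scale, and the linear-interpolation quantile, the function names, and the toy matrices are assumptions made for the example.

```python
def quantile(values, q):
    """Quantile with linear interpolation; q = 0 returns the minimum."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def consensus_similarity(networks, q=0.0):
    """Element-wise consensus of several gene-gene similarity matrices.

    networks: list of square matrices (lists of lists) over the same genes,
    assumed to be calibrated beforehand. q = 0 takes the minimum.
    """
    n = len(networks[0])
    return [
        [quantile([net[i][j] for net in networks], q) for j in range(n)]
        for i in range(n)
    ]

# Two hypothetical calibrated networks over the same three genes.
net1 = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
net2 = [[1.0, 0.6, 0.7],
        [0.6, 1.0, 0.3],
        [0.7, 0.3, 1.0]]

consensus = consensus_similarity([net1, net2])             # minimum
relaxed = consensus_similarity([net1, net2], q=0.25)       # lower quartile
print(consensus[0][1], consensus[0][2])
```

Genes 0 and 2 are strongly connected only in the second network, so the minimum keeps their consensus similarity low; a pair counts as connected only if every data set agrees.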
  • 50. Example consensus module analysis: modules in human HD patients and control expression data
  • 51. 7 data sets • Hodges et al (2006): CN, BA4 (motor cortex), CB • Durrenberger et al (2011), Common neuroinflammatory pathways in neurodegenerative diseases. GEO series GSE26927, data from CN of 10 controls and 9 patients • Zhang et al (2012): several hundred HD and control samples from prefrontal cortex, visual cortex and cerebellum
  • 52. Are individual gene changes in response to HD concordant across the data sets? • There is relatively good concordance (especially given the difference in how strongly HD affects CN vs. Ctx and CB) • Data sets from the same tissue tend to cluster together
  • 53. Consensus modules across human data
  • 54. Relating consensus modules to HD status
  • 55. • Neuronal module M.10 down across all brain regions • Astrocyte M.11, microglial M.12, HSP M.15 up across all brain regions • Oligodendrocyte M.7 and microglial M.43 up in CN but not in CTX or CB • Ribosomal M.17 shows no appreciable change • M.14 with chaperone protein genes is down in CN but up in other areas
  • 56. Are human consensus modules preserved in mouse data?
  • 57. • Most modules are preserved in BA4 postmortem human data • Neuronal M.10, M.23 are preserved in most mouse data sets • Oligodendrocyte M.7 is preserved in some sets: allelic series, Strand data • Astrocyte M.11, microglial M.12 mostly not preserved
  • 58. Lessons from the consensus module analysis of human data sets ● Response to HD shows interesting regional commonalities as well as differences ● Consensus module analysis leads to robust modules that are unlikely to reflect technical artifacts ● One can pick interesting genes by high intramodular connectivity in status-related consensus modules
  • 59. Summary ● Network analysis organizes genes into a smaller number of modules that can be assigned functional labels ● Network-based Target Prioritization Scheme ranks genes based on multiple criteria and provides a ranked list of genes for further follow-up ● Advanced network-based methods allow integration of independent data and a multi-layer analysis (e.g., brain region differences in response to HD, differences in response to varying Q length at different time points etc.)
  • 60. Acknowledgments Jeff Aaronson, Jim Rosinki and entire CHDI Steve Horvath William Yang and his group, esp. Jeff Cantle and Xiaohong Lu Jim Wang and Mike Palazzolo Giovanni Coppola and his group, esp. Doxa Chatzopoulou and Charles Blum
  • 61. Interested? ● More about Weighted Gene Co-expression Network Analysis (WGCNA): http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ ● WGCNA implementation for R: package WGCNA http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/ ● Module preservation project main publication and page with examples and tutorials: Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath, Is My Network Module Preserved and Reproducible? PLoS Comp Biol 7(1): e1001057 http://genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation/ ● Consensus modules and consensus eigengene networks: Peter Langfelder and Steve Horvath, Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007, 1:54 http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork
  • 63. Step 5: Finding network modules • Modules: groups (clusters) of strongly connected genes • How to define “strong connection”? Simple answer “strong connection = high adjacency” is susceptible to noise • More complicated but also more robust answer: Topological overlap (TOM): measures direct connection + shared neighbours • No shared neighbours: low TOM • Many shared neighbours: high TOM
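To make "direct connection + shared neighbours" concrete, here is a small sketch of the weighted topological overlap in its usual form, TOM_ij = (sum over u of A_iu A_uj + A_ij) / (min(k_i, k_j) + 1 - A_ij), with k_i the connectivity of gene i. The slide does not spell out the formula, so treat this form, the function name, and the toy adjacency values as assumptions of the example rather than the exact computation behind the results.

```python
def topological_overlap(adj):
    """Weighted topological overlap for a symmetric adjacency matrix
    with entries in [0, 1]. Diagonal entries are set to 1."""
    n = len(adj)
    # Connectivity: summed adjacency to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of connection through shared neighbours.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Hypothetical network: genes 0 and 1 share a strong neighbour (gene 2);
# genes 0 and 3 have the same direct adjacency but no shared neighbours.
adj = [
    [1.0, 0.5, 0.9, 0.5],
    [0.5, 1.0, 0.9, 0.0],
    [0.9, 0.9, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
]
tom = topological_overlap(adj)
print(round(tom[0][1], 3), round(tom[0][3], 3))
```

Both pairs have a direct adjacency of 0.5, yet the pair with a shared neighbour gets the higher overlap, which is why TOM is less sensitive to a single noisy adjacency value.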
  • 64. Eigengenes are useful in many contexts • Biological data reduction: instead of say 20,000 genes may have ~30 eigengenes that can be related to a clinical trait – Alleviate the multiple testing problem • Give rise to a continuous (“fuzzy”) measure of module membership for all genes in all modules: $MM_i = kME_i = \mathrm{cor}(x_i, E)$
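The module membership measure above is simply a correlation between each gene profile and a module eigengene. A minimal sketch, assuming the eigengene has already been computed; the eigengene vector and the gene profiles below are made-up values, and `module_membership` is a hypothetical helper name.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def module_membership(expr, eigengene):
    """MM_i = kME_i = cor(x_i, E) for every gene profile x_i."""
    return [pearson(gene, eigengene) for gene in expr]

eigengene = [-0.5, -0.4, 0.0, 0.1, 0.4, 0.4]   # hypothetical module eigengene
expr = [
    [1.0, 1.2, 2.0, 2.1, 2.9, 3.0],   # follows the eigengene closely
    [3.0, 2.9, 2.0, 2.1, 1.0, 1.1],   # mirrors the eigengene
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],   # unrelated to the eigengene
]
mm = module_membership(expr, eigengene)
print([round(v, 2) for v in mm])
```

Because the measure is continuous, every gene gets a membership value for every module, and genes with high membership are natural hub-gene candidates.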
  • 65. Judging modules by their Z scores ● Each Z score provides an answer to “Is the module significantly better than a random sample of genes?” ● Pool the Z scores into a summary (median) Z measure ● Z < 2 indicates no preservation, 2 < Z < 10 weak to moderate evidence of preservation, Z > 10 strong evidence ● Low Z means the “module” is no better than a random selection of genes ● $Z = \frac{\text{observed} - \text{mean}_{\text{permuted}}}{\text{sd}_{\text{permuted}}}$
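The Z formula above can be illustrated with a single preservation statistic. The sketch below uses mean pairwise adjacency (module density) in a test network as the observed statistic and random gene sets of the same size as the permutations. This is an assumption-laden toy: the actual analysis pools several statistics, and the network, module, function names, and number of permutations here are invented for illustration.

```python
import random
import statistics
from itertools import combinations

def density(adj, genes):
    """Mean adjacency over all pairs of the given genes."""
    pairs = list(combinations(genes, 2))
    return sum(adj[i][j] for i, j in pairs) / len(pairs)

def permutation_z(adj, module, n_perm=200, seed=0):
    """Z = (observed - mean_permuted) / sd_permuted for module density,
    where each permutation is a random gene set of the module's size."""
    rng = random.Random(seed)
    observed = density(adj, module)
    all_genes = range(len(adj))
    permuted = [density(adj, rng.sample(all_genes, len(module)))
                for _ in range(n_perm)]
    return (observed - statistics.mean(permuted)) / statistics.stdev(permuted)

# Hypothetical test network of 20 genes: genes 0-4 are densely connected,
# every other pair is weakly connected.
n_genes = 20
module = [0, 1, 2, 3, 4]
adj = [[0.8 if (i in module and j in module) else 0.1 for j in range(n_genes)]
       for i in range(n_genes)]

z_module = permutation_z(adj, module)                  # a genuinely dense module
z_random = permutation_z(adj, [10, 11, 12, 13, 14])    # an arbitrary gene set
print(round(z_module, 1), round(z_random, 1))
```

The dense module scores far above the Z = 2 line, while the arbitrary gene set does not, matching the reading that a low Z means the "module" is no better than a random selection of genes.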
  • 66. Turning correlation into a network: adjacency function • Increasing powers suppress low correlations more strongly • β = 6 often works well • $A_{ij} = \begin{cases} \mathrm{cor}(x_i, x_j)^{\beta} & \text{if } \mathrm{cor}(x_i, x_j) > 0 \\ 0 & \text{if } \mathrm{cor}(x_i, x_j) \le 0 \end{cases}$
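The adjacency function on this slide translates directly into code. The following is a minimal pure-Python sketch rather than the WGCNA R implementation; the function names and the toy expression profiles are hypothetical, and β = 6 is taken from the slide.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(expr, beta=6):
    """A[i][j] = cor(x_i, x_j)**beta if the correlation is positive, else 0.

    expr: list of gene profiles, one list of sample values per gene.
    """
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = pearson(expr[i], expr[j])
            adj[i][j] = c ** beta if c > 0 else 0.0
    return adj

expr = [
    [1.0, 2.0, 3.0, 4.0],   # gene 0
    [2.0, 4.0, 6.0, 8.5],   # gene 1: strongly positively correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],   # gene 2: perfectly anti-correlated with gene 0
]
adj = signed_adjacency(expr)
print(round(adj[0][1], 3), adj[0][2])
```

Raising to the power β shrinks moderate correlations sharply (0.5 becomes about 0.016 at β = 6) while leaving correlations near 1 almost untouched, and the anti-correlated pair ends up unconnected.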