Dmitry Grapov, PhD
Multivariate Analysis and
Visualization of ProteOmic Data
State of the art facility producing massive
amounts of biological data…
>20-30K samples/yr
>200 studies
Analysis at the ProteOmic Scale and Beyond
Genomic
Proteomic
Metabolomic
Multi-Omic integration
Sample
Variable
Data Analysis and Visualization
Quality Assessment
• use replicated measurements
and/or internal standards to
estimate analytical variance
Statistical and Multivariate
• use the experimental design
to test hypotheses and/or
identify trends in analytes
Functional
• use statistical and multivariate
results to identify impacted
biochemical domains
Network
• integrate statistical and
multivariate results with the
experimental design and
analyte metadata
experimental design
- organism, sex, age etc.
analyte description and
metadata
- biochemical class, mass
spectra, etc.
Variable / Sample
Data Quality Assessment
Quality metrics
• Precision (replicated measurements)
• Accuracy (reference samples)
Common tasks
• normalization
• outlier detection
• missing value imputation
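As a minimal sketch of the precision metric listed above, the following computes the percent relative standard deviation (%RSD) of replicated QC measurements for a single analyte. The function name and the example abundances are illustrative assumptions, not values from the slides.

```python
from statistics import mean, stdev


def percent_rsd(replicates):
    """Percent relative standard deviation: 100 * sample SD / mean."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicated measurements")
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * stdev(replicates) / m


# Hypothetical QC abundances for one analyte
qc_abundance = [90.0, 100.0, 110.0]
print(round(percent_rsd(qc_abundance), 2))  # 10.0
```

A lower %RSD across replicated measurements indicates higher precision for that analyte.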
Principal Component
Analysis (PCA) of all
analytes, showing QC
sample scores
Batch Effects
Drift in >400 replicated measurements across >100 analytical batches for a single analyte
[Figure axes: Acquisition batch, Abundance]
QCs embedded among >5,500 samples (1:10) collected over 1.5 yrs
If the biological effect size is smaller than the analytical variance,
the experiment may fail to detect a real effect and report it as
non-significant (a false negative)
Analyte specific data quality
overview
Sample specific normalization can be used
to estimate and remove analytical variance
Normalizations need to be numerically and visually validated
[Figure panels: Raw Data, Normalized Data; axes: log mean, %RSD (from low precision to high precision); series: Samples, QCs]
Batch Effects
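The slides do not name the normalization that was used, so the following is only a sketch of one simple stand-in: scaling each acquisition batch so that its QC median matches the overall QC median, then checking numerically that the QC %RSD drops. The record layout (`batch`, `is_qc`, `value`) and all values are hypothetical, and the approach assumes every batch contains at least one QC.

```python
from statistics import mean, median, stdev


def percent_rsd(values):
    """Percent relative standard deviation: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def qc_median_batch_normalize(records):
    """Scale each batch so its QC median equals the overall QC median."""
    target = median(r["value"] for r in records if r["is_qc"])
    batch_medians = {}
    for batch in {r["batch"] for r in records}:
        batch_qc = [r["value"] for r in records
                    if r["is_qc"] and r["batch"] == batch]
        if not batch_qc:
            raise ValueError(f"batch {batch!r} has no QC samples")
        batch_medians[batch] = median(batch_qc)
    return [{**r, "value": r["value"] * target / batch_medians[r["batch"]]}
            for r in records]


# Hypothetical data: batch 2 reads about twice as high as batch 1
records = [
    {"batch": 1, "is_qc": True, "value": 100.0},
    {"batch": 1, "is_qc": True, "value": 102.0},
    {"batch": 1, "is_qc": False, "value": 50.0},
    {"batch": 2, "is_qc": True, "value": 200.0},
    {"batch": 2, "is_qc": True, "value": 204.0},
    {"batch": 2, "is_qc": False, "value": 100.0},
]
normalized = qc_median_batch_normalize(records)

# Numerical validation: QC precision before and after normalization
rsd_raw = percent_rsd([r["value"] for r in records if r["is_qc"]])
rsd_norm = percent_rsd([r["value"] for r in normalized if r["is_qc"]])
print(f"QC %RSD raw: {rsd_raw:.1f}, normalized: {rsd_norm:.1f}")
```

Comparing the QC %RSD before and after is one way to validate a normalization numerically; plotting the raw and normalized values against acquisition order covers the visual check.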
Outlier Detection
• 1 variable
(univariate)
• 2 variables
(bivariate)
• >2 variables
(multivariate)
Outlier Detection: bivariate vs. multivariate
[Figure: bivariate view (scatter plot) and multivariate view (PCA scores plot); annotations: mixed up samples, outliers?]
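For the univariate case, one common robust screen is the modified z-score based on the median and the median absolute deviation (MAD). The slides do not say which rule was used, so this is an assumed illustration; the 3.5 cutoff is a widely used convention, and the data are made up.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical abundances for one analyte; the last value is suspicious
abundance = [10, 11, 9, 10, 12, 10, 50]
print(mad_outliers(abundance))  # [6]
```

A univariate screen like this cannot catch samples that look ordinary on every single variable but are unusual in combination, which is the case the multivariate view is meant to expose.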
Network Mapping
Statistics + Multivariate + Context = ranked, statistically significant differences within a biochemical context
Statistical and Multivariate Analyses
Group 1
Group 2
What analytes are
different between the
two groups of samples?
Statistical (t-Test): significant differences lacking rank and context
Multivariate (O-PLS-DA): ranked differences lacking significance and context
To see the big picture it is necessary to view the data from multiple
different angles
Statistical Analysis: achieving ‘significance’
significance level (α) and power (1-β )
effect size (standardized difference in
means)
sample size (n)
Power analyses can be used to
optimize future experiments
given preliminary data
Example: use experimentally
derived (or literature estimated)
effect sizes, the desired significance
level (α) and power (1-β) to
calculate the optimal number of
samples per group
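The calculation described in the example can be sketched with the normal approximation for a two-sided, two-sample comparison of means: n per group ≈ 2((z_{1-α/2} + z_{1-β}) / d)², where d is the standardized effect size. The function name and defaults are assumptions for illustration; an exact t-based calculation returns slightly larger numbers for small groups.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-sample comparison."""
    if effect_size == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes taken from preliminary data or the literature
print(samples_per_group(0.8))  # 25
print(samples_per_group(0.5))  # 63
```

Halving the effect size roughly quadruples the required sample size, which is why preliminary estimates of effect size matter so much when planning an experiment.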
Statistical Tests
• Should be chosen based on the distribution (shape, type) of the data (e.g. normal, negative binomial, Poisson)
• Can be optimized based on data pre-treatment (e.g. NSAF; Power Law Global Error Model, PLGEM)
[Figure: Poisson and normal distributions]
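NSAF appears on the slides only as an acronym. Assuming it refers to the normalized spectral abundance factor in its usual form, each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios over all proteins in the sample. The sketch below uses made-up counts and lengths.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample."""
    if len(spectral_counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    # Spectral abundance factor: spectral count per unit protein length
    saf = [count / length for count, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    if total == 0:
        raise ValueError("all spectral counts are zero")
    return [value / total for value in saf]


# Hypothetical sample with two proteins
values = nsaf([10, 20], [100, 400])
print([round(v, 3) for v in values])  # [0.667, 0.333]
```

Because the values in a sample sum to 1, NSAF is a relative measure, and this affects which statistical tests are appropriate downstream.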
False Discovery Rate (FDR)
• Type I Error: False Positives (α)
• Type II Error: False Negatives (β)
• Type I risk = $1 - (1 - \text{p.value})^{m}$, where m = number of variables tested
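A short numerical illustration of the Type I risk formula shows how quickly the chance of at least one false positive grows with the number of variables tested. The function name is an assumption; m = 148 is the number of tests used in the Bonferroni example in this deck.

```python
def type_one_risk(p_value, m):
    """Probability of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - p_value) ** m


for m in (1, 10, 148):
    print(m, round(type_one_risk(0.05, m), 4))
```

With a per-test threshold of 0.05 and 148 variables, at least one false positive is nearly certain, which motivates the adjustment methods that follow.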
False Discovery Rate Adjustment
[Figure axes: FDR adjusted p-value, p-value]
Benjamini & Hochberg (1995) (“BH”)
• Accepted standard
Bonferroni
• Very conservative
• adjusted p-value = p-value x # of tests (e.g. 0.005 x 148 = 0.74)
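The two adjustments can be sketched in a few lines. The Bonferroni function reproduces the 0.005 x 148 example, and the Benjamini-Hochberg function follows the standard step-up procedure (each sorted p-value is multiplied by m / rank, then made monotone from the largest rank down). Function names, the `n_tests` parameter, and the example p-values are illustrative assumptions.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(round(bonferroni([0.005], n_tests=148)[0], 2))  # 0.74

# Hypothetical raw p-values
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
print([round(p, 3) for p in bonferroni(raw)])          # [0.04, 0.16, 0.12, 0.02]
```

On the same input the BH values are never larger than the Bonferroni values, which reflects the slide's point that Bonferroni is the more conservative of the two.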
Functional Analysis
Nucl. Acids Res. (2008) 36 (suppl 2): W423-W426. doi: 10.1093/nar/gkn282
Identify changes or enrichment in biochemical domains
• decrease
• increase
Functional Analysis: Enrichment
Biochemical Pathway Biochemical Ontology
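The slides do not state which enrichment statistic was used. One common choice, shown here purely as an assumed illustration, is the hypergeometric over-representation test: given N measured analytes, K of which belong to a pathway or ontology term, what is the probability that at least k of the n significantly changed analytes fall in that term by chance? All names and counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_in_domain, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(overlap >= n_overlap)."""
    upper = min(n_in_domain, n_selected)
    denominator = comb(n_total, n_selected)
    tail = sum(
        comb(n_in_domain, k) * comb(n_total - n_in_domain, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical: 300 proteins measured, 20 in a pathway,
# 30 significantly changed, 8 of those in the pathway
print(enrichment_p_value(300, 20, 30, 8))
```

When many pathways or ontology terms are tested, the resulting p-values need the same multiple-testing adjustment as the per-analyte tests.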
Common Multivariate Methods
Clustering
Projection
Networks
Artist: Chuck Close
Cluster Analysis
Useful for
• pattern recognition
• complexity reduction
Common Methods
• Hierarchical
• Model based
• Other (k-means, k-NN, PAM, fuzzy)
[Figure panels: Linkage, k-means, Distribution, Density]
Hierarchical Clustering
[Figure: Similarity, Dendrogram]
How does my metadata
match my data structure?
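To make the idea concrete, here is a small sketch of agglomerative hierarchical clustering. It assumes Euclidean distance and average linkage, neither of which is specified on the slides, and runs on toy two-dimensional points. The merge history is what a dendrogram would draw.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []  # (members_a, members_b, distance) for each merge
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters
            d = sum(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters], merges


# Hypothetical samples described by two variables
samples = [(0, 0), (0, 1), (10, 10), (10, 11)]
groups, history = average_linkage(samples, n_clusters=2)
print(groups)  # [[0, 1], [2, 3]]
```

Comparing the resulting groups with the sample metadata (for example treatment or batch) is one way to approach the question of how metadata matches the data structure.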
Projection Methods
The algorithm defines the position of the light source
Principal Components Analysis (PCA)
• unsupervised
• maximize variance (X)
Partial Least Squares Projection to
Latent Structures (PLS)
• supervised
• maximize covariance (Y ~ X)
James X. Li, 2009, VisuMap Tech.
[Figure labels: single analyte, all analytes]
Interpreting scores and loadings
Loadings represent how variables contribute to sample scores;
variables with the largest absolute loadings have the
greatest contribution to sample scores
Scores represent dis/similarities in samples
based on all variables
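The relationship between scores and loadings can be illustrated with a bare-bones PCA that extracts only the first principal component. Power iteration on the covariance matrix is used here for brevity; it is an assumed stand-in rather than the method behind the slides, and the data are invented. It assumes the data are not constant and that the start vector is not orthogonal to the first component.

```python
from math import sqrt


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) for the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the variables
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    loadings = [1.0] * p  # arbitrary start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * loadings[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        loadings = [x / norm for x in w]
    # Scores: projection of each centered sample onto the loadings
    scores = [sum(c[j] * loadings[j] for j in range(p)) for c in centered]
    return loadings, scores


# Hypothetical samples where variable 2 varies twice as much as variable 1
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
loadings, scores = first_principal_component(data)
print([round(v, 3) for v in loadings])  # [0.447, 0.894]
print([round(s, 3) for s in scores])
```

The second variable has the larger loading, so it contributes more to the position of each sample along the component.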
Networks
Biochemical
• interaction
• enrichment
• etc.
Empirical (dependency)
• correlation
• partial-correlation
• clustering
[Figure nodes: variable 1, variable 2, variable 3]
Enrichment Network
Mapping of parents through children
Interaction Networks
Empirical Networks
• Correlation based networks (CN) (simple, tendency to hairball)
• Gaussian graphical model (GGM) or partial correlation based networks (advanced, preference of direct over indirect relationships)
• *Increase in robustness with sample size
doi: 10.1007/978-1-4614-1689-0_17
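The difference between the two network types can be shown with three variables. In this constructed example x and y are each driven by z but have no direct relationship, so their plain correlation is non-zero while their partial correlation given z vanishes. The first-order partial correlation formula is standard; the data and function names are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Constructed data: x and y each equal z plus independent noise
z = [1, 1, 1, 1, -1, -1, -1, -1]
x = [2, 2, 0, 0, 0, 0, -2, -2]
y = [2, 0, 2, 0, 0, -2, 0, -2]

print(round(pearson(x, y), 3))                      # 0.5
print(abs(round(partial_correlation(x, y, z), 3)))  # 0.0
```

A correlation network would draw an x-y edge here, while a partial correlation network would keep only the x-z and y-z edges, which is how it reduces the hairball effect.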
Proteomic Case Study: Diabetes Markers
• Small sample size (control = 12, GDM = 6); covariates (time of sample collection)
• >600 measured colostrum proteins; ~300 NSAF normalized proteins retained
• Multivariate classification with O-PLS-DA used to identify variables to test using PLGEM with correction for FDR
• Partial-correlation protein-protein interaction network analysis
DeviumWeb
https://github.com/dgrapov/DeviumWeb
• visualization
• statistics
• clustering
• PCA
• O-PLS
Software and Resources
• DeviumWeb: Dynamic multivariate data analysis and visualization platform
url: https://github.com/dgrapov/DeviumWeb
• imDEV: Microsoft Excel add-in for multivariate analysis
url: http://sourceforge.net/projects/imdev/
• MetaMapR: Network analysis tools for metabolomics
url: https://github.com/dgrapov/MetaMapR
• TeachingDemos: Tutorials and demonstrations
url: http://sourceforge.net/projects/teachingdemos/?source=directory
url: https://github.com/dgrapov/TeachingDemos
• Data analysis case studies and examples
url: http://imdevsoftware.wordpress.com/
Questions?
dgrapov@ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154
