Full course: https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/
The course covered all of the steps required to go from `raw data` to a rich `mapped biochemical network` incorporating statistical, multivariate and machine learning results. This included [examples](https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/#topics) and tutorials for:
* Preparing raw data for analysis
* Multivariate data exploration
* Supervised clustering
* Machine learning – classification model validation and feature selection
* Network analysis - biochemical, structural similarity and correlation networks
* Network mapping – putting it all together to create a publication quality network
url:
https://github.com/CreativeDataSolutions/CDS.courses/blob/gh-pages/courses/network_mapping_101/materials/lectures/tutorial.pdf
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integration… – Dmitry Grapov
Machine learning (ML) is being ubiquitously incorporated into everyday products such as Internet search, email spam filters, product recommendations, image classification, and speech recognition. New approaches for highly integrated manufacturing and automation, such as Industry 4.0 and the Internet of Things, are also converging with ML methodologies. Many approaches incorporate complex artificial neural network architectures and are collectively referred to as deep learning (DL) applications. These methods have been shown capable of representing and learning predictable relationships in many diverse forms of data and hold promise for transforming the future of omics research and applications in precision medicine. Omics and electronic health record data pose considerable challenges for DL due to many factors, such as low signal-to-noise ratios, analytical variance, and complex data integration requirements. However, DL models have already been shown capable of improving both the ease of data encoding and predictive model performance over alternative approaches. It may not be surprising that concepts encountered in DL share similarities with those observed in biological message relay systems such as gene, protein, and metabolite networks. This expert review examines the challenges and opportunities for DL at a systems and biological scale for a precision medicine readership.
Dmitry Grapov is a data science leader seeking opportunities to develop teams using predictive modeling, machine learning, and data visualization. He has over 10 years of experience in data science, bioinformatics, and software development for applications in genomics, metabolomics, and personalized medicine. Grapov has a Ph.D. in Analytical Chemistry from the University of California, Davis and expertise in machine learning, comparative genomics, metagenomics, and mass spectrometry.
https://www.youtube.com/watch?v=Y_-o-4rKxUk
Machine learning powered metabolomic network analysis
Dmitry Grapov PhD,
Director of Data Science and Bioinformatics,
CDS – Creative Data Solutions
www.createdatasol.com
Metabolomic network analysis can be used to interpret experimental results within a variety of contexts, including biochemical relationships, structural and spectral similarity, and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification, and regression-based predictive modeling. The combination of developed metabolomic networks and machine learning based predictive models offers a unique method to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on data analysis, visualization, machine learning, and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com
Complex Systems Biology Informed Data Analysis and Machine Learning – Dmitry Grapov
Dmitry Grapov is a data scientist and principal statistician at the NIH West Coast Metabolomics Center. He received his PhD in analytical chemistry from the University of California, Davis and has applied complex systems biology, data analysis, and machine learning techniques to problems in predictive modeling, biomarker discovery, and personalized medicine. He has developed software tools like DeviumWeb and MetaMapR to integrate multi-omic datasets and build biochemical networks for applications in systems biology and wellness optimization.
Metabolomics and Beyond: Challenges and Strategies for Next-gen Omic Analyses – Dmitry Grapov
Dr. Dmitry Grapov gave a webinar on challenges and strategies for next-generation omics analyses. He discussed how large, longitudinal studies integrating multiple omics domains are needed to identify small biological effects. Data normalization strategies must be considered during experimental design to remove analytical batch effects. Quality control-based normalization using analytical replicates can estimate and remove analytical variance from large datasets. Integrating multiple measurement platforms is often required to identify systems of biological changes. Network-based analysis of omics data can help explain more phenotypic variance than single omics approaches alone. Dr. Grapov demonstrated software tools he developed for network analysis, visualization, and integration of multi-omics datasets.
Case Study: Overview of Metabolomic Data Normalization Strategies – Dmitry Grapov
Five normalization methods were compared, of which the combination of qc-LOESS and cubic splines showed the best performance based on within-batch and between-batch relative standard deviations of quality control (QC) samples. This approach was used to normalize the sample measurements, and the results were analyzed using principal component analysis.
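The evaluation metric above, relative standard deviation (RSD, the percent coefficient of variation) of each variable's repeated QC measurements, can be computed as follows. The data and names here are illustrative, not values from the case study.

```python
from statistics import mean, stdev

def rsd(values):
    """Relative standard deviation (%CV): sample stdev over mean, as a percent."""
    return 100.0 * stdev(values) / mean(values)

# Made-up QC intensities for two variables across replicate injections
qc_intensities = {
    "metabolite_A": [100.0, 104.0, 98.0, 102.0],  # tight replicates: low RSD
    "metabolite_B": [100.0, 150.0, 60.0, 120.0],  # noisy replicates: high RSD
}

rsds = {name: rsd(vals) for name, vals in qc_intensities.items()}
```

Lower QC RSDs after normalization indicate that a method has removed analytical variance rather than biological signal, which is why the case study ranks methods on this statistic.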
This document summarizes a study that used multi-omic profiling to identify metabolic perturbations in type 1 diabetic (T1D) mice compared to non-diabetic mice. The study found: 1) Increased markers of oxidative stress and reduced levels of anti-inflammatory lipids in T1D mice; 2) Elevated triglycerides and reductions in major structural lipids in T1D mice, indicating hypertriglyceridemia; 3) Over 1000 plasma metabolites were measured and biochemical network analysis identified differences between T1D and non-T1D mice related to oxidative stress, inflammation, and lipid metabolism.
3 data normalization (2014 lab tutorial) – Dmitry Grapov
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Metabolomic Data Analysis Workshop and Tutorials (2014) – Dmitry Grapov
This document provides an introduction and overview of tutorials for metabolomic data analysis. It discusses downloading required files and software. The goals of the analysis include using statistical and multivariate analyses to identify differences between sample groups and impacted biochemical domains. It also discusses various data analysis techniques including data quality assessment, univariate and multivariate statistical analyses, clustering, principal component analysis, partial least squares modeling, functional enrichment analysis, and network mapping.
Normalization of Large-Scale Metabolomic Studies 2014 – Dmitry Grapov
This document discusses approaches for normalizing large-scale metabolomics data to minimize analytical variance and remove non-biological artifacts. It describes common normalization methods like analytical standards, quality control-based normalization using LOESS or batch ratios, and variance stabilizing transformations. The document also presents two case studies on normalizing over 5,500 metabolomics samples from the TEDDY study using different normalization approaches like LOESS, batch ratio, qcISTD, and their combinations to minimize analytical variance from over 100 batches and better reveal true biological trends.
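A minimal sketch of the batch-ratio idea mentioned above: divide every sample's values by the median QC value of its batch, so that batches drifting to different scales are brought onto a common one. This is a simplified stand-in for the qc-LOESS and qcISTD approaches in the slides; all batch names and intensities are hypothetical.

```python
from statistics import median

# Made-up intensities for one variable; batch2 has roughly a 2x analytical drift
batches = {
    "batch1": {"qc": [10.0, 11.0, 9.0], "samples": [12.0, 8.0, 10.5]},
    "batch2": {"qc": [20.0, 22.0, 18.0], "samples": [24.0, 16.0, 21.0]},
}

def batch_ratio_normalize(batches):
    """Divide each sample by its batch's median QC so batches share one scale."""
    out = {}
    for name, b in batches.items():
        factor = median(b["qc"])
        out[name] = [x / factor for x in b["samples"]]
    return out

normalized = batch_ratio_normalize(batches)
```

After normalization the two batches are directly comparable; LOESS-based variants replace the single per-batch factor with a smooth correction over injection order, which handles within-batch drift as well.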
Step by step tutorial for conducting GO enrichment analysis and then creating a network from the results.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
Prote-OMIC Data Analysis and Visualization – Dmitry Grapov
Introductory lecture to multivariate analysis of proteomic data.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
Data Normalization Approaches for Large-scale Biological Studies – Dmitry Grapov
Overview of how to estimate data quality and validate normalization approaches to remove analytical variance.
See here for animations used in the presentation:
http://imdevsoftware.wordpress.com/2014/06/04/using-repeated-measures-to-remove-artifacts-from-longitudinal-data/
Automation of (Biological) Data Analysis and Report Generation – Dmitry Grapov
I've been experimenting with automating simple and complex data analysis and report generation tasks for biological data, mostly using R and LaTeX. You can see some of my progress and the challenges encountered.
Metabolomic data analysis and visualization tools – Dmitry Grapov
This document discusses tools and methods for metabolomic data analysis and visualization. It covers visualization techniques like plots and networks to explore patterns in data. It also discusses statistical analysis methods like ANOVA and clustering for significance testing and pattern detection. Additionally, it discusses predictive modeling, network analysis using pathways, and network mapping to relate metabolites based on biochemical transformations, structural similarity, or empirical dependencies. Common analysis tasks and featured open-source tools are also highlighted.
High Dimensional Biological Data Analysis and Visualization – Dmitry Grapov
This document discusses metabolomic data analysis techniques for studying diseases. It analyzes over 13,000 biological samples per year using over 160,000 data points per study. Univariate and multivariate statistical analyses are described, with multivariate being preferred. Techniques include principal component analysis, partial least squares discriminant analysis, hierarchical clustering analysis, and pathway enrichment analysis. Visualization and network mapping tools are also discussed to identify relationships between altered metabolites and treatment effects.
This document discusses using various bioinformatics tools and databases to conduct pathway enrichment analysis on metabolite data from pumpkin and tomatillo leaves. It describes using the KEGG database to visualize pathways, MBRole to perform over-representation analysis using a hypergeometric test, and MetaboAnalyst to perform pathway enrichment analysis incorporating pathway topology. The goal is to identify significantly over-represented biological pathways and map metabolites of interest to pathways to understand biochemical differences between the plant leaves.
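The hypergeometric over-representation test mentioned above asks: given N annotated metabolites of which K belong to a pathway, how likely is a hit list of n metabolites to contain k or more pathway members by chance? MBRole itself is a web tool, so this stdlib-only sketch with invented numbers only illustrates the statistic it computes.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) where X ~ Hypergeometric(N, K, n): the chance of drawing k or
    more pathway members in a random sample of n metabolites from N total."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Toy example: 500 annotated metabolites, 20 in the pathway of interest,
# 30 significant metabolites, 5 of which fall in that pathway
p = hypergeom_pvalue(N=500, K=20, n=30, k=5)
```

Here the expected overlap by chance is only 30 × 20 / 500 = 1.2 metabolites, so observing 5 yields a small p-value and the pathway would be flagged as over-represented (after multiple-testing correction across pathways).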
This document summarizes an analysis comparing the primary leaf metabolites of pumpkin and tomatillo plants. The goal was to carry out statistical analyses, hierarchical cluster analysis (HCA), principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (O-PLS-DA) on metabolite profile data from pumpkin and tomatillo leaf samples. Both the HCA and PCA suggested that the treatment effect on metabolite profiles was minor compared to differences between species. A PLS-DA model was validated and found to have outstanding performance in discriminating between pumpkin and tomatillo leaf metabolites. Top discriminating metabolites between the species were then identified.
This document describes using partial least squares discriminant analysis (PLS-DA) to identify metabolites that best discriminate between different sample processing methods using metabolomic data from pumpkin samples. It discusses modeling strategies including model selection, results visualization, feature selection, and validation. Key steps involve building PLS models to discriminate extraction and treatment groups, evaluating scores and loadings plots, and identifying the top discriminating variables between extraction methods based on their importance in the models.
This document discusses using principal component analysis (PCA) to analyze metabolomic sample data from pumpkin experiments. PCA was performed on the raw data and scaled data to identify major sources of variance. For the raw data, the first two principal components captured most of the variance and separated samples by extraction method and treatment. Several samples were identified as potential outliers. When PCA was done on autoscaled data, the loadings showed differences due to both extraction and treatment. The scaled analysis also identified some outlier samples.
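The PCA workflow described above (center, optionally autoscale, then project onto the directions of greatest variance) can be sketched with a small SVD-based implementation. The original tutorial used R; the data here are randomly generated, and the function is a generic sketch rather than the tutorial's code.

```python
import numpy as np

def pca(X, scale=False):
    """PCA via SVD: center (and optionally autoscale) X; return sample scores
    and the fraction of total variance explained by each component."""
    Xc = X - X.mean(axis=0)
    if scale:  # "autoscaling": give every variable unit variance
        Xc = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                      # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)     # proportion of variance per PC
    return scores, explained

rng = np.random.default_rng(0)
# Two highly correlated variables plus one noise variable -> PC1 dominates
x = rng.normal(size=50)
X = np.column_stack([x, x + 0.1 * rng.normal(size=50), rng.normal(size=50)])
scores, explained = pca(X, scale=True)
```

On autoscaled data the correlated pair collapses onto the first component, mirroring the tutorial's observation that scaling changes which sources of variance (extraction vs. treatment) the loadings emphasize.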
The document discusses using hierarchical cluster analysis (HCA) to evaluate metabolomic sample processing methods. It describes two goals: 1) Use HCA to cluster samples based on raw data similarities and correlations to determine the impact of extraction and treatment methods on data variance. Extraction had the greatest effect, with ACN/IPA/water and MeOH/CHCl3/water samples most similar. 2) Use HCA to cluster metabolites based on z-scaled data and correlations to identify groups of related metabolites and evaluate the robustness of different correlation measures. Clusters extracted from the correlation-based dendrogram contained metabolites that shared biological functions.
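The agglomerative clustering behind a dendrogram like the one above can be sketched in pure Python. This is a naive single-linkage implementation over a precomputed distance matrix (for correlation-based clustering, a distance such as 1 − |r| would be used); the matrix below is invented, and real analyses would use a dedicated routine such as R's hclust.

```python
def single_linkage(dist, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest, until n_clusters remain. dist[i][j] is a
    precomputed symmetric distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return clusters

# Toy distances: items 0/1 form one tight group, items 2/3 another
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.15],
    [0.8, 0.9, 0.15, 0.0],
]
clusters = single_linkage(dist, n_clusters=2)
```

Cutting the merge sequence at a chosen height (here, stopping at two clusters) is exactly how metabolite groups are extracted from a dendrogram for follow-up functional interpretation.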
While the decline in productivity is real across all developed economies, it is particularly pronounced in France. At the national level, this slowdown affects every sector, most notably industry, which is usually characterized by high productivity gains. Since the Covid crisis, the industrial sector has accounted for roughly 35% of this loss, even though it represented only 9.3% of gross national value added in 2023. In this context, can the country pursue a reindustrialization policy without also aiming to raise productivity gains? No, this Cube argues. On the contrary, these two objectives, until now independent of each other, are challenges that must now be met jointly. Analyzing the various explanations for the productivity decline observed in France and other developed economies, this Cube suggests that raising productivity alongside a reindustrialization policy implies reallocating production factors toward high-potential industrial firms, as well as a better allocation of resources.
At a time when farm succession and the establishment of new farmers are crucial issues for the agricultural profession, new farmers set up in business every year, and some of them hold a five-year (Bac+5) degree or higher. Engineering-school curricula are not designed to train future farmers. Yet some graduates of these Bac+5 programs, whether or not they come from a farming background, take the plunge into agricultural entrepreneurship. Who are they? What are their motivations and visions? How do they work?