SlideShare une entreprise Scribd logo
1  sur  44
Integrative analysis of transcriptomics and proteomics data (ArrayMining and TopoGSA) Integrative analysis of transcriptomics and proteomics data: implications to cancer biology ASAP – Interdisciplinary Optimisation Laboratory School of Computer Science Centre for Integrative Plant Biology Centre for Healthcare Associated Infections Institute of Infection, Immunity and Inflammation University of Nottingham Enrico Glaab & Natalio Krasnogor
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Outline Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15. doi:10.1371/journal.pbio.0000015
Introduction ,[object Object],[object Object],[object Object]
Reference data set Armstrong et al. Leukemia data set ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],samples Heat map: 30 most differentially expressed genes vs. samples genes
Main data set QMC breast cancer microarray data set ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],grade1  grade 3 Heat map: 30 most differentially expressed genes vs. samples (grade 1 and grade 3) genes
Breast cancer data - difficulties Breast cancer outcome is hard to predict: Large degree of class-overlap in Breast cancer microarray data, whereas Leukemia decision boundaries are easy to find (Blazadonakis, 2009). Van‘t Veer et al.    Alon et al.   Golub et al.
Data Fusion Other biological data sources used: ,[object Object],   mutated genes in    different human    cancer types    (Breast, Liver,...)   30 gene sets of    size > 10 genes    obtained from GO,    BioCarta, Reactome,    KEGG and InterPro    total: approx. 3000    pathways (size > 10) ,[object Object],[object Object],Breast cancer microarray data : Protein interaction data : Cellular pathway data : Cancer gene sets :
Methods overview Methods overview: ArrayMining &  TopoGSA
Web-tool: ArrayMining.net What is ArrayMining.net?  ArrayMinining.net is an online microarray analysis tool set integrating multiple data sources and algorithms. 6 analysis modules: 1. Gene selection 2. Sample clustering 3. Sample classification 4. Gene Set Analysis 5. Gene Network Analysis 6. Cross-Study Normalization Goal : A “swiss knife“ for microarray analysis tasks classical new www.arraymining.org
Methods overview Methods overview: ArrayMining &  TopoGSA
ArrayMining.net: Gene selection ,[object Object],[object Object],[object Object],[object Object],   previously identified by Armstrong et al.    newly identified Affymetrix ID Gene symbol Gene descriptions – source:  F-statistic 32847_at   MYLK  myosin, light polypeptide kinase  159.59 1389_at   MME  membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase)  137.53 35164_at   WFS1  wolfram syndrome 1 (wolframin)  128 36239_at   POU2AF1  pou domain, class 2, associating factor 1  116.75 1325_at   SMAD1  smad, mothers against dpp homolog 1 (drosophila)  110.37 963_at   LIG4  ligase iv, dna, atp-dependent  89.77 34168_at   DNTT  deoxynucleotidyltransferase, terminal  89.31 40570_at   FOXO1  forkhead box o1a (rhabdomyosarcoma)  86.89 33412_at   LGALS1  lectin, galactoside-binding, soluble, 1 (galectin 1)  81.31
ArrayMining.net: Gene selection ,[object Object],[object Object],[object Object],[object Object],[object Object],   
ArrayMining.net: Examples Further examples: Gene selection and Clustering module  Automatic generation of heatmaps and PCA Cluster plots (Armstrong et al. dataset) samples genes
ArrayMining.net: Examples Further examples: 3D-ICA and Co-Expression analysis  3D Independent Component Analysis plot (left) and the largest connected components from a gene co-expression network (right) for the Armstrong et al. dataset Sample space: Gene space: ALL AML MLL
ArrayMining.net: In-house data Heat map: 50 most significant genes  Box plot: 4 most significant genes Apply the tools on new data: QMC Breast cancer data Expression levels across 3 tumour grades: STK6 MYBL2 KIF2C  AURKb
ArrayMining.net: QMC dataset ,[object Object],[object Object],[object Object],Gene name PC (gene vs. outcome): Fold Change Q-value (Rank) ESTROGEN RECEPTOR 1 -0.75 0.16 1.6e-20 (1.) RAS-LIKE, ESTROGEN-REGULATED, GROWTH INHIBITOR -0.66 0.46 5.3e-14 (2.) WD REPEAT DOMAIN 19 -0.66 0.73 1.2e-13 (3.) CARBONIC ANHYDRASE XII -0.65 0.28 2.7e-13 (4.) ARP3 ACTIN-RELATED PROTEIN 3 HOMOLOG (YEAST) 0.64 1.37 9.6e-13 (5.) TETRATRICOPEPTIDE REPEAT DOMAIN 8 -0.63 0.82 2.2e-12 (6.) BREAST CANCER MEMBRANE PROTEIN 11 -0.62 0.24 7.1e-12 (7.)
Methods overview Methods overview: ArrayMining &  TopoGSA
ArrayMining.net: Example ,[object Object],[object Object],[object Object],[object Object],ArrayMining - Class Discovery Analysis module:
ArrayMining.net: Consensus clustering ArrayMining‘s consensus clustering approach: Clustering Agreement := No. of times pairs of samples are assigned to the    same cluster across all input clusterings Idea:   Reward objects in the same cluster, if they have a high agreement. Agreement matrix: A ij  := # agreements across    all clusterings for    samples i and j Fitness function:  := (max(A)+min(A))/2 Sample 1 Sample 2
[object Object],[object Object],[object Object],Simulated Annealing - variants Cauchy vs. Gaussian distribution
Clustering methods and validity indices ,[object Object],[object Object],[object Object],[object Object],Example: Silhouette width a(i) = avg. distance of obj(i) to all others   in the same cluster b(i) =  avg. distance of obj(i) to all    others in closest distinct cluster
Consensus clustering: example ,[object Object],[object Object],Example application: QMC breast cancer dataset low confidence (silhouette widths) best separation for two clusters
External validation Random model   Single clustering   Consensus Measure similarity of clusterings with the  rand index R : a, b, c and d are the #pairs of objects assigned to: - the same cluster in    both clusterings (a) - different clusters in    both clusterings (b) - the same cluster in    clustering 1/2 and    different clusters in    clustering 2/1 (c/d) - Corrected for chance:      adjusted rand index Reference clustering: 3 tumour grades (low, medium, high) Clustering results – external validation (tumour grades) 10000 random clusterings
Methods overview Methods overview: ArrayMining &  TopoGSA
ArrayMining.net: Gene set analysis samples pathways ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],(example: Van Andel institute cancer gene sets) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Subramanian et al.  PNAS  October 25, 2005  vol. 102  no. 43  15545–15550
ArrayMining.net: Examples Gene Set Analysis module – example analysis Heat map for the Armstrong et al. dataset based on pathway meta-genes ,[object Object],[object Object]
Consensus clustering: example (2) ,[object Object],[object Object],Combine consensus clustering with gene set analysis ~3 times higher confidence better separation
External validation Single clustering   Consensus clustering   Consensus (PAM+SOTA) 10000 random clusterings
Interim Summary ,[object Object],[object Object],[object Object],[object Object],ArrayMining Integrative Clustering - Summary
Methods overview Methods overview: ArrayMining &  TopoGSA
TopoGSA TopoGSA : Network topological analysis of gene sets What is  TopoGSA ?  TopoGSA  is a web-application mapping gene sets onto a comprehensive human protein interaction network and analysing their network topological properties.  Two types of analysis: 1. Compare genes within a gene set:   e.g. up- vs. down-regulated genes 2. Compare a gene set against a   database of known gene sets   (e.g. KEGG, BioCarta, GO) www.infobiotics.net/ TopoGSA
TopoGSA  - Methods ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],TopoGSA  computes the following topological properties for an uploaded geneset and matched-size random gene sets:
KEGG-BRITE pathway colouring ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Mean node betweenness Mean clustering coefficient Mean shortest path length
ArrayMining     TopoGSA ,[object Object],[object Object],[object Object],[object Object]
Real-world application of tools sets ,[object Object],[object Object],[object Object],TMAs of invasive breast cancer show strong RERG expression
RERG Protein Expression VS BCSS & DMFI Kaplan Meier plot of RERG protein expression with respect to BCSS in ER +  U ER -  cohort  Kaplan Meier plot of RERG protein expression with respect to BCSS in ER +  only Without adjuvant treatment Without Tamoxifen treatment
Conclusions(I): Feature comparison with similar tools ArrayMining &  TopoGSA GEPAS  (Tarraga et al.) Expression Profiler (Kapushesky et al.) Pre-processing : Image analysis, single- and dimensionality reduction, gene name normalization, cross-study normalization ,  covariance-based filtering Pre-processing : Image analysis, missing value imputation, multiple single study normalization methods, dimensionality reduction,  ID converter Pre-processing : Image analysis, single study normalization, missing value imputation, dimensionality reduction, advanced data selection Analysis : Classification, Clustering, Gene selection, GSEA, PCA,  ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Cons. Analysis : Classification, Clustering, Gene selection, GSEA, PCA,  CGH arrays, Tissue mining,Text mining, TF-binding site prediction Analysis : Clustering, Gene selection, PCA,  Co-expression analysis  (different from ArrayMining),  COA, Similarity search Usability/features : PDF-reports, sortable ranking tables,  data anno-tation, 2D/ 3D plots , e-mail notification, video tutorials Usability/features : special tree visualization  (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics),  Usability/features : Excel export, XML queries,  2D plots, data annotation (GO, chromosome location)
Conclusions (2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Outlook : PPI-based pathway-enlargement ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],black = pathway-nodes; red   blue  green  = nodes added based on different criteria ... ... ...
Pathway enlargment – added genes Example case: BioCarta BTG family proteins and cell cycle regulation Black: Original pathway nodes –  Green : Nodes added based on connectivity Added cancer gene
Pathway enlargment – Example 1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example: Alzheimer disease pathway
Pathway enlargment – Example 2 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example: Interleukin signaling pathways
Pathway enlargment - conclusion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Acknowledgements ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Contenu connexe

Tendances

Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
Alexander Decker
 
Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...
Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...
Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...
Antoaneta Vladimirova
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
Abhishek Vatsa
 
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
yuliamax
 

Tendances (20)

EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
 
Efficiency of Using Sequence Discovery for Polymorphism in DNA Sequence
Efficiency of Using Sequence Discovery for Polymorphism in DNA SequenceEfficiency of Using Sequence Discovery for Polymorphism in DNA Sequence
Efficiency of Using Sequence Discovery for Polymorphism in DNA Sequence
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...
Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...
Towards Prediction of Platinum Treatment Response in Ovarian Cancer using Mac...
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
 
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
 
Outlier Modification and Gene Selection for Binary Cancer Classification usin...
Outlier Modification and Gene Selection for Binary Cancer Classification usin...Outlier Modification and Gene Selection for Binary Cancer Classification usin...
Outlier Modification and Gene Selection for Binary Cancer Classification usin...
 
Sample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap IdentificationSample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap Identification
 
Axiom® Genome-Wide AFR 1 Array World Array 3
Axiom®  Genome-Wide AFR 1 Array World Array 3Axiom®  Genome-Wide AFR 1 Array World Array 3
Axiom® Genome-Wide AFR 1 Array World Array 3
 
B45020308
B45020308B45020308
B45020308
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Define cancer treatment using knn and naive bayes algorithms
Define cancer treatment using knn and naive bayes algorithmsDefine cancer treatment using knn and naive bayes algorithms
Define cancer treatment using knn and naive bayes algorithms
 
NetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan SchusterNetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan Schuster
 
MTTlab flyer A4
MTTlab flyer A4MTTlab flyer A4
MTTlab flyer A4
 
NTU-2019
NTU-2019NTU-2019
NTU-2019
 
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
 
Pathway based OMICs data classification
Pathway based OMICs data classificationPathway based OMICs data classification
Pathway based OMICs data classification
 
Mapping metabolites against pathway databases
Mapping metabolites against pathway databases Mapping metabolites against pathway databases
Mapping metabolites against pathway databases
 

En vedette

Gene interaction ppt
Gene interaction ppt Gene interaction ppt
Gene interaction ppt
Ritesh ranjan
 

En vedette (12)

Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
 
Transcriptomics and lexico-syntactic analysis
Transcriptomics and lexico-syntactic analysisTranscriptomics and lexico-syntactic analysis
Transcriptomics and lexico-syntactic analysis
 
2013 Wheat Stem Rust Outbreaks: Global Response
2013 Wheat Stem Rust Outbreaks: Global Response 2013 Wheat Stem Rust Outbreaks: Global Response
2013 Wheat Stem Rust Outbreaks: Global Response
 
Mapping of durable adult plant stem rust resistance to Ug99 and its races
Mapping of durable adult plant stem rust resistance to Ug99 and its racesMapping of durable adult plant stem rust resistance to Ug99 and its races
Mapping of durable adult plant stem rust resistance to Ug99 and its races
 
Puccinia fungus
Puccinia fungusPuccinia fungus
Puccinia fungus
 
Gene ineractions jb
Gene ineractions jbGene ineractions jb
Gene ineractions jb
 
human genetics and population genetics
human genetics and population geneticshuman genetics and population genetics
human genetics and population genetics
 
Interaction of genes for slide share
Interaction of genes for slide shareInteraction of genes for slide share
Interaction of genes for slide share
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
Application of Clustering in Data Science using Real-life Examples
Application of Clustering in Data Science using Real-life Examples Application of Clustering in Data Science using Real-life Examples
Application of Clustering in Data Science using Real-life Examples
 
Gene interaction ppt
Gene interaction ppt Gene interaction ppt
Gene interaction ppt
 
RNAseq Analysis
RNAseq AnalysisRNAseq Analysis
RNAseq Analysis
 

Similaire à Integrative analysis of transcriptomics and proteomics data with ArrayMining and TopoGSA

Survey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue ClassificationSurvey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue Classification
perfj
 
20100509 bioinformatics kapushesky_lecture05_0
20100509 bioinformatics kapushesky_lecture05_020100509 bioinformatics kapushesky_lecture05_0
20100509 bioinformatics kapushesky_lecture05_0
Computer Science Club
 
Genomica - Microarreglos de DNA
Genomica - Microarreglos de DNAGenomica - Microarreglos de DNA
Genomica - Microarreglos de DNA
Ulises Urzua
 
Design of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesDesign of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer Diseases
Mohamed Loey
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Databricks
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
Alexander Decker
 

Similaire à Integrative analysis of transcriptomics and proteomics data with ArrayMining and TopoGSA (20)

Multi-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorMulti-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/Bioconductor
 
Effect of Feature Selection on Gene Expression Datasets Classification Accura...
Effect of Feature Selection on Gene Expression Datasets Classification Accura...Effect of Feature Selection on Gene Expression Datasets Classification Accura...
Effect of Feature Selection on Gene Expression Datasets Classification Accura...
 
2 md2016 annotation
2 md2016 annotation2 md2016 annotation
2 md2016 annotation
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
 
Computational approaches to the regulatory genomics of neurogenesis
Computational approaches to the regulatory genomics of neurogenesisComputational approaches to the regulatory genomics of neurogenesis
Computational approaches to the regulatory genomics of neurogenesis
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Survey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue ClassificationSurvey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue Classification
 
Clustering and Visualisation using R programming
Clustering and Visualisation using R programmingClustering and Visualisation using R programming
Clustering and Visualisation using R programming
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
Gene Selection for Sample Classification in Microarray: Clustering Based Method
Gene Selection for Sample Classification in Microarray: Clustering Based MethodGene Selection for Sample Classification in Microarray: Clustering Based Method
Gene Selection for Sample Classification in Microarray: Clustering Based Method
 
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
 
20100509 bioinformatics kapushesky_lecture05_0
20100509 bioinformatics kapushesky_lecture05_020100509 bioinformatics kapushesky_lecture05_0
20100509 bioinformatics kapushesky_lecture05_0
 
Genomica - Microarreglos de DNA
Genomica - Microarreglos de DNAGenomica - Microarreglos de DNA
Genomica - Microarreglos de DNA
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
 
Design of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesDesign of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer Diseases
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
 
Omics data integration for MSA | International Society for Clinical Biostatis...
Omics data integration for MSA | International Society for Clinical Biostatis...Omics data integration for MSA | International Society for Clinical Biostatis...
Omics data integration for MSA | International Society for Clinical Biostatis...
 
Prediction of transcription factor binding to DNA using rule induction methods
Prediction of transcription factor binding to DNA using rule induction methodsPrediction of transcription factor binding to DNA using rule induction methods
Prediction of transcription factor binding to DNA using rule induction methods
 
Bioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmmBioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmm
 

Plus de Natalio Krasnogor

HUMIES presentation: Evolutionary design of energy functions for protein str...
HUMIES presentation: Evolutionary design of energy functions  for protein str...HUMIES presentation: Evolutionary design of energy functions  for protein str...
HUMIES presentation: Evolutionary design of energy functions for protein str...
Natalio Krasnogor
 

Plus de Natalio Krasnogor (20)

Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
 
Pathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainPathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & Blockchain
 
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
 
DNA data-structure
DNA data-structureDNA data-structure
DNA data-structure
 
Advanced computationalsyntbio
Advanced computationalsyntbioAdvanced computationalsyntbio
Advanced computationalsyntbio
 
The Infobiotics workbench
The Infobiotics workbenchThe Infobiotics workbench
The Infobiotics workbench
 
Introduction to biocomputing
 Introduction to biocomputing Introduction to biocomputing
Introduction to biocomputing
 
Evolvability of Designs and Computation with Porphyrins-based Nano-tiles
Evolvability of Designs and Computation with Porphyrins-based Nano-tilesEvolvability of Designs and Computation with Porphyrins-based Nano-tiles
Evolvability of Designs and Computation with Porphyrins-based Nano-tiles
 
Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...
Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...
Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...
 
An Unorthodox View on Memetic Algorithms
An Unorthodox View on Memetic AlgorithmsAn Unorthodox View on Memetic Algorithms
An Unorthodox View on Memetic Algorithms
 
Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...
Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...
Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...
 
Towards a Rapid Model Prototyping Strategy for Systems & Synthetic Biology
Towards a Rapid Model Prototyping  Strategy for Systems & Synthetic BiologyTowards a Rapid Model Prototyping  Strategy for Systems & Synthetic Biology
Towards a Rapid Model Prototyping Strategy for Systems & Synthetic Biology
 
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
 
HUMIES presentation: Evolutionary design of energy functions for protein str...
HUMIES presentation: Evolutionary design of energy functions  for protein str...HUMIES presentation: Evolutionary design of energy functions  for protein str...
HUMIES presentation: Evolutionary design of energy functions for protein str...
 
Evolutionary Symbolic Discovery for Bioinformatics, Systems and Synthetic Bi...
Evolutionary Symbolic Discovery for Bioinformatics,  Systems and Synthetic Bi...Evolutionary Symbolic Discovery for Bioinformatics,  Systems and Synthetic Bi...
Evolutionary Symbolic Discovery for Bioinformatics, Systems and Synthetic Bi...
 
Computational Synthetic Biology
Computational Synthetic BiologyComputational Synthetic Biology
Computational Synthetic Biology
 
Synthetic Biology - Modeling and Optimisation
Synthetic Biology -  Modeling and OptimisationSynthetic Biology -  Modeling and Optimisation
Synthetic Biology - Modeling and Optimisation
 
A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...
A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...
A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...
 
Building Executable Biology Models for Synthetic Biology
Building Executable Biology Models for Synthetic BiologyBuilding Executable Biology Models for Synthetic Biology
Building Executable Biology Models for Synthetic Biology
 
Extended Compact Genetic Algorithms and Learning Classifier Systems for Dimen...
Extended Compact Genetic Algorithms and Learning Classifier Systems for Dimen...Extended Compact Genetic Algorithms and Learning Classifier Systems for Dimen...
Extended Compact Genetic Algorithms and Learning Classifier Systems for Dimen...
 

Dernier

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 

Dernier (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 

Integrative analysis of transcriptomics and proteomics data with ArrayMining and TopoGSA

  • 1. Integrative analysis of transcriptomics and proteomics data (ArrayMining and TopoGSA) Integrative analysis of transcriptomics and proteomics data: implications to cancer biology ASAP – Interdisciplinary Optimisation Laboratory School of Computer Science Centre for Integrative Plant Biology Centre for Healthcare Associated Infections Institute of Infection, Immunity and Inflammation University of Nottingham Enrico Glaab & Natalio Krasnogor
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Breast cancer data - difficulties Breast cancer outcome is hard to predict: Large degree of class-overlap in Breast cancer microarray data, whereas Leukemia decision boundaries are easy to find (Blazadonakis, 2009). Van‘t Veer et al. Alon et al. Golub et al.
  • 7.
  • 8. Methods overview Methods overview: ArrayMining & TopoGSA
  • 9. Web-tool: ArrayMining.net What is ArrayMining.net? ArrayMinining.net is an online microarray analysis tool set integrating multiple data sources and algorithms. 6 analysis modules: 1. Gene selection 2. Sample clustering 3. Sample classification 4. Gene Set Analysis 5. Gene Network Analysis 6. Cross-Study Normalization Goal : A “swiss knife“ for microarray analysis tasks classical new www.arraymining.org
  • 10. Methods overview Methods overview: ArrayMining & TopoGSA
  • 11.
  • 12.
  • 13. ArrayMining.net: Examples Further examples: Gene selection and Clustering module Automatic generation of heatmaps and PCA Cluster plots (Armstrong et al. dataset) samples genes
  • 14. ArrayMining.net: Examples Further examples: 3D-ICA and Co-Expression analysis 3D Independent Component Analysis plot (left) and the largest connected components from a gene co-expression network (right) for the Armstrong et al. dataset Sample space: Gene space: ALL AML MLL
  • 15. ArrayMining.net: In-house data Heat map: 50 most significant genes Box plot: 4 most significant genes Apply the tools on new data: QMC Breast cancer data Expression levels across 3 tumour grades: STK6 MYBL2 KIF2C AURKb
  • 16.
  • 17. Methods overview Methods overview: ArrayMining & TopoGSA
  • 18.
  • 19. ArrayMining.net: Consensus clustering ArrayMining‘s consensus clustering approach: Clustering Agreement := No. of times pairs of samples are assigned to the same cluster across all input clusterings Idea: Reward objects in the same cluster, if they have a high agreement. Agreement matrix: A ij := # agreements across all clusterings for samples i and j Fitness function:  := (max(A)+min(A))/2 Sample 1 Sample 2
  • 20.
  • 21.
  • 22.
  • 23. External validation Random model Single clustering Consensus Measure similarity of clusterings with the rand index R : a, b, c and d are the #pairs of objects assigned to: - the same cluster in both clusterings (a) - different clusters in both clusterings (b) - the same cluster in clustering 1/2 and different clusters in clustering 2/1 (c/d) - Corrected for chance:  adjusted rand index Reference clustering: 3 tumour grades (low, medium, high) Clustering results – external validation (tumour grades) 10000 random clusterings
  • 24. Methods overview Methods overview: ArrayMining & TopoGSA
  • 25.
  • 26.
  • 27.
  • 28. External validation Single clustering Consensus clustering Consensus (PAM+SOTA) 10000 random clusterings
  • 29.
  • 30. Methods overview Methods overview: ArrayMining & TopoGSA
  • 31. TopoGSA TopoGSA : Network topological analysis of gene sets What is TopoGSA ? TopoGSA is a web-application mapping gene sets onto a comprehensive human protein interaction network and analysing their network topological properties. Two types of analysis: 1. Compare genes within a gene set: e.g. up- vs. down-regulated genes 2. Compare a gene set against a database of known gene sets (e.g. KEGG, BioCarta, GO) www.infobiotics.net/ TopoGSA
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. RERG Protein Expression VS BCSS & DMFI Kaplan Meier plot of RERG protein expression with respect to BCSS in ER + U ER - cohort Kaplan Meier plot of RERG protein expression with respect to BCSS in ER + only Without adjuvant treatment Without Tamoxifen treatment
  • 37. Conclusions(I): Feature comparison with similar tools ArrayMining & TopoGSA GEPAS (Tarraga et al.) Expression Profiler (Kapushesky et al.) Pre-processing : Image analysis, single- and dimensionality reduction, gene name normalization, cross-study normalization , covariance-based filtering Pre-processing : Image analysis, missing value imputation, multiple single study normalization methods, dimensionality reduction, ID converter Pre-processing : Image analysis, single study normalization, missing value imputation, dimensionality reduction, advanced data selection Analysis : Classification, Clustering, Gene selection, GSEA, PCA, ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Cons. Analysis : Classification, Clustering, Gene selection, GSEA, PCA, CGH arrays, Tissue mining,Text mining, TF-binding site prediction Analysis : Clustering, Gene selection, PCA, Co-expression analysis (different from ArrayMining), COA, Similarity search Usability/features : PDF-reports, sortable ranking tables, data anno-tation, 2D/ 3D plots , e-mail notification, video tutorials Usability/features : special tree visualization (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics), Usability/features : Excel export, XML queries, 2D plots, data annotation (GO, chromosome location)
  • 38.
  • 39.
  • 40. Pathway enlargment – added genes Example case: BioCarta BTG family proteins and cell cycle regulation Black: Original pathway nodes – Green : Nodes added based on connectivity Added cancer gene
  • 41.
  • 42.
  • 43.
  • 44.

Notes de l'éditeur

  1. Now we combine the Class Discovery analysis with the Gene Set Analysis module, discussed on the next slides.