Theme: Transcriptional Program in the Response of Human
Fibroblasts to Serum.
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu
Data manipulation Gene expression data analysis
OMIC World
[Figure: OMIC world diagram. Labels: Gene (DNA), Transcription, mRNA, Translation, E, Degradation, Repression, Catalyse (S, P). Disciplines shown: Genomics, Functional Genomics, Transcriptomics, Proteomics, Metabolomics.]
GENOMICS
Genomics is the subdiscipline of genetics devoted to the mapping, sequencing, and functional analysis of genomes.
Genomics can be said to have appeared in the 1980s and took off in the 1990s with the initiation of genome projects for several biological species.
The most important tools here are microarrays and bioinformatics.
DNA microarrays allow rapid measurement and visualization of differential expression of genes at the whole-genome scale. Although implementing the technique is quite complicated, its principle is simple. The major steps involved in this process are described here.
Process
Biological question (differentially expressed genes, sample class prediction, etc.)
→ Experimental design
→ Microarray experiment
→ Image analysis
→ Normalization
→ Estimation, testing, clustering, discrimination
→ Biological verification and interpretation
Microarray Production Process
High-density filters (macroarrays)
Size: 12 cm x 8 cm
• 2,400 clones per membrane
• Radioactive labelling
• 1 experimental condition per membrane

Glass slides (microarrays)
Size: 5.4 cm x 0.9 cm
• 10,000 clones per slide
• Fluorescent labelling
• 2 experimental conditions per slide

Oligonucleotide chips
Size: 1.28 cm x 1.28 cm
• 300,000 oligonucleotides per slide
• Fluorescent labelling
• 1 experimental condition per slide
• Frouin, V. & Gidrol, X. (2005)
• CBB group (Berlin)
• Transcriptome ENS (France)
[Figure: microarray workflow panels: Target Preparation, Hybridization, Slide Scanning, Expression Profile Clustering]
• Image analysis (genepix)
• Normalization (R)
• Pre-treatment
• Differential expression
• Clustering
• Data mining
• Annotation
Excel Used in Genomics
• How to select columns
• How to use functions
• How to anchor a cell value in a function
• How to copy the function result and not the
function itself
• How to sort data by columns
• How to search and replace
Excel Used in Genomics: Pre-Treatment
Centering and Scaling Data
1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator.
2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function. Verify that the value obtained is equal to zero.
3. If it is not, subtract the column mean from each log2(Ratio) value of that experiment. Be careful with missing values (empty cells): replace empty contents with the NULL or NA string so that Excel does not treat the cell as a zero in the calculation. A missing value is different from a true zero!
4. Once this is done, verify that the final mean value is equal to zero, in order to catch Excel handling errors. Be careful with the decimal separator used by your Excel version (dot or comma)!
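The centering steps above can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original lab: the function name `center_column` and the sample values are hypothetical, and missing spots are represented by `None` so that they are never counted as zeros.

```python
from statistics import fmean

def center_column(values):
    """Subtract the column mean from every present value; keep None as missing."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to center
    mean = fmean(present)
    return [None if v is None else v - mean for v in values]

# Hypothetical log2(Ratio) values for one microarray experiment (one column).
log_ratios = [0.5, 1.5, None, -0.5, 2.5]

centered = center_column(log_ratios)
centered_mean = fmean(v for v in centered if v is not None)

print(centered)                     # [-0.5, 0.5, None, -1.5, 1.5]
print(abs(centered_mean) < 1e-12)   # True: the column mean is now zero
```

The final check mirrors step 4: after centering, the mean of the present values should be zero up to rounding.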
Excel Used in Genomics : Differential Expression Analysis (1)
Significance Analysis of Microarrays (SAM):
SAM is an Excel macro, freely available to academics on the web, that searches for differentially expressed genes using a resampling (permutation) method. Website: http://www-stat.stanford.edu/~tibs/SAM/
Running SAM inside an Excel spreadsheet makes the tool easier to use for most microarray users. Using SAM requires several modifications to your data file:
• The ratio or intensity values in the Excel sheet must use points, not commas, as the decimal separator.
• The header line depends on the type of analysis you want to perform; refer to the SAM manual for more information. You must therefore duplicate your header if you don’t want to lose the experiment information (see image below).
• Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.
Excel Used in Genomics : Differential Expression Analysis (2)
• When the SAM macro is launched from the toolbar (“SAM”), a settings window appears. For details on the available options, refer to the SAM manual. The first important things to do are to indicate whether the source data have been log2-transformed and, because the resampling relies on a random number generator, to initialize that generator with a seed (you can generate a new seed several times).
• Once all the chosen iterations are done, SAM displays a plot positioning each gene by its score in the real distribution against its score in the random distributions. The differentially expressed genes are the ones that move away from the 45° line.
• First, display the delta table. For each delta value, this table gives the number of putative differentially expressed (significant) genes and the number of false positive genes, estimated using the False Discovery Rate (FDR). The user sets the delta value according to the number of false positive or significant genes they are willing to accept.
• To choose the delta value, go back to the SAM plot sheet and display the “SAM Plot Controller” by clicking on the SAM macro button.
• The SAM Plot Controller window lets you set the delta value you want with “Manually Enter Delta”. If you then select the “List Significant Genes” button, SAM lists the differentially expressed genes in the “SAM output” sheet according to the delta value you chose.
• This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
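To make the idea behind the SAM plot more concrete, here is a simplified Python sketch of a SAM-style analysis for two groups. It is an illustration under stated assumptions, not the SAM macro itself: the function names, the toy matrix, the fixed fudge factor `s0`, and the symmetric use of `delta` are all hypothetical simplifications, and no FDR is estimated. Each gene gets a score d, the expected ordered scores are obtained by shuffling the sample labels with a seeded random generator, and genes whose observed score departs from the expected one by more than `delta` are listed.

```python
import random
from statistics import fmean

def relative_difference(x, y, s0=0.1):
    """SAM-style score for one gene: (mean(y) - mean(x)) / (s + s0)."""
    n1, n2 = len(x), len(y)
    mx, my = fmean(x), fmean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s = ((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2)) ** 0.5  # pooled standard error
    return (my - mx) / (s + s0)

def d_scores(matrix, labels, s0=0.1):
    """One score per gene (row); labels are 0 (control) or 1 (treated)."""
    scores = []
    for row in matrix:
        x = [v for v, g in zip(row, labels) if g == 0]
        y = [v for v, g in zip(row, labels) if g == 1]
        scores.append(relative_difference(x, y, s0))
    return scores

def sam_call(matrix, labels, delta, n_perm=200, seed=0, s0=0.1):
    """Indices of genes whose observed score is more than delta away from its expected score."""
    rng = random.Random(seed)  # seeded, so the permutations are reproducible
    observed = d_scores(matrix, labels, s0)
    order = sorted(range(len(observed)), key=lambda i: observed[i])
    expected = [0.0] * len(observed)
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Average the ordered scores over permutations, rank by rank.
        for rank, d in enumerate(sorted(d_scores(matrix, shuffled, s0))):
            expected[rank] += d / n_perm
    return [i for rank, i in enumerate(order)
            if abs(observed[i] - expected[rank]) > delta]

# Hypothetical toy data: 4 genes x 6 samples (3 control, 3 treated), log2 scale.
labels = [0, 0, 0, 1, 1, 1]
matrix = [
    [0.1, -0.1, 0.0, 3.0, 3.2, 2.8],   # clearly induced gene
    [0.2, -0.1, 0.1, 0.0, 0.1, -0.2],  # noise
    [-0.1, 0.1, 0.0, 0.1, -0.1, 0.0],  # noise
    [0.0, 0.2, -0.2, 0.1, 0.0, -0.1],  # noise
]

observed = d_scores(matrix, labels)
called = sam_call(matrix, labels, delta=5.0)
print(called)
```

Plotting the observed scores against the expected ones would give the equivalent of the SAM plot, with the called genes lying far from the 45° line.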
GEPAS: Gene Expression Pattern Analysis Suite
• Verify that the data file, named FibroGEPAS.txt, is available in your folder
• Open the dataset and review its description
• Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
• Click on “Tools”
• Preprocessing
- Preprocess DNA array data files: log-transformation, replicate handling, missing value imputation, filtering and normalization
- Filtering
• Viewing
• Clustering
• Differential expression
• Classification
• Data mining
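As a rough picture of what the preprocessing step does, the sketch below applies a log2 transformation, a missing-value filter, row-mean imputation, and a flat-profile filter to a toy matrix. It is a hypothetical Python illustration, not GEPAS code: the function names, thresholds (`max_missing`, `min_sd`), and data are assumptions, and replicate handling and normalization are left out.

```python
from math import log2
from statistics import fmean, pstdev

def log_transform(row):
    """log2-transform positive ratios; missing or non-positive values become None."""
    return [log2(v) if v is not None and v > 0 else None for v in row]

def impute_row_mean(row):
    """Replace missing values with the mean of the values present in the same row."""
    present = [v for v in row if v is not None]
    mean = fmean(present)
    return [mean if v is None else v for v in row]

def preprocess(matrix, max_missing=0.3, min_sd=0.5):
    """Return {gene: log2 profile} after filtering and imputation."""
    result = {}
    for gene, ratios in matrix.items():
        logged = log_transform(ratios)
        missing = sum(v is None for v in logged) / len(logged)
        if missing > max_missing:
            continue  # too many missing values to trust the profile
        filled = impute_row_mean(logged)
        if pstdev(filled) < min_sd:
            continue  # flat profile, uninformative for clustering
        result[gene] = filled
    return result

# Hypothetical expression ratios; None marks a missing measurement.
raw = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [1.0, None, None, 2.0],  # 50% missing: dropped
    "geneC": [1.0, 1.0, None, 1.0],   # flat after imputation: dropped
    "geneD": [4.0, None, 1.0, 0.25],
}

clean = preprocess(raw)
print(clean)
```

Row-mean imputation is only one simple choice; other imputation strategies would slot into the same place in the pipeline.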
Microarray Dataset: Mining and Gene
Profile Analysis using online Tools
Kruer Lab
• Gene Expression Measurement
• Microarray Process
• Gene Expression Data Stores
• Data Mining / Querying
• Data Analysis
• Example: ATP13A2 Profile in Stress
Conditions
Gene Expression Measurement
• Gene expression technologies
• Microarray process
• Gene expression data stores
• Data mining / querying (problem, query, extraction, load, store, pretreat)
• Data analysis (question-answer, descriptive, predictive, modeling)
• Example: ATP13A2 profile in stress conditions
Higher-plex techniques:
SAGE
DNA microarray
Tiling array
RNA-Seq
NGS
Low-to-mid-plex techniques:
Reporter gene
Northern blot
Western blot
Fluorescent in situ hybridization
Reverse transcription PCR
Database | Microarray experiment sets | Sample profiles | Date reported
ArrayExpress at EBI | 24,838 | 708,914 | October 28, 2011
ArrayTrack™ | 1,622 | 50,953 | February 11, 2012
caArray at NCI | 41 | 1,741 | November 15, 2006
Gene Expression Omnibus - NCBI | 25,859 | 641,770 | October 28, 2011
Genevestigator database | 2,500 | 65,000 | January 2012
MUSC database | ~45 | 555 | April 1, 2007
Stanford Microarray database | 82,542 | Not reported | October 23, 2011
UNC Microarray database | ~31 | 2,093 | April 1, 2007
UNC modENCODE Microarray database | ~6 | 180 | July 17, 2009
UPenn RAD database | ~100 | ~2,500 | September 1, 2007
UPSC-BASE | ~100 | Not reported | November 15, 2007
Other resources: SAGE, GEO, GUDMAP (421), MGI, BioGPS
Data Mining / Querying
• Problem specification
• Query
• Extraction
• Storage
• Load
• Pretreat / prepare for analysis
Data Analysis
• Question-Answer
– Experimental condition profile: group comparison
– Annotation profile: biological systems involved
– Clustering profile: co-regulation
– Time course profile: time variation
– …
• Descriptive
– Boxplot (SD, mean, median, …)
– Scatter plot
• Predictive / inference (clustering)
• Modeling (machine learning, simulation)
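The descriptive statistics behind a boxplot can be computed directly. The sketch below is a minimal Python illustration with hypothetical sample names and log2 intensities; it uses the "inclusive" quartile method from the standard library, which is one of several quartile conventions.

```python
from statistics import fmean, median, quantiles, stdev

def describe(values):
    """Summary statistics commonly read off a boxplot, plus mean and SD."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
        "mean": fmean(values),
        "sd": stdev(values),
    }

# Hypothetical log2 intensities for two samples of the same dataset.
samples = {
    "sample_1": [6.0, 7.0, 8.0, 9.0, 10.0],
    "sample_2": [8.0, 9.0, 10.0, 11.0, 12.0],
}

summary = {name: describe(values) for name, values in samples.items()}
for name, stats in summary.items():
    print(name, stats)
```

If the medians and quartiles differ strongly between samples, as in this toy case, the dataset probably needs normalization before groups are compared.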
• Questions
– What is the right dataset (experimental condition)?
– Is the dataset ready for analysis (quality)?
– What is the expression profile for a given gene?
– Is there significant differential expression in group comparisons?
• Tools
– ArrayExpress (EBI)
– Boxplot
– GEO2R (limma, profile graph, …)
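The question about the expression profile of a given gene can be illustrated with a small Python sketch. Everything here is hypothetical (the gene name `GENE_X`, the sample names, the values, and the function `group_profile`); it only shows the shape of the computation, namely averaging log2 expression per experimental group and taking the difference as a log2 fold change. Tools such as GEO2R add a proper statistical test on top of this.

```python
from collections import defaultdict
from statistics import fmean

def group_profile(expression, groups, gene):
    """Mean log2 expression of one gene in each experimental group."""
    by_group = defaultdict(list)
    for sample, value in expression[gene].items():
        by_group[groups[sample]].append(value)
    return {group: fmean(values) for group, values in by_group.items()}

# Hypothetical sample annotation and log2 expression values.
groups = {"s1": "control", "s2": "control", "s3": "stress", "s4": "stress"}
expression = {"GENE_X": {"s1": 7.0, "s2": 7.5, "s3": 9.0, "s4": 8.5}}

profile = group_profile(expression, groups, "GENE_X")
log_fc = profile["stress"] - profile["control"]

print(profile)  # {'control': 7.25, 'stress': 8.75}
print(log_fc)   # 1.5
```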
Data Analysis
Boxplot
Data Analysis
Example: ATP13A2 Profile
in Stress Conditions
• Specification: ATP13A2 profile in stress
conditions
• Data querying:
– GEO
– Array Express
– Gene Atlas
• Data analysis:
– Online: GEO2R, Genospace, …
– Desktop: R, ArrayTrack, …
Resolution Process
Context
Specification & Aims
Lab #2
• Preprocessing
• Viewing
• Clustering
• Differential expression
• Classification
• Data mining
Statement of problem / Case study:
The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a
complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in
this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in
this complex multicellular response than had previously been appreciated.
Gene Expression Data Analysis
Vishwanath R. Iyer et al., Science, 1999
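Grouping genes by their temporal pattern of expression, as described in the case study, can be sketched with a correlation measure. The Python example below is a deliberately simple illustration and not the method of the cited study: it uses Pearson correlation with a greedy, threshold-based grouping, and the gene names, profiles, and threshold are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two profiles of equal length."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / sqrt(va * vb)

def greedy_clusters(profiles, threshold=0.9):
    """Put each gene in the first cluster whose first member correlates above the threshold."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no similar cluster found: start a new one
    return clusters

# Hypothetical log2 ratios at five time points after stimulation.
profiles = {
    "early_1": [0.0, 2.0, 1.0, 0.0, 0.0],
    "early_2": [0.0, 4.0, 2.0, 0.0, 0.0],
    "late_1": [0.0, 0.0, 0.0, 1.0, 2.0],
    "late_2": [0.0, 0.0, 0.1, 1.1, 2.0],
}

clusters = greedy_clusters(profiles)
print(clusters)  # [['early_1', 'early_2'], ['late_1', 'late_2']]
```

Because correlation ignores amplitude, `early_1` and `early_2` fall in the same group even though one responds twice as strongly as the other.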
Conclusion: ?
Aim:
The purpose of this lab is to introduce the gene expression data analysis process. We apply it to the study “Transcriptional Program in the Response of Human Fibroblasts to Serum” to understand how a researcher can identify a significantly expressed gene from a microarray dataset.
T1. Gene expression overview
T1.1. Review of the place of genomics in the OMIC world
T1.2. Microarray data techniques and process
T1.3. Data analysis cycle and tools
T2. Excel used in genomics
Objective: use basic Excel functionalities to address some gene expression data analysis needs
T2.1. Column manipulation, function use, anchoring, copying function results, sorting data, search and replace
T2.2. Experiment comparison: data pre-treatment
T2.3. Differentially expressed genes from replicate experiments (SAM)
T3. GEPAS: Gene Expression Pattern Analysis Suite
Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data.
http://www.transcriptome.ens.fr/gepas/index.html
Acquired skills
- Gene expression data overview
- Excel used for genomics
- Microarray data analysis using GEPAS
DMCC Future of Trade Web3 - Special Edition
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Session ii g3 overview behavior science mmc

  • 1. Theme: Transcriptional Program in the Response of Human Fibroblasts to Serum. Etienne Z. Gnimpieba BRIN WS 2013 Mount Marty College – June 24th 2013 Etienne.gnimpieba@usd.edu
  • 2. Data Manipulation: Gene Expression Data Analysis. OMIC World. [Figure: diagram of the OMIC world. Labels: Gene, DNA, mRNA, E, S, P; Transcription, Translation, Degradation, Catalyse, Repression; Genomics, Functional Genomics, Transcriptomics, Proteomics, Metabolomics]
  • 3. OMIC World: Genomics
  • 4. OMIC World. Genomics is the subdiscipline of genetics devoted to the mapping, sequencing, and functional analysis of genomes. Genomics can be said to have appeared in the 1980s and took off in the 1990s with the initiation of genome projects for several biological species. The most important tools here are microarrays and bioinformatics. DNA microarrays allow rapid measurement and visualization of differential gene expression at the whole-genome scale. Although the technique is quite complicated to implement, its principle is simple. The major steps involved in this process are described here.
  • 5. Process. [Figure: analysis flowchart. Labels: Biological question; Experimental design; Microarray experiment; Image analysis; Normalization; Estimation; Testing; Clustering; Discrimination; Biological verification and interpretation. Example questions: differentially expressed genes, sample class prediction, etc.]
  • 6. Process
  • 7. Microarray Production Process
      Support                             Size               Content                            Labelling    Experimental conditions
      High density filters (macroarrays)  12 cm x 8 cm       2400 clones per membrane           radioactive  1 per membrane
      Glass slides (microarrays)          5.4 cm x 0.9 cm    10000 clones per slide             fluorescent  2 per slide
      Oligonucleotide chips               1.28 cm x 1.28 cm  300000 oligonucleotides per slide  fluorescent  1 per slide
  • 8. Microarray Production Process. [Figure panels: Target Preparation; Hybridization; Slide Scanning; Expression Profile Clustering] Sources: Frouin, V. & Gidrol, X. (2005); CBB group (Berlin); Transcriptome ENS (France)
  • 9. Microarray Production Process • Image analysis (GenePix) • Normalization (R) • Pre-treatment • Differential expression • Clustering • Data mining • Annotation
  • 10. Excel Used in Genomics • How to select columns • How to use functions • How to anchor a cell value in a function • How to copy the function result and not the function itself • How to sort data by columns • How to search and replace
  • 11. Excel Used in Genomics: Pre-Treatment. Centering and Scaling Data
      1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator.
      2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function, and check whether the value obtained is equal to zero.
      3. If it is not, subtract the column mean from each log2(ratio) value of that experiment. Be careful with missing values (empty cells): replace the empty content with the string NULL or NA, so that Excel does not introduce a zero into the calculation for that cell. A missing value is not the same as a true zero!
      4. Once this operation has been done, verify that the final mean value is equal to zero, in order to catch errors in Excel handling. Be careful with the decimal separator used by your Excel version (dot or comma)!
  • 12. Excel Used in Genomics: Differential Expression Analysis (1)
      Significance Analysis of Microarrays (SAM): SAM is an Excel macro, freely available to academics on the web, that searches for differentially expressed genes using a bootstrapping method. Running SAM inside an Excel spreadsheet makes the tool easier to use for most microarray users. Website: http://www-stat.stanford.edu/~tibs/SAM/
      Using SAM requires several modifications to your data file:
      • The ratio or intensity values in the Excel sheet must not contain any commas; only points are accepted as the decimal separator.
      • The header line depends on the type of analysis you want to perform (refer to the SAM manual for more information), so you must duplicate your header if you do not want to lose the experiment information (see image below).
      • Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.
  • 13. Excel Used in Genomics: Differential Expression Analysis (2)
      • When the SAM macro is launched from the toolbar (“SAM”), a settings window appears. For further information on the available options, it is best to refer to the SAM manual. The first important thing to do is to indicate whether or not the source data have been log2-transformed. Then, because data bootstrapping uses a random generator, you need to initialize it several times by generating different seeds.
      • Once all the chosen iterations have been done, SAM displays a plot that positions each gene by its score in the real distribution compared with the random distributions. The differentially expressed genes are the ones that move away from the 45° line.
      • First, display the delta table. For each delta value, this table indicates the number of putative differentially expressed genes, the significant genes, and the number of false positive genes estimated using the False Discovery Rate (FDR). The user sets the delta value according to the number of false positives or significant genes they want to obtain.
      • To choose the delta value, go back to the SAM plot sheet and display the “SAM plot controller” by clicking on the SAM macro button.
      • The SAM Plot Controller window lets you set the delta value you want: “Manually Enter Delta”. If you then select the “List Significant Genes” button, SAM displays the list of differentially expressed genes in the “SAM output” sheet according to the delta value you chose.
      • This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
  • 14. GEPAS: Gene Expression Pattern Analysis Suite
      • Verify that the data file named FibroGEPAS.txt is available in your folder
      • Open the dataset to read its description
      • Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
      • Click on “Tools”
      • Preprocessing
          - Preprocess DNA array data files: log-transformation, replicate handling, missing value imputation, filtering and normalization
          - Filtering
      • Viewing
      • Clustering
      • Differential expression
      • Classification
      • Data mining
  • 15. Microarray Dataset: Mining and Gene Profile Analysis using online Tools Kruer Lab
  • 16. • Gene Expression Measurement • Microarray Process • Gene Expression Data Stores • Data Mining / Querying • Data Analysis • Example: ATP13A2 Profile in Stress Conditions
  • 17. Gene Expression Measurement. Higher-plex techniques: SAGE, DNA microarray, tiling array, RNA-Seq, NGS. Low-to-mid-plex techniques: reporter gene, Northern blot, Western blot, fluorescent in situ hybridization, reverse transcription PCR. Outline: • Gene expression technologies • Microarray process • Gene expression data stores • Data mining / querying (problem, query, extraction, load, store, pretreat) • Data analysis (question-answer, descriptive, predictive, modeling) • Example: ATP13A2 profile in stress conditions
  • 18. Gene Expression Measurement: data stores
      Database                           Experiment sets  Sample profiles  Date reported
      ArrayExpress at EBI                24,838           708,914          October 28, 2011
      ArrayTrack™                        1,622            50,953           February 11, 2012
      caArray at NCI                     41               1,741            November 15, 2006
      Gene Expression Omnibus - NCBI     25,859           641,770          October 28, 2011
      Genevestigator database            2,500            65,000           January 2012
      MUSC database                      ~45              555              April 1, 2007
      Stanford Microarray database       82,542           Not reported     October 23, 2011
      UNC Microarray database            ~31              2,093            April 1, 2007
      UNC modENCODE Microarray database  ~6               180              July 17, 2009
      UPenn RAD database                 ~100             ~2,500           September 1, 2007
      UPSC-BASE                          ~100             Not reported     November 15, 2007
      Other resources: SAGE, GEO, GUDMAP (421), MGI, BIOGPS
  • 19. Data Mining / Querying • Problem specification • Query • Extraction • Storage • Load • Pretreat / prepare for analysis
  • 20. Data Analysis • Question-Answer – Experimental condition profile: group comparison – Annotation profile: biological systems involved – Clustering profile: co-regulation – Time course profile: time variation – … • Descriptive – Boxplot (SD, mean, median) – Scatter plot • Predictive / inference (clustering) • Modeling (machine learning, simulation)
  • 21. Data Analysis • 3 Questions – What is the right dataset (experimental condition)? – Is the dataset ready for analysis (quality)? – What is the expression profile for a given gene? – Significant differential expression in group comparisons • Tools – ArrayExpress (EBI) – Boxplot – GEO2R (limma, profile graph)
  • 22. Data Analysis: Boxplot
  • 23. Example: ATP13A2 Profile in Stress Conditions • Specification: ATP13A2 profile in stress conditions • Data querying: – GEO – ArrayExpress – Gene Atlas • Data analysis: – Online: GEO2R, Genospace, … – Desktop: R, ArrayTrack, …
  • 24. Lab #2: Gene Expression Data Analysis
      Context. Statement of problem / Case study: The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated. (Vishwanath R. Iyer, Science, 1999)
      Specification & Aims. Aim: The purpose of this lab is to introduce the gene expression data analysis process. We worked through the application on “Transcriptional Program in the Response of Human Fibroblasts to Serum”. Now we can understand how a researcher comes to identify a significantly expressed gene from a microarray dataset.
      Acquired skills: - Gene expression data overview - Excel used for genomics - Microarray data analysis using GEPAS
      Resolution Process:
      T1. Gene expression overview
          T1.1. Review of the place of genomics in the OMIC world
          T1.2. Microarray data techniques and process
          T1.3. Data analysis cycle and tools
      T2. Excel used in genomics. Objective: use basic Excel functionality to meet some gene expression data analysis needs.
          T2.1. Column manipulation, use of functions, anchoring, copying function results, sorting data, search and replace
          T2.2. Experiment comparison: data pre-treatment
          T2.3. Differentially expressed genes from replicate experiments (SAM)
      T3. GEPAS: Gene Expression Pattern Analysis Suite (http://www.transcriptome.ens.fr/gepas/index.html). Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data. • Preprocessing • Viewing • Clustering • Differential expression • Classification • Data mining
      Conclusion: ?
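The centering step on slide 11 (subtract each experiment's mean log2(ratio) while leaving missing values missing) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Excel procedure itself: the function names `center_column` and `column_mean` and the sample values are hypothetical, and a missing spot is represented by `None` instead of the NULL or NA string.

```python
def column_mean(values):
    """Mean of the values that are present; None marks a missing spot."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # nothing to average in an empty column
    return sum(present) / len(present)


def center_column(values):
    """Subtract the column mean from every present value.

    Missing values stay missing: they are never treated as zero.
    """
    mean = column_mean(values)
    if mean is None:
        return list(values)
    return [None if v is None else v - mean for v in values]


# Hypothetical log2(ratio) values for one microarray experiment.
log2_ratios = [1.5, -0.5, None, 2.0, 0.0]
centered = center_column(log2_ratios)

print(centered)               # centered values, missing spot preserved
print(column_mean(centered))  # final check: the mean should now be zero
```

Treating the missing spot as 0 instead would change the mean of this column from 0.75 to 0.6, which is the error the slide warns about.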

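Slide 20 lists the boxplot among the descriptive tools. As a rough sketch of the numbers a boxplot displays, the following Python computes the quartiles of one experiment and flags outlying values. The function name `boxplot_summary`, the sample values, and the 1.5 x IQR outlier rule are assumptions made for illustration (the slides do not specify a rule); `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def boxplot_summary(values):
    """Return the quartiles, extremes, and outliers of one experiment.

    Missing values (None) are ignored. Outliers are flagged with the
    common 1.5 x IQR rule, which is an assumption of this sketch.
    """
    data = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical log2(ratio) values for one experiment, one spot missing.
log2_ratios = [0.5, -2.0, 4.0, 0.0, None, -1.0, 1.0, -0.5, 2.0, 0.0]
summary = boxplot_summary(log2_ratios)
print(summary)
```

Comparing these summaries across experiments is one way to approach the quality question on slide 21: after normalization, the boxes of the different experiments would be expected to sit at similar levels.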
Editor's notes

  1. During this lab, we have: a brief review, the lab's template, genome exploration practice…
  2. DNA fragments amplified by PCR are spotted on a microscope glass slide that has been coated with polylysine beforehand. The purpose of the polylysine coating is to fix the DNA through electrostatic interactions. In our case, the PCR fragments are the expressed parts (ORFs) of the 6200 genes of Saccharomyces cerevisiae (baker's yeast). Slide preparation is completed by blocking the polylysine that is not bound to DNA, in order to avoid target binding. Prior to hybridisation, the DNA is denatured to obtain single-strand DNA on the microarray; this allows the probe to bind to the complementary strand from the target. Apart from glass slide microarrays, other types of chips exist.
  3. Target preparation: RNA is extracted from the two yeast cultures whose expression levels we want to compare. The messenger RNAs are then converted into cDNA by reverse transcription. At this stage, DNA from the first culture is labelled with a green dye, whereas DNA from the second culture is labelled with a red dye.
     Hybridisation: The green-labelled and red-labelled cDNAs are mixed together (this mixture is called the target) and put on the matrix of spotted single-strand DNA (called the probe). The chip is then incubated overnight at 60 degrees. At this temperature, a DNA strand that encounters its complementary strand pairs with it to form double-strand DNA. The fluorescent DNA thus hybridises to the spotted DNA.
     Slide scanning: A laser excites each spot, and the fluorescent emission is collected through a photomultiplier tube (PMT) coupled to a confocal microscope. We obtain two images in which grey levels represent the fluorescence intensities read. If we replace the grey levels with green levels in the first image and red levels in the second, then superimposing the two images gives one image whose spots range from green (where only DNA from the first condition is fixed) to red (where only DNA from the second condition is fixed), passing through yellow (where DNA from the two conditions is fixed in equal amounts).
     Data analysis: We now have two microarray images from which we have to calculate the number of DNA molecules in each experimental condition. To do so, we measure the amount of signal at the emission wavelength of the green dye and the amount of signal at the emission wavelength of the red dye. We then normalise these amounts according to various parameters (amount of yeast in each culture condition, emission power of each dye, …). We assume that the amount of fluorescent DNA fixed is proportional to the amount of mRNA present in each cell at the beginning, and we calculate the red/green fluorescence ratio. If this ratio is greater than 1 (red on the image), gene expression is greater in the second experimental condition; if it is smaller than 1 (green on the image), gene expression is greater in the first condition. We can visualize these differences in expression with software such as ArrayPlot, developed in the laboratory (cf. image below). From the list of spot intensities, this software displays the red intensity of each spot as a function of its green intensity.
     Expression profile clustering: We can then try to group genes that share the same expression profile across several experiments. This clustering can be done step by step, as in phylogenetic analysis, by calculating a similarity criterion between expression profiles and grouping the most similar ones. We can also use more complex techniques such as principal component analysis or neural networks. Finally, hierarchical clustering is usually displayed as a matrix in which each column represents one experiment and each row a gene. Ratios are displayed using a colour scale going from green (repressed genes) to red (induced genes).
  4. Once you have your normalized data file, open it with Excel. You can filter out weak-intensity spots (eliminate the weakest intensities in both channels) and keep the spots with a ratio greater than 1 or lower than -1. Remember that we are working with log2(ratio), so log2(2) = 1. This method, called “fold change”, was the one used in the early days of microarray analysis and is still useful if you do not have enough replicates to apply statistical treatments. The “fold change” method lacks accuracy regarding the significance threshold to be fixed. That is why it is useful to apply a statistical method that takes into account intensity variations and, above all, the variability among experiments.
     Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. Running SAM inside an Excel spreadsheet makes the tool easier to use for most microarray users. Using SAM requires several modifications to your data file. The ratio or intensity values in the Excel sheet must not contain any commas; only points are accepted as the decimal separator. The header line depends on the type of analysis you want to perform (refer to the SAM manual for more information), so you must duplicate your header if you do not want to lose the experiment information (see image below). Two annotation columns are available. SAM always references its calculations to the line number in the source sheet. Before launching the macro, it is necessary to select the data precisely, because SAM rejects lines with too many missing values (such as empty lines).
  5. I cannot cover statistics in depth in 20 minutes; I will give you just a few elements for a rapid analysis of microarray data.
  6. The following experimental techniques are used to measure gene expression and are listed in roughly chronological order, starting with the older, more established technologies. They are divided into two groups based on their degree of multiplexity.
  7. ArrayTrack™ provides an integrated solution for managing, analyzing, and interpreting microarray gene expression data. Specifically, ArrayTrack™ is MIAME (Minimum Information About A Microarray Experiment)-supportive for storing both microarray data and experiment parameters associated with a pharmacogenomics or toxicogenomics study. Many statistical and visualization tools are available with ArrayTrack™, which provides a rich collection of functional information about genes, proteins, and pathways for biological interpretation. The primary emphasis of ArrayTrack™ is the direct linking of analysis results with functional information to facilitate the interaction between the choice of analysis methods and the biological relevance of analysis results. Using ArrayTrack™, users can easily select a statistical method applied to stored microarray data to determine a list of differentially expressed genes. The gene list can then be directly linked to pathways and gene ontology for functional analysis.
  8. Boxplots are useful for determining where the majority of the data lies.
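The “fold change” filter described in the notes above (drop spots that are weak in both channels, then keep spots whose log2(ratio) is greater than 1 or lower than -1) can be sketched in Python as follows. The function name `fold_change_calls`, the intensity floor `min_intensity`, and the spot values are hypothetical and chosen only for illustration; the ratio is taken as red/green, so a positive log2(ratio) means higher expression in the second condition.

```python
import math


def fold_change_calls(spots, min_intensity=100.0, threshold=1.0):
    """Classify spots as induced or repressed from their log2(red/green).

    spots: list of (gene, red, green) normalized intensities.
    min_intensity: hypothetical floor; a spot weak in BOTH channels is dropped.
    threshold: cut-off on log2(ratio); 1.0 corresponds to a 2-fold change.
    """
    calls = {}
    for gene, red, green in spots:
        if red <= 0 or green <= 0:
            continue  # the ratio is undefined for non-positive intensities
        if red < min_intensity and green < min_intensity:
            continue  # weak in both channels: filtered out
        log_ratio = math.log2(red / green)
        if log_ratio > threshold:
            calls[gene] = "induced"
        elif log_ratio < -threshold:
            calls[gene] = "repressed"
    return calls


# Hypothetical normalized intensities: (gene, red, green).
spots = [
    ("geneA", 800.0, 100.0),  # 8-fold up
    ("geneB", 100.0, 500.0),  # 5-fold down
    ("geneC", 300.0, 250.0),  # small change, below the threshold
    ("geneD", 40.0, 10.0),    # weak in both channels
    ("geneE", 400.0, 200.0),  # exactly 2-fold: not strictly above the threshold
]
calls = fold_change_calls(spots)
print(calls)
```

As the notes point out, a fixed threshold of this kind ignores the variability among experiments, which is why a statistical method such as SAM is preferred when enough replicates are available.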