2. Goals
Understand what Genevestigator is and why it has been developed
Understand the function of the tools provided by the software
Learn how to use Genevestigator to find genes of interest
4. Microarray technology
Advantages:
– Genome wide
– Relatively cheap
– Standardized streamlined handling
– Use of an optimized system based on oligonucleotide sequences
– Possibility to store data in publicly available repositories
Disadvantages:
– Sequence must be known in advance
– Hybridization reaction
5. Workflow of a microarray experiment
– Conditions selection and experiments
– RNA extraction, amplification and labelling
– Hybridization on chips
– Scanned raw image (DAT file): each pixel intensity is determined by the expression level of a gene in the specific sample hybridized on the array
– Raw data, probe level (CEL file)
– Quality control
– Normalization
– Normalized data (TXT file)
– Analysis
– Submission to repository
– Validation (Q-PCR)
6. Concept of Genevestigator
Thousands of microarray experiments exist world-wide
[Figure: expression data for tissue types 1 to 200, next to a model of a summarized output]
=> Summarize information from thousands of public experiments into easily interpretable results
7. Concept of Genevestigator
Build a systematic database of gene expression information
– Data repositories: meta-analysis?
– Curation: data quality control and expert annotation with systematic ontologies (anatomy, development, condition, genotype)
– Genevestigator: meta-analysis!
8. 1. Data Curation - Overview
– Quality control all sample data
– Collect raw data files and normalize data
– Read and understand the experiment
– Manually annotate experiments using structured vocabularies (ontologies)
[Figure: 1. Data Curation = quality control + normalization, and expert annotation with systematic ontologies (anatomy, development, condition, genotype)]
Final goal of curation: translate experimental information into a computer-readable and "statistically usable" form
10. Curation: normalization models
Multi-array models
– e.g. dChip, RMA, gcRMA
– all arrays from an experiment are normalized simultaneously
– cannot easily be used to create large databases
– RMA and gcRMA use perfect-match information only (background estimation by
statistical approaches)
Single array models
– e.g. MAS5
– normalize each array independently
– does not correct for biases between experiments
– MAS5 uses both perfect-match and mismatch probe information
(the mismatch probes are used to model background: a biochemical approach)
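The practical difference between the two model families can be sketched in a few lines of Python. This is a simplified illustration only, not the actual MAS5 or RMA algorithm: `scale_single_array` rescales one array using nothing but its own values, while `quantile_normalize` (a cross-array step of the kind used in RMA-style pipelines) needs every array of the set at once. The function names, the target value of 1000 and the toy intensities are assumptions made for this example.

```python
from statistics import mean

def scale_single_array(intensities, target=1000.0):
    """Single-array style: rescale one array so that its mean equals `target`.
    Only the values of this array are used (the target value is an assumption)."""
    factor = target / mean(intensities)
    return [v * factor for v in intensities]

def quantile_normalize(arrays):
    """Multi-array style: force all arrays onto one shared distribution.
    Needs all arrays at the same time; ties are not averaged in this sketch."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n_probes)]
    result = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        normalized = [0.0] * n_probes
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result

# Toy data: two arrays, three probes each (hypothetical intensities).
arrays = [[100.0, 400.0, 200.0], [30.0, 10.0, 20.0]]
scaled = [scale_single_array(a) for a in arrays]   # each array handled alone
qn = quantile_normalize(arrays)                    # all arrays handled together
```

Because the single-array function never looks at other arrays, a newly added array can be normalized without touching the existing ones, which is the property that matters when building a large database.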
11. Curation: Ontologies
Ontologies built for
– Anatomical parts
– Stages of development
– Perturbations (diseases, chemicals, etc.)
Ontologies (version 2008)
– Were compiled from various public ontology sources and own developments
– Are built using tree structures
[Figure: Anatomy ontology for Arabidopsis, rice and barley; Development ontology for mouse]
13. Curation: Data content
Total 1’742 54’786
As of December 2010: > 54’000 Affymetrix arrays
World’s largest standardized, quality
controlled, and manually annotated gene
expression compendium for plants, animals,
and microorganisms!
14. Genevestigator application
Database and analysis engine
Website with user support
Analysis tool for the user
Requirements
Browser
– Genevestigator works in Internet Explorer,
Firefox, Safari, Opera, and Chrome
Java
– Java from Sun Microsystems; minimum version: Java 1.4.2 or higher
Computer:
– 500 MB RAM or more
20. Meta-Profile Analysis: Anatomy tool
Looks at how genes are
expressed in different tissues
Mean and standard deviation
Anatomy categories as a tree
(ontology); expand / collapse
Number of arrays per
category is indicated
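The kind of summary described above (mean, standard deviation and number of arrays per category of a tree) can be sketched as follows. This is a minimal illustration under stated assumptions, not Genevestigator's implementation: the category names, the tree and the signal values are hypothetical, and a parent category is assumed to aggregate all arrays of its sub-categories.

```python
from statistics import mean, stdev

# Hypothetical anatomy tree: category -> list of child categories.
TREE = {
    "plant": ["root", "shoot"],
    "root": [],
    "shoot": ["leaf", "flower"],
    "leaf": [],
    "flower": [],
}

# Hypothetical signal values of one gene, keyed by the annotated category.
SIGNALS = {
    "root": [8.0, 10.0],
    "leaf": [2.0, 4.0, 6.0],
    "flower": [12.0],
}

def collect(category):
    """Return the values of a category and of all its descendants."""
    values = list(SIGNALS.get(category, []))
    for child in TREE[category]:
        values.extend(collect(child))
    return values

def summarize(category):
    """Mean, standard deviation and array count for one category.
    Assumes the category holds at least one array."""
    values = collect(category)
    sd = stdev(values) if len(values) > 1 else 0.0
    return {"n_arrays": len(values), "mean": mean(values), "sd": sd}

def print_tree(category, depth=0):
    """Print the fully expanded tree, one indented line per category."""
    s = summarize(category)
    print(f"{'  ' * depth}{category}: mean={s['mean']:.2f} "
          f"sd={s['sd']:.2f} n={s['n_arrays']}")
    for child in TREE[category]:
        print_tree(child, depth + 1)

print_tree("plant")
```

Collapsing a node in such a view corresponds to printing only the summary of the parent category and skipping the recursion into its children.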
21. Meta-Profile Analysis: Neoplasm tool
Looks at how genes are expressed in different tumors
Clinical parameters of the tumors are available
Mean and standard deviation
Anatomy categories as a tree (ontology); expand / collapse
Number of arrays per category is indicated
[Figure: expression profile of NPY across different tumor types]
22. Meta-Profile Analysis: Development tool
Looks at how genes are
expressed during the life cycle
of an organism
Example for barley
Example for mouse / rat
23. Meta-Profile Analysis: Conditions and Genotype tools
List (or tree) of various conditions
Spots indicate the responses of the selected gene(s) to the list of conditions
Most upregulating conditions
Most downregulating conditions
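Ranking conditions from most upregulating to most downregulating can be sketched with a log2 ratio of treated versus control samples. The snippet is only an illustration: the condition names are taken loosely from later slides, the signal values are hypothetical, and the use of a simple log2 ratio of means is an assumption rather than the tool's documented method.

```python
from math import log2
from statistics import mean

# Hypothetical linear-scale signals of one gene:
# condition -> (treated samples, control samples)
EXPERIMENTS = {
    "salt": ([800.0, 1200.0], [250.0, 250.0]),
    "cold": ([100.0, 100.0], [400.0, 400.0]),
    "ABA": ([500.0, 500.0], [500.0, 500.0]),
}

def log2_ratio(treated, control):
    """Response of the gene: log2 of mean treated over mean control."""
    return log2(mean(treated) / mean(control))

def rank_conditions(experiments):
    """Sort conditions from most upregulating to most downregulating."""
    ratios = {name: log2_ratio(t, c) for name, (t, c) in experiments.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

ranking = rank_conditions(EXPERIMENTS)
most_up = ranking[0]      # top of the list: strongest upregulation
most_down = ranking[-1]   # bottom of the list: strongest downregulation
```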
24. Meta-Profile Analysis: Scanner tool
All arrays are represented on
a single screen
Easily find and select
experiments in which
expression is particularly high
(screen for peaks)
A magnifying glass and tooltips let you inspect
details of signals, arrays, and
experiments.
25. Meta-Profile Analysis: Samples tool
All arrays are represented in a
single plot, scroll down
Look at expression level and
“absent / present” calls
Tooltips let you inspect
details of arrays and
experiments.
27. Biomarker search
1. Choose an organism
2. Choose conditions and
run analysis
3. Save target genes for
further analysis
28. Biomarker Search
Identify genes that exhibit specific expression
characteristics
Anatomy
Development
Conditions / Genotype
29. Classical biomarker search
[Figure: responses of genes 1 to 17 (rows) across conditions 1 to 17 (columns)]
Most biomarker search approaches look for the genes that respond the most to a given condition.
This condition may include multiple similar studies.
How these genes respond to other conditions is unknown, because those conditions were not included in the analysis.
30. Biomarker validation in Genevestigator
[Figure: responses of genes 1 to 17 (rows) across conditions 1 to 17 (columns), with arrows marking responses to other conditions]
Genevestigator lets you find out how specific these genes are (Meta-Profile Analysis -> Stimulus/Mutation tools).
Only a few are responsive only to condition 9 (black arrows). All others are sensitive to one (grey arrows) or more other conditions.
31. Biomarker Search in Genevestigator
[Figure: genes reordered against conditions 1 to 17; gene order shown: 3, 5, 7, 13, 17, 10, 2, 15, 9, 12, 4, 11, 16, 1, 6, 8, 14]
The Genevestigator Biomarker Search tools identify genes that are specifically responsive to the chosen condition (they respond minimally to other conditions).
These genes are not necessarily the ones with the strongest response to the chosen condition.
The Genevestigator Biomarker Search tools usually find different target candidates than classical tools, which analyze only a subset of experiments.
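The contrast between ranking by strongest response and ranking by specificity can be sketched as follows. This is a toy illustration, not Genevestigator's scoring: the gene names, condition names and log2 ratios are hypothetical, and the specificity score (target response minus the strongest absolute response to any other condition) is an assumption chosen for simplicity.

```python
# Hypothetical log2 ratios: gene -> {condition: response}
RESPONSES = {
    "gene A": {"target": 3.0, "other 1": 2.5, "other 2": 2.8},
    "gene B": {"target": 2.0, "other 1": 0.1, "other 2": -0.2},
    "gene C": {"target": 0.5, "other 1": 0.0, "other 2": 0.1},
}

def classical_rank(responses, target):
    """Classical approach: strongest response to the target condition first.
    Responses to other conditions are ignored."""
    return sorted(responses, key=lambda g: responses[g][target], reverse=True)

def specificity_score(profile, target):
    """Assumed score: target response minus the strongest absolute
    response to any other condition."""
    strongest_other = max(abs(v) for c, v in profile.items() if c != target)
    return profile[target] - strongest_other

def specific_rank(responses, target):
    """Specificity approach: genes that respond to the target but
    minimally to all other conditions come first."""
    return sorted(
        responses,
        key=lambda g: specificity_score(responses[g], target),
        reverse=True,
    )

by_strength = classical_rank(RESPONSES, "target")
by_specificity = specific_rank(RESPONSES, "target")
```

In this toy data the gene with the strongest target response ("gene A") drops to the last position once its responses to the other conditions are taken into account.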
33. Biomarker Search: example
Search for genes that are associated with a set of conditions, e.g. how do
abiotic stresses relate to hormonal responses?
[Figure: gene groups linking hormonal responses (BL / H3BO3, ABA, MeJA, ethylene) to abiotic stresses (salt, osmotic, drought, cold, anoxia, hypoxia), each marked (+) or (-)]
34. Biomarker Search in Genevestigator
Example: human genes responsive to Actinomycin-D
– Target condition(s): Actinomycin-D
– Co-inducing conditions shown in the figure: vMyb, oncolytic herpes simplex virus, Propiconazole, Sapphyrin, Echinomycin, cell cycle inhibition, chemical: ARC
35. RefGenes
Goal: identify reference genes for use in qPCR.
Solution: search the Genevestigator database for genes that show constant
expression in a certain category of arrays.
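The search for constantly expressed genes can be sketched by ranking genes by their standard deviation across the chosen arrays. This is a minimal sketch under stated assumptions, not the RefGenes algorithm itself: the gene names and log2 signal values are hypothetical, and using the plain sample standard deviation as the stability measure is an assumption.

```python
from statistics import stdev

# Hypothetical log2 signals: gene -> values across the arrays of one category.
SIGNALS = {
    "gene 1": [12.0, 13.5, 11.0, 14.0],
    "gene 2": [13.0, 12.0, 13.8, 12.4],
    "gene 3": [10.0, 10.1, 9.9, 10.0],
    "gene 4": [9.0, 9.4, 8.8, 9.1],
}

def most_stable(signals, top_n=2):
    """Return the `top_n` genes with the lowest standard deviation,
    i.e. the most constant expression within this set of arrays."""
    return sorted(signals, key=lambda g: stdev(signals[g]))[:top_n]

candidates = most_stable(SIGNALS)
```

Restricting `SIGNALS` to arrays from one biological context before ranking is what makes the selected candidates specific to that context.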
36. RefGenes: validation experiment with mouse liver
Dataset: 197 arrays from mouse liver
geNorm selection of the most stable reference genes within this experiment
37. Clustering Analysis
Goal: to identify groups of genes
that have similar expression
characteristics
Tools:
– Hierarchical clustering (with leaf
ordering)
– Biclustering (BiMax algorithm)
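Grouping genes with similar expression characteristics can be sketched with a small agglomerative (hierarchical) clustering. This is an illustration only: the average linkage, the distance of 1 minus the Pearson correlation, and the toy profiles are assumptions, and neither leaf ordering nor the BiMax algorithm is implemented here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two profiles (both must be non-constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def distance(x, y):
    """Correlation distance: 0 for identical shape, 2 for opposite shape."""
    return 1.0 - pearson(x, y)

def hierarchical_clustering(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until
    only `n_clusters` clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(distance(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical expression profiles across four conditions.
PROFILES = {
    "gene 1": [1.0, 2.0, 3.0, 4.0],
    "gene 2": [2.0, 4.1, 6.0, 8.2],
    "gene 3": [4.0, 3.0, 2.0, 1.0],
    "gene 4": [8.0, 6.1, 4.0, 2.2],
}
clusters = hierarchical_clustering(PROFILES, n_clusters=2)
```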
38. Biclustering
Search for biclusters in a list of 64 genes responsive to myocardial infarction
[Figure: one of many possible biclusters]
[Figure: development profile of these 7 genes]
39. Advantages of using Genevestigator
Benefit from normalized data from 54’000 arrays covering 12 organisms
Extended and precise gene search according to:
- Anatomy
- Development
- Stimulus / Mutation
Find genes that might be interesting for further study
Gain further information about specific gene sets
Find appropriate reference genes for the conditions you study
Rapidly compare, validate and extend data
43. Problems with classical reference genes
Most groups use common housekeeping genes such as β-Actin or GAPDH
to normalize qPCR data
Depending on the condition studied, these genes can themselves be regulated
and are then unsuitable as references
Hypothesis: for each biological context, there is a subset of genes that are
most suitable to normalize expression data from this context.
46. Affymetrix GeneChip® scanned image
– DAT file: scanned raw image. Each pixel intensity is determined by the expression level of a gene in the specific sample hybridized on the array
– CEL file: raw data (probe level); quality control and normalization
– TXT file: normalized data; into repository