Course: Bioinformatics for Biomedical Research (2014).
Session: 3.2- Basic Aspects of Microarray Technology and Data Analysis.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformatics Course - Session 3.2 - VHIR, Barcelona)
1. Bioinformatics for
Biomedical Research
Ricardo Gonzalo Sanz
ricardo.gonzalo@vhir.org
20/05/14
Hospital Universitari Vall d’Hebron
Institut de Recerca - VHIR
Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
Basic aspects of Microarray
technology
3. 1 Introduction
reproducibility
only show you what you’re looking for
what about ‘indels’, inversions, translocations...
accuracy
sensitivity
5. 1 Introduction
RNA-Seq was superior in detecting low abundance transcripts
and also better at differentiating biologically important isoforms
RNA-Seq demonstrated a broader dynamic range than microarray.
6. 1 Introduction
• Molecular biology offers many techniques for measuring gene expression
(e.g. Northern blot)
• The main novelty of microarrays (Schena et al. (1995)
Science 270:467-70) was not what could be measured, but the number of
simultaneous measurements that could be made.
• Before microarrays: genes were studied one by one
• After microarrays: all the genes are studied together.
7. 1 Introduction
• But... what is a microarray, in a few words?
DNA fixed to a solid surface (nylon, silica, glass,...)
The RNA under study (the “problem” RNA) is labeled and has to bind specifically
to the DNA fixed on the solid surface.
The bound DNA is usually called the “probe”
The labeled RNA is usually called the “target”
8. Important to know in advance...
1 Introduction
• Microarrays are usually hypothesis-generating:
They highlight specific genes or features that are particularly
interesting for follow-up experiments.
An exception would be biomarker discovery studies.
• This does not reduce the importance of experimental design.
9. 2
Two color microarrays (cDNA)
• Probes are usually long (cDNA fragments, typically hundreds of nucleotides)
• The probe is fixed to a glass slide
• Labeling uses two fluorochromes (Cy3/Cy5).
• The two samples are compared directly because
they are hybridized on the same array.
• Each gene appears only a few times on the array
• Long probes facilitate cross-hybridization
• Reproducibility is not very good.
Different types of arrays. Manufacturing. DNA/RNA
10. 2
One color microarrays
• Short probes (20-25 nt)
• The target is labeled with only one fluorochrome
• Only one sample is hybridized on each array.
• Each gene is represented by many probes
on the array
Different types of arrays. Manufacturing. DNA/RNA
11. 2 Different types of arrays. Manufacturing. DNA/RNA
DNA:
• DNA polymorphism (GWAS)
• Transcription factors
• Resequencing
• Cytogenetics
RNA:
• Expression
• Alternative splicing
• microRNA
12. 2 Different types of Affymetrix arrays.
3’ IVT Arrays
• Biased measurement of gene expression
• The most widely used array type in the literature. Available for many species.
Only transcripts with a polyA tail and an intact 3’ end will
be amplified and have the chance to
hybridize correctly.
13. 2 Different types of Affymetrix arrays.
Gene Arrays
Exon Arrays
Gene/Exon Arrays
• Gene arrays are the most widely used (good quality-to-price ratio)
• Gene 2.0 arrays have a more up-to-date library and also include lncRNAs
14. 2 Different types of expression arrays.
• 153 organisms on the array (human, mouse, rat, canine, ...)
• 100% coverage of miRBase v17
• 2,216 snoRNAs and scaRNAs (human small nucleolar and small Cajal body-specific RNAs)
• Low input amount (130 ng total RNA)
• 2,999 probe sets unique to pre-miRNA hairpins
• Able to differentiate precursor and mature miRNAs
• Useful for FFPE samples
miRNA
25. Basic aspects of Microarray
Data Analysis
34. 2 Quality Control
Was the experiment a success?
• Microarray experiments generate huge quantities of data
• The standard statistical approach is to use plots to check quality. Plots:
show all data together
highlight structures
may help to detect problems (“unusual patterns”)
It is hard to decide whether things “seem to be
all right” just by looking at the numbers.
35. 2 Quality Control
Diagnostic plots for microarrays:
• Microarray data are usually considered at two levels
1. Low level. Data coming directly from the scanner
2. High level. Data processed from the low-level data: expression values,
normalized or not.
• Some plots are specific to certain array types or to one of the levels
36. 2 Quality Control
Diagnostic plots for microarrays:
1. Low level:
Layout image
Degradation plots (only in 3’ IVT)
Histogram/density plots
PCA, Boxplot
2. High level:
MA plots
Model-based plots (NUSE, RLE)
PCA, Boxplot
48. 2 Quality Control
Diagnostic plots for microarrays. High level. MA plots
• MA plots allow pairwise comparison of the log-intensities of each array against a
reference array and the identification of intensity-dependent biases.
• The Y axis shows the log-ratio of the intensities of one array to the
reference (median) array, called “M”, while the X axis shows the
average log-intensity of both arrays, called “A”.
• Most probes are not expected to differ much between arrays, so we expect an MA plot centered
on the M = 0 line from low to high intensities.
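As a minimal sketch of how M and A can be computed, the following Python example builds a probe-wise median "reference array" and derives M (log-ratio) and A (average log-intensity) for one array. The function names, the toy intensities and the use of log base 2 are assumptions made for illustration; they are not taken from the course material.

```python
import math
from statistics import median


def median_reference(arrays):
    """Probe-wise median across arrays, used as a pseudo reference array."""
    return [median(probe) for probe in zip(*arrays)]


def ma_values(array, reference):
    """Return (M, A) lists for two equal-length lists of positive intensities."""
    m_vals, a_vals = [], []
    for x, r in zip(array, reference):
        lx, lr = math.log2(x), math.log2(r)
        m_vals.append(lx - lr)        # M: log-ratio of array vs reference
        a_vals.append((lx + lr) / 2)  # A: average log-intensity
    return m_vals, a_vals


# Hypothetical raw intensities: 3 arrays x 4 probes.
arrays = [
    [100.0, 200.0, 400.0, 800.0],
    [110.0, 190.0, 420.0, 790.0],
    [200.0, 400.0, 800.0, 1600.0],  # roughly twice as bright as the others
]
reference = median_reference(arrays)
m, a = ma_values(arrays[2], reference)

# An M cloud centered far from 0 flags an array-wide intensity shift.
print("median M of array 3:", round(median(m), 3))
```

In this toy data the third array has M values close to 1 for every probe, which is the kind of systematic offset from M = 0 that an MA plot makes visible before normalization.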
55. 4 Filtering
• In a microarray experiment only a few hundred to a few thousand genes change their
expression between conditions.
• The researcher is interested in keeping the number of tests/genes as low as possible
while keeping the interesting genes in the selected subset.
• If the truly differentially expressed genes are over-represented among those
selected in the filtering step, the FDR associated with a given threshold of the
test statistic will be lowered by the filtering.
Genes that do not change introduce
noise, so it is better to remove them
before the statistical analysis is
done.
56. 4 Filtering
There are different types of filtering:
• Annotation features (specific):
Specific gene features (e.g. GO term, presence of transcriptional regulatory
elements in promoters, etc.)
Data derived from IPA
• Signal features (non-specific)
% of intensities greater than a user-defined value
Interquartile range (IQR) greater than a defined value
57. 4 Filtering
Signal filtering: This technique has as its premise the removal of genes that are
deemed to be not expressed or unchanged according to some specific criterion that
is under the control of the user.
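A minimal sketch of such a non-specific signal filter in Python is shown below. It keeps a gene only if enough of its log-intensities exceed a user-defined value and its IQR across samples exceeds a second user-defined value. The thresholds, the gene names, the toy data and the choice to require both criteria at once are assumptions for illustration only.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def nonspecific_filter(expr, min_intensity=7.0, min_fraction=0.25, min_iqr=0.5):
    """Keep genes that look expressed AND variable across samples.

    expr maps gene name -> list of log2 expression values (one per sample).
    All three thresholds are user-defined; the defaults are arbitrary.
    """
    kept = {}
    for gene, values in expr.items():
        fraction_expressed = sum(v > min_intensity for v in values) / len(values)
        if fraction_expressed >= min_fraction and iqr(values) > min_iqr:
            kept[gene] = values
    return kept


# Hypothetical log2 expression values for three genes in six samples.
expr = {
    "gene_flat": [8.0, 8.1, 8.0, 8.1, 8.0, 8.1],        # expressed, unchanged
    "gene_off": [3.0, 3.5, 2.5, 4.5, 3.0, 4.0],          # not expressed
    "gene_changed": [7.5, 7.6, 7.4, 10.1, 10.0, 9.9],    # expressed, variable
}
kept = nonspecific_filter(expr)
print(sorted(kept))
```

The filter never looks at the group labels, which is what makes it non-specific: genes are dropped for being unexpressed or flat, not for failing to separate the conditions.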
59. 5 Statistical inference of differential expression
• Indirect comparisons: 2 groups, unpaired
• Direct comparisons: 2 groups, paired
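To illustrate why the paired/unpaired distinction matters, the sketch below computes a Welch (unpaired) t statistic and a paired t statistic for one gene on the same numbers. The choice of these two statistics, the function names and the data are assumptions for illustration; the slides do not specify which test is used, and p-values are omitted to keep the example within the Python standard library.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Unpaired (Welch) t statistic: the groups are independent samples."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def paired_t(first, second):
    """Paired t statistic: element i of both lists comes from the same subject."""
    diffs = [s - f for f, s in zip(first, second)]
    se = math.sqrt(variance(diffs) / len(diffs))
    return mean(diffs) / se


# Hypothetical log2 expression of one gene in four subjects,
# measured before (control) and after (treated) an intervention.
control = [7.0, 8.0, 9.0, 10.0]
treated = [7.5, 8.4, 9.6, 10.5]

t_unpaired = welch_t(treated, control)
t_paired = paired_t(control, treated)
print("unpaired t:", round(t_unpaired, 2))
print("paired t:  ", round(t_paired, 2))
```

Here the subjects differ a lot from each other but each one shifts by about 0.5, so the paired statistic is large while the unpaired one is small: pairing removes the between-subject variability from the comparison.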
65. 6 Clustering
Types:
Supervised clustering tries to find the best partition for data that belong to a
known set of classes
Unsupervised clustering tries to define the number and size of the classes
into which the transcription profiles can be fitted.
67. 6 Clustering
Hierarchical Clustering (HCL)
• HCL is an agglomerative/divisive clustering method.
• The iterative process continues until all groups are
connected in a hierarchical tree.
• Samples that are more similar to each other are placed closer together in the tree.
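The agglomerative variant can be sketched in a few lines of Python: start with every sample in its own group, repeatedly merge the two closest groups, and stop when a single tree connects everything. Euclidean distance, average linkage, the sample names and the toy profiles are assumptions chosen for illustration, not choices stated in the slides.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[i], profiles[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def agglomerative(profiles):
    """Return the merge history as (members_a, members_b, distance) tuples."""
    clusters = [(name,) for name in profiles]  # every sample starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical expression profiles (3 genes) for four samples.
profiles = {
    "ctrl_1": [1.0, 1.1, 5.0],
    "ctrl_2": [1.1, 1.0, 5.1],
    "treat_1": [4.0, 4.2, 1.0],
    "treat_2": [4.1, 4.0, 1.2],
}
merges = agglomerative(profiles)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

Each merge corresponds to one node of the hierarchical tree, and the merge distance is the height at which the two branches join, so similar samples end up joined low in the tree.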