3. Genome analysis tells us what genes are
present, but before we can determine the
organism’s phenotype, we need to know how
those genes are expressed: under what
conditions, in what tissues, how much gene
product is made, etc.
Also, understanding and curing diseases is tied to
the analysis of what genes are expressed in disease
states.
4. Serial Analysis of Gene
Expression
The basis of this technique is
that a gene can be uniquely
identified using only a small
(10-30 nt) piece from the 3’
end (which is not translated).
These tags are extracted
(from cDNA), then
concatenated into long
molecules that are amplified
with PCR (or cloned) and
sequenced.
The number of times each
tag appears is proportional
to the amount of its mRNA
present.
A large amount of SAGE data is
available from NCBI.
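
The counting step can be sketched in a few lines of Python. This is a toy illustration, not a real SAGE pipeline: it assumes the concatemer is a plain string of fixed-length tags joined end to end, with no linker or enzyme recognition sequences, and `split_concatemer` and `tag_fractions` are hypothetical helper names.

```python
from collections import Counter


def split_concatemer(concatemer, tag_length=10):
    """Cut a concatemer into consecutive fixed-length tags (toy assumption: no linkers)."""
    last_start = len(concatemer) - tag_length
    return [concatemer[i:i + tag_length] for i in range(0, last_start + 1, tag_length)]


def tag_fractions(tags):
    """Return each tag's share of all tags, a proxy for relative mRNA abundance."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Made-up example: three copies of one tag and one copy of another.
concatemer = "AAACCCGGGT" * 3 + "TTTGGGCCCA"
fractions = tag_fractions(split_concatemer(concatemer))
print(fractions)
```

In a real experiment each tag would then be looked up against a tag-to-gene table to identify the transcript it came from.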
5. DDD (Digital Differential
Display) is based on data
from EST experiments. The
NCBI UniGene database
groups the ESTs belonging
to each gene. The
proportion of ESTs from a
given gene can be
compared between
experimental treatments.
This is obviously
limited to well-studied
species.
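
The comparison of EST proportions can be sketched as follows. The counts are made up for illustration, `compare_est_libraries` is a hypothetical helper, and a real analysis would also need a significance test, which this sketch leaves out.

```python
def compare_est_libraries(gene_a, total_a, gene_b, total_b):
    """Compare the fraction of ESTs from one gene in two EST libraries.

    gene_a, gene_b   -- number of ESTs assigned to the gene in each library
    total_a, total_b -- total number of ESTs in each library
    Returns (fraction_a, fraction_b, fold_change of a relative to b).
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("library sizes must be positive")
    fraction_a = gene_a / total_a
    fraction_b = gene_b / total_b
    if gene_b == 0:
        fold_change = float("inf") if gene_a > 0 else float("nan")
    else:
        # Cross-multiplied form of fraction_a / fraction_b.
        fold_change = (gene_a * total_b) / (gene_b * total_a)
    return fraction_a, fraction_b, fold_change


# Made-up counts: 30 of 10,000 ESTs in the treated library, 10 of 10,000 in the control.
treated, control, fold = compare_est_libraries(30, 10_000, 10, 10_000)
print(treated, control, fold)
```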
6. RNA-Seq is a new method, published in 2008. It is
probably the method of choice today for
analyzing RNA content. It is also called whole
transcriptome shotgun sequencing.
The procedure is very simple: isolate messenger
RNA, break it into 200-300 base fragments, reverse
transcribe, then perform large-scale sequencing
using 454, Illumina, or another massively parallel
sequencing technology.
The RNA sequences are then compared to genomic
sequences to find which genes are expressed and
where the exon boundaries lie.
Exon boundaries are a problem with very short
reads: you might only have a few bases of overlap
to one of the exons.
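
A small sketch makes the short-read problem concrete. It assumes a spliced transcript described only by its exon lengths and a read positioned in transcript coordinates; `exon_overlaps` and the numbers are hypothetical.

```python
def exon_overlaps(exon_lengths, read_start, read_length):
    """Return how many bases of a read fall in each exon of a spliced transcript.

    Coordinates are 0-based positions along the spliced transcript.
    """
    read_end = read_start + read_length
    overlaps = []
    exon_start = 0
    for length in exon_lengths:
        exon_end = exon_start + length
        shared = min(read_end, exon_end) - max(read_start, exon_start)
        overlaps.append(max(0, shared))
        exon_start = exon_end
    return overlaps


# A 36-base read starting 3 bases before the junction of a 200-base and a 150-base exon.
print(exon_overlaps([200, 150], read_start=197, read_length=36))  # [3, 33]
```

Only 3 bases of this read lie in the first exon, far too few to place that end of the read unambiguously on the genome.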
As with all RNA methods, which RNAs are
present depends on the tissue analyzed and
external conditions like environmental stress or
disease state.
The method gives information on copy number over
a much wider range than microarrays. It also
detects SNPs.
7. SDS-PAGE is a method for
separating proteins according to
their molecular weight.
SDS = sodium dodecyl sulfate
(a.k.a. sodium lauryl sulfate), a
detergent that unfolds proteins
and coats them in charged
molecules so that their charge to
mass ratio is essentially identical.
“Native” gel electrophoresis
uses undenatured proteins,
which vary greatly in charge
to mass ratio.
SDS denaturation isn’t perfect:
some proteins behave
anomalously.
PAGE = polyacrylamide gel
electrophoresis
8. DNA microarrays and DNA chips are essentially
the same thing: a set of DNA molecules attached to
a solid substrate in an array of very small spots.
Affymetrix is a company that sells microarray chips
attached to a silicon substrate.
Many microarrays are homemade: DNA spotted onto
glass microscope slides.
Microarrays work by hybridization: cDNA made
from mRNA is labelled with a fluorescent tag, then
hybridized with the array. After washing, only
complementary sequences remain bound. A laser
scanner excites each spot in turn, and the amount of
fluorescence is read. The level of fluorescence is
proportional to the amount of mRNA present in the
original prep.
Originally, cDNA from each gene was used to make
the array. Later, synthetic oligonucleotides were
used, and today, 50-60 nt synthetic
oligonucleotides based on the gene sequences seem
to be the standard.
In most cases, RNAs from two different conditions
are compared (experimental vs. control). The cDNA
derived from one RNA is labelled with Cy3, a
green-fluorescing dye, and the cDNA from the other
with Cy5, a red-fluorescing dye.
If the two RNAs are present in equal amounts, you get
a yellow spot; otherwise red or green predominates.
9. Microarray data are subject to many potential errors. Dealing with
them involves 3 main issues: replication, background subtraction, and
data normalization.
Replication of each experimental data point is essential. There is a
lot of variation between spot intensities in a typical experiment,
especially with home-created microarrays.
The background fluorescence level needs to be subtracted from all
data points. Since the background is not necessarily uniform, this
can lead to spots with negative intensities (which can be set to
zero).
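
A minimal sketch of this step, assuming each spot has its own local background estimate. `subtract_background` is a hypothetical helper and the intensities are made up.

```python
def subtract_background(foreground, background):
    """Subtract each spot's local background, setting negative results to zero."""
    if len(foreground) != len(background):
        raise ValueError("need one background value per spot")
    return [max(0.0, f - b) for f, b in zip(foreground, background)]


# Made-up intensities: the third spot is dimmer than its local background.
spots = subtract_background([500.0, 120.0, 80.0], [100.0, 100.0, 100.0])
print(spots)  # [400.0, 20.0, 0.0]
```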
Data normalization means attempting to bring the variance of the
expression level to a constant value. It has been observed that the
variance tends to increase with stronger signals. A way to correct
for that is to include a multiplicative error term as well as an
additive error term in statistical calculations.
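
One common way to write such a two-component error model is sketched below. The symbols are assumptions introduced here, not taken from the slides: y is the measured intensity, \mu the true expression level, \alpha the mean background, \eta the multiplicative error and \varepsilon the additive error, with the two errors independent.

```latex
y = \alpha + \mu e^{\eta} + \varepsilon, \qquad
\eta \sim N(0, \sigma_\eta^2), \quad
\varepsilon \sim N(0, \sigma_\varepsilon^2)

\operatorname{Var}(y) \approx \mu^2 \sigma_\eta^2 + \sigma_\varepsilon^2
\quad \text{(for small } \sigma_\eta \text{)}
```

Under this model the additive term dominates for weak signals, and the multiplicative term makes the variance grow with \mu^2 for strong signals, which matches the observed behavior.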
10. Most microarray experiments compare
2 conditions, using red and green dyes.
Thus each gene sequence gives data
that is a ratio of red to green.
The problem is, when plotted on a
regular linear graph, the distance
between ½ and 1 is much smaller than
the distance between 1 and 2, even
though they express the same (but
inverse) ratios.
The solution is to take the base 2
logarithm of the red/green ratio.
log2(x) = -log2(1/x), so increases and
decreases give similar ranges.
Similarly, the expression level can be
expressed as the geometric mean of the
red and green signals: The square root
of red times green. However, taking
the logarithm of this spreads the data
out better.
Other data manipulations can further
improve appearances.
$\text{geometric mean} = \sqrt{R \cdot G}$, where R and G are the red and green signals.
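
Both transformations can be sketched in Python. The function names are hypothetical, and the sketch assumes both intensities are strictly positive, so spots with zero intensity would have to be filtered out or offset first.

```python
import math


def log_ratio(red, green):
    """Base-2 logarithm of the red/green ratio: +1 is a doubling, -1 a halving."""
    return math.log2(red / green)


def log_mean_intensity(red, green):
    """Base-2 logarithm of the geometric mean sqrt(red * green)."""
    return 0.5 * math.log2(red * green)


# A 2-fold increase and a 2-fold decrease are now equally far from zero.
print(log_ratio(200.0, 100.0))  # 1.0
print(log_ratio(100.0, 200.0))  # -1.0
print(log_mean_intensity(400.0, 100.0))
```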
11. Dr. Leming Shi, National Center for
Toxicological Research.
"Microarray Quality Control (MAQC) Project"
Wilson CL, Miller CJ (2005). "Simpleaffy: a
BioConductor package for Affymetrix Quality
Control and data analysis"