SAGE- Serial Analysis of Gene Expression
1. Serial Analysis of Gene Expression (SAGE) Technology
By: Dr. Ashish C Patel
Assistant Professor
Vet College, AAU, Anand
2. Serial Analysis of Gene Expression
It is believed that the majority of biological phenomena found in a
variety of organisms can be explained by the quantity of gene
products.
Cellular functions under given conditions at a given time can be understood by measuring which genes' mRNAs are present and how many copies of each exist at that point in time.
Each cell contains mRNAs from more than 10,000 different genes, with the number of copies per gene ranging from one to more than 10,000, for a total of up to half a million mRNA transcript copies.
It is therefore practically impossible to determine them all individually.
3. Large-scale random cDNA sequencing in expressed sequence tag (EST) projects was very useful for identifying unknown genes expressed in given cells or tissues (Adams et al., 1991).
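[Figure: EST workflow. mRNA species 1 … n are converted to cDNA, cut with a restriction enzyme (RE) and inserted into plasmids to give cDNA clones; each clone is sequenced separately and the reads are assembled into EST1…n, and all steps are repeated for every sequencing project. Hence, sequencing must be repeated n × n times.]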
4. • However, this approach was not designed to quantify expressed
genes.
• The body mapping project (Okubo et al., 1992) attempted to
construct gene expression profiles of a number of cells and tissues
by random sequencing of a 3’-directed cDNA library.
• These 3’-region fragments of about 300 bp were called gene signatures, and each represented a particular mRNA species.
• By sequencing 1000 or more cDNA clones, they could obtain a rough pattern of gene expression and identify mRNAs of the highly abundant class.
• However, both EST and body mapping projects share an expected weakness: one sequencing reaction yields only one cDNA sequence.
• Mainly because of this low throughput, the profiles obtained by the body mapping project unavoidably fell far short of what was expected and demanded.
5. • Although more recent hybridization-based methods (DNA microarrays) using immobilized cDNAs or oligonucleotides can examine the expression patterns of a relatively large number of genes, they can only examine expressed sequences that have already been identified.
• In contrast, the SAGE method allows a quantitative and simultaneous analysis of a large number of transcripts in any particular cell or tissue, without prior knowledge of the genes.
• Like the body mapping procedure, this method uses the 3’ portion of the mRNA as the gene tag, but in a much shorter form (9–10 bp). These tags can be serially connected before cloning into a plasmid vector.
• Since the resulting plasmid clones contain multiple tags,
sequences of several dozens of mRNAs can be obtained by a
single sequencing reaction.
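The throughput gain can be put in numbers. The Python sketch below is a back-of-the-envelope count, assuming one sequencing reaction reads exactly one clone and that no reads fail; the 10,000-transcript target and the function name `reactions_needed` are hypothetical, while 22 and 67 tags per clone are the averages this deck reports for the original and heated-ligation SAGE protocols.

```python
import math

def reactions_needed(total_tags: int, tags_per_read: int) -> int:
    """Sequencing reactions needed to read `total_tags` transcript tags when
    each reaction reads one clone carrying `tags_per_read` tags.
    Assumption: every read succeeds."""
    return math.ceil(total_tags / tags_per_read)

target = 10_000  # hypothetical number of transcripts to profile
for label, per_read in [("EST / body mapping", 1),
                        ("SAGE, original protocol", 22),
                        ("SAGE, heated ligation", 67)]:
    print(f"{label}: {reactions_needed(target, per_read)} reactions")
```

This prints 10000, 455 and 150 reactions respectively: packing many tags into one clone divides the sequencing effort by roughly the number of tags per clone.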
6. • This rapid and cost-saving sequencing approach allows the quantification and identification of a large number of cellular transcripts.
7. • SAGE is based mainly on two principles: representation of mRNAs (cDNAs) by short sequence tags, and concatenation of these tags for cloning, which allows efficient sequencing analysis.
• A hypothetical eukaryotic cell containing seven mRNA molecules of four species is depicted.
• To describe the gene expression profile of this cell by conventional cDNA sequencing, one would have to perform several sequencing reactions.
• However, if each mRNA species can be represented by a short unique sequence stretch (such as a 9 bp tag), the same goal can be reached by sequencing the tags, because a stretch as short as 9 bp can distinguish 4⁹ (262,144) transcripts, assuming a random nucleotide distribution throughout the genome.
• If these tags could be connected into one long DNA molecule, only a single sequencing reaction would be needed.
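The hypothetical seven-molecule cell can be simulated in a few lines of Python. This is a toy sketch: the four 9 bp tags and the species names are made up, and the concatemer is cut into fixed-width pieces, whereas real SAGE concatemers are punctuated by the anchoring-enzyme site. It shows that one read of the concatenated tags recovers the whole expression profile, and checks the 4⁹ figure.

```python
from collections import Counter

TAG_LEN = 9  # tag length used in the example above

# Hypothetical tags, one per mRNA species (made-up sequences).
species_tags = {
    "species_1": "TAGCGGATC",
    "species_2": "ATGCGGCTA",
    "species_3": "TATTTTAGC",
    "species_4": "GGATCCAAT",
}
# Hypothetical cell: seven mRNA molecules belonging to four species.
cell = ["species_1", "species_1", "species_1",
        "species_2", "species_2",
        "species_3",
        "species_4"]

# Concatenate one tag per mRNA molecule into a single DNA stretch ...
concatemer = "".join(species_tags[s] for s in cell)

# ... so that a single "sequencing read" recovers every tag.
tags = [concatemer[i:i + TAG_LEN] for i in range(0, len(concatemer), TAG_LEN)]
tag_to_species = {tag: name for name, tag in species_tags.items()}
profile = Counter(tag_to_species[t] for t in tags)

print(profile)
print(4 ** TAG_LEN)  # number of distinct 9 bp sequences: 262144
```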
8. The principle of SAGE. A hypothetical eukaryotic cell containing seven mRNA molecules of four species is shown as a model. The boxed sequences are the tags specific to each mRNA species.
9. SAGE Scheme
The SAGE method allows a quantitative and simultaneous analysis of a large number of transcripts in any particular cell or tissue.
[Figure: mRNA species 1, 2 and 3 (each with a poly(A) tail, AAAAA) are each represented by a 9–10 bp tag; the tags are extracted, concatenated and cloned in a plasmid.]
10. SAGE Scheme
[Figure: the concatemer insert is isolated from the plasmid and sequenced (TAGCGG.. ATGCGGC.. TATTTTAGC…), giving the tags of mRNA species 1, 2 and 3. Each tag is searched against the human genome with the BLAST service, where it matches the genomic duplexes ATCGCC/TAGCGG, TACGCCG/ATGCGGC and ATAAAATCG/TATTTTAGC of annotated genes 1, 12 and 34.]
Result: genes 1, 12 and 34 are expressed at the time sampled, for example during mitosis.
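The annotation step amounts to a lookup from tag to gene. In this sketch an exact-match dictionary stands in for the BLAST search, and the table is built from the truncated tag sequences shown on the slide as if they were complete; `TAG_TO_GENE` and `expressed_genes` are hypothetical names. Tags with no match are kept aside, since they are candidates for unknown genes.

```python
# Hypothetical lookup table: the slide's (truncated) example tags treated as
# complete tags, each assigned to the annotated gene it matched.
TAG_TO_GENE = {
    "TAGCGG": "Annotated Gene 1",
    "ATGCGGC": "Annotated Gene 12",
    "TATTTTAGC": "Annotated Gene 34",
}

def expressed_genes(read_tags):
    """Return (set of annotated genes hit by the tags, list of unmatched tags)."""
    genes, unknown = set(), []
    for tag in read_tags:
        gene = TAG_TO_GENE.get(tag)
        if gene is None:
            unknown.append(tag)  # no database match: possible novel transcript
        else:
            genes.add(gene)
    return genes, unknown

genes, unknown = expressed_genes(["TAGCGG", "ATGCGGC", "TATTTTAGC", "GGGGGGGG"])
print(sorted(genes), unknown)
```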
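12. SAGE procedure
[Figure: double-stranded cDNA (poly(A)/poly(T) end) is cleaved, the 3’-most fragments carrying a GTAC cohesive end are bound to streptavidin beads, and the bead-bound cDNA is divided in half.]
The cDNA is then cleaved with a restriction enzyme called the anchoring enzyme (NlaIII). Because cDNA synthesis is primed with biotinylated oligo(dT), the 3’-most fragment, which has a cohesive end at its 5’ terminus, is immobilized by binding to streptavidin-coated beads. The bound material is then divided into two halves.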
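The anchoring-enzyme step can be mimicked in silico to predict which tag a known transcript should give. This is a minimal sketch, assuming the transcript is written 5’ to 3’ on the sense strand and that the tag is the bases immediately 3’ of the 3’-most CATG (NlaIII) site, the one retained on the beads; the function name and the 10 bp default are illustrative. Counting the 4 bp CATG site, a 10 bp tag corresponds to the 14 bp short SAGE tag.

```python
ANCHOR_SITE = "CATG"  # NlaIII recognition sequence (the anchoring enzyme)

def sage_tag(cdna: str, tag_len: int = 10):
    """Tag for a transcript written 5'->3' on the sense strand.

    Assumption: the tag is the `tag_len` bases immediately 3' of the
    3'-most NlaIII site (the site closest to the poly(A) tail).
    Returns None when the transcript has no usable site.
    """
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR_SITE)          # 3'-most CATG
    if pos == -1:
        return None                       # no NlaIII site: no tag
    start = pos + len(ANCHOR_SITE)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # site too close to the 3' end

# Hypothetical transcript with two NlaIII sites; only the 3'-most one counts.
example = "ATGC" + "CATG" + "AAACCCGGGTTT" + "CATG" + "ACGTACGTAG" + "GC" + "A" * 12
print(sage_tag(example))  # ACGTACGTAG
```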
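13. SAGE procedure
[Figure: linker A and linker B, each ending in a CATG cohesive end and carrying a recognition site for the tagging enzyme (TE), are ligated to the two halves. Cleavage with the TE (e.g. BsmFI) releases linker-tag fragments (CATGGGGA… followed by NNNN… tag sequence) with overlapping ends, which T4 DNA polymerase converts to blunt ends.]
Two independent linkers (A and B) are ligated, through the NlaIII cohesive termini, to each half. The linkers contain a recognition site for a tagging enzyme such as BsmFI or FokI, which cuts a short distance away from its recognition site and so releases the linker together with a short tag of cDNA sequence. The overlapping ends are then made blunt with T4 DNA polymerase.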
14. SAGE procedure
[Figure: the blunt-ended linker-tag fragments from the two halves are ligated tail-to-tail (linker 5’ ends on the outside) to form linker A-ditag-linker B molecules, which are amplified with primers A and B.]
The two portions are then mixed again and ligated. Because the 5’ ends of the linkers are blocked by an amino group, only the mRNA-derived termini can be ligated, so the tags join in a tail-to-tail orientation to form ditags. The ditags are then amplified by PCR with primers A and B.
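The tail-to-tail ligation can be written at the sequence level. This sketch assumes both tags are written 5’ to 3’ starting just after their CATG and that the linkers are already trimmed off; `ditag` and the example tags are hypothetical. Because the tails are joined, the second tag appears as its reverse complement, and because CATG is its own reverse complement, the ditag is flanked by CATG on both sides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def ditag(tag_a: str, tag_b: str, site: str = "CATG") -> str:
    """Sense strand of a tail-to-tail ditag between two anchoring-enzyme sites.

    tag_a comes from a linker-A fragment, tag_b from a linker-B fragment,
    both written 5'->3' from just after their CATG (an assumption of this
    sketch). Joining the blunt tails puts tag_b in reverse complement.
    """
    return site + tag_a + revcomp(tag_b) + revcomp(site)

# Hypothetical 10 bp tags.
d = ditag("ACGTACGTAG", "TTGACCAGTA")
print(d)
```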
15. SAGE procedure
[Figure: after amplification, the linker A-ditag-linker B products are cut at the anchoring-enzyme (AE) sites (CATG), releasing the linkers; the ditags are isolated and ligated into concatemers.]
The amplified product is cleaved with the anchoring enzyme NlaIII. Ditag fragments flanked on both ends by NlaIII cohesive termini are isolated and ligated to obtain concatemers, which are then cloned for sequencing.
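Ligation of the NlaIII-cut ditags can be sketched the same way. Assumptions: each input is a ditag without its flanking CATG sites, and the orientation of each ditag is given explicitly rather than chosen at random as in the real ligation; the function name and sequences are hypothetical. The sketch shows why neighbouring ditags share a single CATG, which is what later allows a concatemer read to be cut back into ditags.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def ligate_ditags(cores, flips, site="CATG"):
    """Join NlaIII-cut ditags into a concatemer (one strand shown).

    `cores` are ditag sequences without their flanking CATG sites; adjacent
    ditags share one CATG after ligation. A True in `flips` inserts that
    ditag in the opposite orientation. CATG is palindromic, so flipping a
    ditag leaves the CATG punctuation unchanged.
    """
    parts = [revcomp(c) if f else c for c, f in zip(cores, flips)]
    return site + site.join(parts) + site

# Two hypothetical ditags; the second is ligated in reverse orientation.
cores = ["ACGTACGTAGTACTGGTCAA", "GGCCTTAAGGAATTCCGGTT"]
concatemer = ligate_ditags(cores, [False, True])
print(concatemer)
```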
18. • With SAGE as a tool for studying gene expression, a wide variety of biological phenomena has been analyzed. By the year 2000, the total number of tags analyzed with this method was close to five million.
• Table 1 lists the highly diverse types of cells and tissues studied under a variety of physiological and pathological conditions. The total number of tags collected varied between studies.
20. Cancer studies (Lal et al., 1999)
• By comparing the gene expression profiles derived from
cancer and normal tissue of interest, a large number of
genes were identified as tumor specific.
• Usually Northern blot hybridization analysis was
performed for the confirmation of differential expression
of these genes against a number of independently isolated
tissue samples of similar nature.
• About half of the overrepresented genes identified by
SAGE were reproducibly present in these samples, while
the behavior of the other half was quite different. This may
reflect the heterogeneity among tumors from different
individuals.
21. Immunological studies
• A few SAGE analyses have been applied directly to the study of immunological phenomena.
• Chen et al. (1998) reported the changes in gene expression in rat mast cells before and after stimulation through their high-affinity receptors for immunoglobulin E.
• Differentially expressed genes that had not previously been associated with mast cells included macrophage migration inhibitory factor and the receptors for growth hormone-releasing factor and melatonin.
• Many of the other differentially expressed genes were related to cell structure and cell motility, and numerous unknown genes showed no database match.
22. Yeast
• Yeast is widely used to clarify the biochemical and physiologic
parameters underlying eukaryotic cellular functions.
• The entire genome sequence has been determined (Goffeau,
1997) and the number of genes has been estimated to be about
6300.
• The total number of mRNA molecules has also been estimated at 15,000 per cell (Hereford and Rosbash, 1977).
• Yeast was therefore chosen as a model organism to evaluate the power of the SAGE technology.
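The yeast figures quoted above are enough for a rough estimate of how deeply a library must be sequenced. The sketch assumes every mRNA molecule contributes exactly one tag, that tags are sampled at random from the 15,000-molecule pool, and that tag counts follow a Poisson distribution; the library sizes and function names are hypothetical.

```python
import math

MRNA_PER_CELL = 15_000   # estimate quoted above (Hereford and Rosbash, 1977)
GENES = 6_300            # estimated yeast gene count quoted above

def expected_tags(copies_per_cell: float, tags_sequenced: int) -> float:
    """Expected tag count for a transcript at `copies_per_cell`, assuming
    one tag per mRNA molecule and random sampling from the mRNA pool."""
    return tags_sequenced * copies_per_cell / MRNA_PER_CELL

def detection_probability(copies_per_cell: float, tags_sequenced: int,
                          min_tags: int = 1) -> float:
    """P(observing at least `min_tags` tags), Poisson approximation."""
    lam = expected_tags(copies_per_cell, tags_sequenced)
    p_fewer = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_tags))
    return 1.0 - p_fewer

print(f"average copies per gene: {MRNA_PER_CELL / GENES:.1f}")
for n in (15_000, 60_000):   # hypothetical library sizes
    print(n, "tags:", round(detection_probability(1, n), 3))
```

Under these assumptions a transcript present at one copy per cell is expected to be seen once per 15,000 tags, so several times that many tags are needed before it is detected reliably.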
23. Drawbacks, problems and technical modifications
• Technical drawbacks include the need for a relatively large amount of mRNA and the relative difficulty of constructing tag libraries, among others.
• MicroSAGE (Datson et al., 1999) requires 500–5000-fold less starting
input RNA, and is simplified by the incorporation of a ‘one-tube’
procedure for all steps from RNA isolation to tag release.
• SAGE-lite is another similarly devised protocol that allows global analysis of transcription from less than 100 ng of total starting RNA (Peters et al., 1999).
Technical difficulties of the procedure:
• In the original SAGE protocol, the major PCR products are often linker dimers. To minimize contaminating linker molecules, biotinylated PCR primers were introduced; they generate biotinylated ditag products, so the unwanted linkers can be removed by binding to streptavidin beads at a later stage.
24. • Simply introducing a heating step at the final ligation step yields cloned concatemers with an average of 67 tags, compared with 22 tags obtained with the original protocol.
• A major problem of the SAGE approach is how to further
analyze the unknown tags.
• A conventional oligonucleotide-based plaque-lift method has been employed successfully to isolate and clone a number of such genes.
• However, with oligonucleotides only 13–14 bp long, it is almost impossible to discriminate a one-base mismatch, even with temperature-regulated DNA–DNA hybridization, so this approach produces numerous false positives.
• An RT-PCR-based method was developed to analyze the corresponding genes; this approach uses the identified tag sequence and oligo(dT) as PCR primers.
25. • Matsumura et al. (1999) reported a procedure to recover a
longer cDNA fragment by PCR using the SAGE tag sequence
as a primer, thereby facilitating the analysis of unknown genes
identified by tag sequence in SAGE.
• Sequencing error: the sequencing error rate affects a SAGE experiment; its impact can be reduced by using phred quality scores and discarding ambiguous sequences.
• Tags are 14 bp long in short SAGE and 21 bp long in Long SAGE.
• About 12% of C. elegans tags cannot be unambiguously identified using 14 bp tags (McKay et al., 2003). Empirical results suggest that Long SAGE gives far greater resolution, but at an increased cost.
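Phred scores make the quality filter quantitative: a score Q corresponds to an error probability of 10^(-Q/10). The sketch below assumes independent base-call errors, and the `keep_tag` rule with a Q20 cutoff is a hypothetical filter, not a published SAGE criterion. It also shows that, at the same base quality, a 21 bp Long SAGE tag is less likely than a 14 bp tag to be read without any error.

```python
import math

def phred_to_error(q: float) -> float:
    """Phred score Q -> probability that the base call is wrong (Q = -10 log10 P)."""
    return 10 ** (-q / 10)

def prob_tag_correct(quals) -> float:
    """Probability that every base of a tag was called correctly,
    assuming independent base-call errors."""
    return math.prod(1 - phred_to_error(q) for q in quals)

def keep_tag(tag: str, quals, min_q: int = 20) -> bool:
    """Hypothetical filter: discard tags containing an ambiguous base (N)
    or any base called below `min_q` (the threshold is an assumption)."""
    return "N" not in tag.upper() and min(quals) >= min_q

# Longer tags accumulate more chances for an error at the same quality.
for length in (14, 21):  # short SAGE vs Long SAGE tag lengths
    print(length, round(prob_tag_correct([20] * length), 3))
```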
26. SAGE Data Analysis Strategies
• The sequence files generated by the automated sequencer are
analyzed using the SAGE2000 software (www.sagenet.org).
• The three steps involved in obtaining a differential gene
expression list are as follows:
(1) Interpret the SAGE tags from the sequence data files by using the
SAGE2000 software for extracting ditags and checking for
duplicate ditags;
(2) Download a reference sequence database from the NCBI Web
site (SAGEmap, www.ncbi.nlm.nih.gov); and
(3) Associate the tags with the expressed gene database.
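Step (1), extracting tags from the sequence files, can be sketched as follows. This illustrates the idea only and does not reproduce SAGE2000's actual rules: the read is split at the CATG anchoring-enzyme sites, pieces outside a hypothetical 20–24 bp ditag window are ignored, a ditag seen twice (in either orientation) is treated as a PCR duplicate and counted once, and a fixed 10 bp tag length is assumed at each end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def tags_from_concatemer(read: str, tag_len: int = 10, site: str = "CATG",
                         min_len: int = 20, max_len: int = 24):
    """Split a concatemer read at anchoring-enzyme sites and return its tags,
    counting each distinct ditag only once.

    Assumptions (illustrative): pieces outside min_len..max_len are linker,
    vector or aberrant fragments, and a repeated ditag is a PCR duplicate.
    """
    seen, tags = set(), []
    for ditag in read.upper().split(site):
        if not (min_len <= len(ditag) <= max_len):
            continue                            # not a plausible ditag
        key = min(ditag, revcomp(ditag))        # same molecule in either orientation
        if key in seen:
            continue                            # duplicate ditag: discard
        seen.add(key)
        tags.append(ditag[:tag_len])            # tag read from the left end
        tags.append(revcomp(ditag[-tag_len:]))  # tag read from the right end
    return tags

# Hypothetical read; the third ditag duplicates the first.
read = ("CATG" + "ACGTACGTAGTACTGGTCAA" + "CATG" + "AACCGGAATTCCTTAAGGCC"
        + "CATG" + "ACGTACGTAGTACTGGTCAA" + "CATG")
print(tags_from_concatemer(read))
```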
The relative transcript abundance can then be calculated by dividing the count of each unique tag by the total number of tags sequenced, and the fold change can be determined as the ratio of a tag's abundance between libraries.
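The two calculations just described translate directly into code. This sketch normalizes each library by its own total before taking the ratio; the optional pseudocount, useful only to avoid dividing by zero when a tag is absent from library B, is an assumption rather than part of the described method, and the libraries below are made up.

```python
from collections import Counter

def relative_abundance(tags):
    """Count of each unique tag divided by the total number of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def fold_change(tag, lib_a, lib_b, pseudocount=0):
    """Normalized abundance of `tag` in library A divided by that in library B.
    Set pseudocount > 0 (an assumption) if the tag may be absent from B."""
    ca, cb = Counter(lib_a), Counter(lib_b)
    return ((ca[tag] + pseudocount) / len(lib_a)) / ((cb[tag] + pseudocount) / len(lib_b))

# Hypothetical libraries of 1,000 and 2,000 tags.
lib_a = ["TAG_X"] * 10 + ["OTHER"] * 990
lib_b = ["TAG_X"] * 2 + ["OTHER"] * 1998
print(relative_abundance(lib_a)["TAG_X"], fold_change("TAG_X", lib_a, lib_b))
```

Normalizing by library size matters here: the raw counts differ five-fold (10 vs 2), but because library B is twice as large, the abundance ratio is ten-fold.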
27. • The initial analysis is usually limited to tags with a predefined ratio greater than 5-fold and P ≤ 0.05.
• The false-positive rates associated with different P values have been computed by a Monte Carlo test to validate the confidence intervals.
• Depending on the preliminary results, the SAGE data can be
reanalyzed by varying the P values and the fold-change
thresholds.
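The Monte Carlo idea can be illustrated with a conditional simulation: given the combined count of a tag in two libraries, the null hypothesis of equal frequency fixes how that count should split between them. This is one possible Monte Carlo test, not necessarily the procedure implemented in SAGE2000; the function names, the pseudocount of 1 in the fold calculation and the number of simulations are assumptions. `is_candidate` applies the 5-fold and P ≤ 0.05 screen described above.

```python
import random

def monte_carlo_p(x_a, n_a, x_b, n_b, sims=10_000, seed=0):
    """Two-sided Monte Carlo P value for a tag seen x_a times among n_a tags
    (library A) and x_b times among n_b tags (library B).

    Null hypothesis: equal tag frequency in both libraries, so each of the
    k = x_a + x_b observed tags falls in A with probability n_a / (n_a + n_b).
    """
    rng = random.Random(seed)
    k = x_a + x_b
    q = n_a / (n_a + n_b)
    observed = abs(x_a - k * q)
    extreme = 0
    for _ in range(sims):
        sim_a = sum(rng.random() < q for _ in range(k))
        if abs(sim_a - k * q) >= observed - 1e-12:   # at least as extreme
            extreme += 1
    return (extreme + 1) / (sims + 1)

def is_candidate(x_a, n_a, x_b, n_b, min_fold=5.0, alpha=0.05):
    """Fold change > min_fold (either direction) and P <= alpha.
    The pseudocount of 1 avoids division by zero (an assumption)."""
    fold = ((x_a + 1) / n_a) / ((x_b + 1) / n_b)
    fold = max(fold, 1 / fold)
    return fold > min_fold and monte_carlo_p(x_a, n_a, x_b, n_b) <= alpha

print(monte_carlo_p(50, 50_000, 5, 50_000), is_candidate(50, 50_000, 5, 50_000))
```

Varying `alpha` and `min_fold` reproduces the kind of reanalysis mentioned above, where P values and fold-change thresholds are adjusted after the preliminary results.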
32. SAGE APPLICATION
• SAGE is useful in comparative expression studies to identify
differences in gene expression between two or more cellular
sources of RNA.
• Gene Discovery
• Determining changes in gene expression as a consequence of an experimental treatment (e.g. carcinogen, hormone)
• Provides quantitative data on both known and unknown genes
• Analyzes all transcripts (the transcriptome) without prior selection of known genes
• Analysis of cardiovascular gene expression
• Gene expression in carcinogenesis
• Substance abuse studies
• Cell, tissue and developmental stage profiling
• Profiling of human diseases
33. SAGE – Advantages & Disadvantages
Advantages
• No hybridization is involved, so cross-hybridization cannot occur.
• Can help identify new genes by using a tag as a PCR primer
Disadvantages
• Cost and time required to perform many PCR and sequencing reactions.
• The type IIS restriction enzyme can yield fragments of the wrong length, depending on temperature.
• Multiple genes could have the same tag.
• As with microarrays, mRNA levels may not reflect protein levels in a cell.
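The "same tag, several genes" problem can be checked in silico by extracting a virtual tag from each known transcript and grouping genes by tag. This sketch assumes the tag follows the 3’-most CATG and uses three made-up transcripts and gene names; it shows two genes colliding at a 10 bp tag but separating at 17 bp, the extra length a Long SAGE tag carries (CATG plus 17 bp = 21 bp).

```python
from collections import defaultdict

def virtual_tag(transcript, tag_len, site="CATG"):
    """Bases following the 3'-most anchoring-enzyme site (None if unusable).
    Assumption: transcript is the sense strand written 5'->3'."""
    pos = transcript.rfind(site)
    if pos == -1:
        return None
    start = pos + len(site)
    tag = transcript[start:start + tag_len]
    return tag if len(tag) == tag_len else None

def ambiguous_tags(transcripts, tag_len):
    """Group genes by virtual tag; keep only tags shared by more than one gene."""
    genes_by_tag = defaultdict(set)
    for gene, seq in transcripts.items():
        tag = virtual_tag(seq, tag_len)
        if tag is not None:
            genes_by_tag[tag].add(gene)
    return {t: g for t, g in genes_by_tag.items() if len(g) > 1}

# Hypothetical transcripts: gene_A and gene_B share 10 bp after their last CATG.
transcripts = {
    "gene_A": "TTG" + "CATG" + "ACGTACGTAG" + "CCTTGGA" + "A" * 8,
    "gene_B": "GGA" + "CATG" + "ACGTACGTAG" + "GGTTCCA" + "A" * 8,
    "gene_C": "CCA" + "CATG" + "TTGACCAGTA" + "CCAGGTT" + "A" * 8,
}
print(ambiguous_tags(transcripts, 10))
print(ambiguous_tags(transcripts, 17))
```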