RNA bioinformatics
Paul Gardner
April 2, 2015
Paul Gardner RNA bioinformatics
Main questions
How can we predict RNA structure?
Why do we care about RNA?
RNA is important for translation and gene regulation
2/3 of the ribosome is RNA. Ribosomal function is preserved
even after amino-acid residues are deleted from the active site!
Current estimates indicate that the number of ncRNA genes is
comparable to the number of protein coding genes.
[Figure: RNAs in gene expression. Steps labelled: transcription, splicing, translation, transport. Molecules labelled: mDNA, uDNA, rDNA, tDNA, pre-mRNA, mRNA, nascent protein, localised protein, spliceosome, ribosome, tRNA, RNase P, RNase MRP + snoRNP, snoRNP, SRP, tmRNA, RISC (miRNA).]
RNA: why is this stuff interesting?
An RNA world was an essential step towards modern protein-DNA
based life (under current reasonable models).
Which came first, DNA or protein?
RNA has catalytic potential (like protein), carries hereditary
information (like DNA).
Image by James W. Brown, www.mbio.ncsu.edu/JWB/soup.html
RNA interference
Image lifted from: http://en.wikipedia.org/wiki/RNA_interference
RNA: structure
[Figure: a tRNA shown as (A) primary structure, (B) secondary structure and (C) tertiary structure, with the acceptor stem, D loop, anticodon loop, variable loop and TΨC loop labelled.]
Primary structure: 5' GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3'
RNA: base-pairing
Canonical (Watson-Crick) base-pairs C · G, A · U.
Non-canonical (Wobble) base-pair G · U
Note: other non-canonical base-pairs do occur, but these are
“rare” and generally re-defined as “tertiary” interactions.
Central dogma of structural biology: structure is important for
function.
Images lifted from: http://en.wikipedia.org/wiki/Base_pair
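The pairing rules on this slide can be written as a small lookup. This is a minimal sketch; the function name `pair_type` and the three category labels are hypothetical, chosen only for illustration.

```python
# Canonical (Watson-Crick) and wobble pairs as listed on the slide.
WATSON_CRICK = {frozenset("CG"), frozenset("AU")}
WOBBLE = {frozenset("GU")}


def pair_type(a: str, b: str) -> str:
    """Classify two RNA bases as 'canonical', 'wobble' or 'other'."""
    pair = frozenset((a.upper(), b.upper()))
    if pair in WATSON_CRICK:
        return "canonical"
    if pair in WOBBLE:
        return "wobble"
    return "other"


print(pair_type("G", "C"), pair_type("U", "G"), pair_type("A", "G"))
```

Using unordered sets means G·U and U·G are treated alike; the sketch deliberately lumps all rarer non-canonical pairs into "other".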
RNA: base-pairing
Images lifted from: http://eternawiki.org/wiki/index.php5/Base_Pair
RNA: base-pairing
bpC     C:G     U:A     U:G     G:A     C:A    U:C    A:A    C:C    G:G    U:U    Total
WC      49.8%   14.4%   0.01%   1.2%    0.1%   0.5%   -      -      -      -      66.1%
Wb      0.06%   0.06%   7.1%    -       0.2%   -      0.3%   0.5%   0.2%   0.9%   9.6%
Other   0.8%    5.8%    1.5%    9.4%    2.3%   0.6%   2.6%   0.5%   0.7%   0.3%   24.3%
Total   50.7%   20.3%   8.7%    10.6%   2.6%   1.0%   2.9%   1.0%   0.9%   1.3%   100.0%
Just 71.3% of rRNA contacts are canonical or G:U wobble!
Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA
structural motifs J Mol Biol.
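The 71.3% figure appears to be the sum of three cells of the table: C:G and U:A in the Watson-Crick (WC) conformation plus U:G in the wobble (Wb) conformation. Which cells are summed is my reading of the table, not something the slide states; the variable names are hypothetical.

```python
# Percentages copied from the table (Lee & Gutell 2004).
wc_cg = 49.8  # C:G, Watson-Crick conformation
wc_ua = 14.4  # U:A, Watson-Crick conformation
wb_ug = 7.1   # U:G, wobble conformation

canonical_or_wobble = round(wc_cg + wc_ua + wb_ug, 1)
print(canonical_or_wobble)  # 71.3
```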
RNA stacking
Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome Nature. Image lifted from:
http://rna.ucsc.edu/pdbrestraints/index.html
RNA: number of structures
$A_N$ is the number of possible sequences of length $N$.
$A_N \sim 4^N$
$S_N$ is the number of possible secondary structures of length $N$.
$S_0 = S_1 = 1$
$S_{N+1} = S_N + \sum_{j=1}^{N} S_{j-1} S_{N-j+1}$
$S_N \sim 1.8^N$
Hofacker et al. (1998) Combinatorics of RNA Secondary Structures, Discrete Applied Mathematics.
How can we make a secondary structure prediction
algorithm?
Maximize the number of base-pairs in an
RNA sequence?
Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
Structure prediction: Nussinov
Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
Image from: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology.
Structure prediction: Nussinov
Maximize the number of base-pairs in an RNA sequence.
Seq $= s_1 s_2 \cdots s_n$
$N_{i,j} = 0, \quad \forall\, j - i < 3.$
$$N_{i,j} = \max \begin{cases} N_{i+1,j-1} + \rho(i,j), & i, j \text{ pair} \\ N_{i+1,j}, & i \text{ unpaired} \\ N_{i,j-1}, & j \text{ unpaired} \\ \max_{i<k<j} \left[ N_{i,k} + N_{k+1,j} \right], & \text{bifurcation} \end{cases}$$
$O(n^3)$ in CPU, $O(n^2)$ in memory.
$\rho(i,j) = 1$ if $s_i$ and $s_j$ are complementary, otherwise $\rho(i,j) = 0$.
$N_{1,n} = BP_{max}$.
Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
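The recursion above can be sketched as a short dynamic programme. This is an illustrative implementation, not the original authors' code: the function names are hypothetical, indices are 0-based, and "complementary" is assumed to mean the Watson-Crick pairs plus the G·U wobble.

```python
# Complementary bases: Watson-Crick pairs plus the G-U wobble (an assumption;
# the slide only says "complementary").
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def rho(a: str, b: str) -> int:
    return 1 if (a, b) in PAIRS else 0


def nussinov(seq: str, min_sep: int = 3):
    """Fill the table N, where N[i][j] is the maximum number of base-pairs
    in seq[i..j] (0-based, inclusive). Entries with j - i < min_sep stay 0."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(min_sep, n):
        for i in range(n - span):
            j = i + span
            best = max(
                N[i + 1][j - 1] + rho(seq[i], seq[j]),  # i, j pair
                N[i + 1][j],                            # i unpaired
                N[i][j - 1],                            # j unpaired
            )
            for k in range(i + 1, j):                   # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N


def traceback(seq: str, N, min_sep: int = 3) -> str:
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < min_sep:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif rho(seq[i], seq[j]) and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


seq = "GGGAAAUCC"
N = nussinov(seq)
print(N[0][-1])           # maximum number of base-pairs
print(traceback(seq, N))  # one of possibly many optimal structures
```

The three nested loops over `span`, `i` and `k` give the cubic running time, and the table `N` accounts for the quadratic memory. The traceback returns only one optimum, chosen by the order in which the cases are tested.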
Structure prediction: Nussinov
There are a few problems with this approach:
The solution to Nussinov is frequently not unique. For example,
the 77-nucleotide-long tRNA-His has 22 base-pairs in the
phylogenetic structure, yet there are 149,126 structures with the
maximal number of 26 base-pairs!
The method ignores stacking interactions.
Fontana (2002) Modelling ‘evo-devo’ with RNA. BioEssays.
Structure prediction: Zuker
Nearest neighbour model
Modified Nussinov algorithm to find minimal free energy
(most stable) structures
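[Figure: a hairpin with four stacked base-pairs (A·U, C·G, U·A, G·C), giving three stacks S1, S2, S3, closed by a loop L.]
Free energy $= L + S_1 + S_2 + S_3 = 5.00 - 2.11 - 2.35 - 2.24 = -1.70$ kcal/mol
$\Delta G_{stack} = \Delta H_{37,stack} - T \Delta S_{37,stack}$
$\Delta G_{loop} = -T \Delta S_{37,loop}$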
Tinoco et al. (1971) Estimation of secondary structure in RNA. Nature.
Structure prediction: Zuker
WX \ YZ   CG      GC      AU      UA      GU      UG
CG        -3.26   -2.36   -2.11   -2.08   -1.41   -2.11
GC        -3.42   -3.26   -2.35   -2.24   -1.53   -2.51
AU        -2.24   -2.08   -0.93   -1.10   -0.55   -1.36
UA        -2.35   -2.11   -1.33   -0.93   -1.00   -1.27
GU        -2.51   -2.11   -1.27   -1.36   +0.47   +1.29
UG        -1.53   -1.41   -1.00   -0.55   +0.30   +0.47
Energies (∆G in kcal/mol) of 5'-WY-3' / 3'-XZ-5' stacked base-pairs (rows: pair WX; columns: pair YZ).
Note that ∆G of a 5'-WY-3' / 3'-XZ-5' stack is the same as that of a 5'-ZX-3' / 3'-YW-5' stack.
Mathews et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA
secondary structure. JMB.
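The table can be used as a lookup to reproduce the −1.70 kcal/mol hairpin example from the nearest-neighbour slide. This is a sketch under assumptions: rows are taken as the first pair and columns as the second pair of each stack, the helix is read as 5'-GUCA-3' paired with 3'-CAGU-5' (my reading of the figure, chosen because it reproduces the three stack values), and the names `STACK` and `helix_energy` are hypothetical.

```python
# Stacking energies (kcal/mol) from the table: STACK[WX][YZ] is the energy of
# the stack 5'-WY-3' / 3'-XZ-5', i.e. pair W-X followed by pair Y-Z.
COLS = ["CG", "GC", "AU", "UA", "GU", "UG"]
ROWS = {
    "CG": [-3.26, -2.36, -2.11, -2.08, -1.41, -2.11],
    "GC": [-3.42, -3.26, -2.35, -2.24, -1.53, -2.51],
    "AU": [-2.24, -2.08, -0.93, -1.10, -0.55, -1.36],
    "UA": [-2.35, -2.11, -1.33, -0.93, -1.00, -1.27],
    "GU": [-2.51, -2.11, -1.27, -1.36, +0.47, +1.29],
    "UG": [-1.53, -1.41, -1.00, -0.55, +0.30, +0.47],
}
STACK = {wx: dict(zip(COLS, values)) for wx, values in ROWS.items()}


def helix_energy(strand5: str, strand3: str) -> float:
    """Sum stacking energies along a helix.

    strand5 is read 5'->3'; strand3 lists the partner of each base in the
    same order (so it is the opposite strand read 3'->5').
    """
    pairs = [a + b for a, b in zip(strand5, strand3)]
    return sum(STACK[p][q] for p, q in zip(pairs, pairs[1:]))


loop = 5.00  # loop penalty L from the worked example
total = loop + helix_energy("GUCA", "CAGU")
print(round(total, 2))  # -1.7
```

The strand-reversal symmetry noted under the table corresponds to `STACK[p][q] == STACK[q[::-1]][p[::-1]]`, which holds for every entry as transcribed here.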
Suboptimal structures
“There is an embarrassing abundance of structures having a free
energy near that of the optimum.” (McCaskill 1990)
[Figure: free energy ∆G (kcal/mol) plotted against base-pair distance to the MFE structure, dBP(Si, Smfe), with three structures drawn alongside: Biological, Suboptimal and MFE.]
Wuchty et al. (1999) Complete suboptimal folding of RNA and the stability of secondary structures, Biopolymers.
Accuracy of MFE predictions
Non-independent benchmarks:
Walter et al. (1994) Mean sensitivity 63.6%
Mathews et al. (1999) Mean sensitivity 72.9%
Independent benchmarks:
Doshi et al. (2004) Mean sensitivity 41%
Dowell & Eddy (2004) Mean sensitivity 56% Mean PPV 48%
Gardner & Giegerich (2004) Mean sensitivity 56% Mean PPV 46%
Data-sets: tRNA, SSU rRNA, LSU rRNA, SRP, RNase P, tmRNA.
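Sensitivity and PPV can be computed by comparing predicted and reference base-pairs. The sketch below uses the strict definitions (sensitivity = TP / (TP + FN), PPV = TP / (TP + FP)) on dot-bracket strings. The cited benchmarks may score near-miss pairs differently, and the function names are hypothetical.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def sensitivity_ppv(predicted: str, reference: str) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)  # pairs present in both structures
    sensitivity = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv


# Toy example: the prediction recovers two of three reference pairs.
sens, ppv = sensitivity_ppv("((.....))", "(((...)))")
print(sens, ppv)
```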
Limitations of MFE predictions
Energy parameters: estimated at constant salt
concentrations and temperatures.
Energy model: models of loop energies are extrapolated from
relatively few experiments, no pseudoknots, ...
Cellular environment: contains proteins, RNAs, DNAs,
sugars, etc
Post-transcriptional modifications: many functional RNAs
have been covalently modified.
Folding kinetics: RNAs fold along “pathways”, perhaps
becoming trapped in sub-optimal conformations.
Co-transcriptional folding: RNAs fold during transcription,
while the transcriptional apparatus occludes 3’ portions of the
sequence.
Transcription is jerky: transcriptional pausing can influence
folding.
Comparative sequence analysis
Input: a set of sequences with the same biological function
which are assumed to have approximately the same structure.
Output: the common structural elements, aligned sequences
and a phylogeny which best explains the observed data.
[Figure: phylogeny relating sequences 1 to 5.]
>1
GCAUCCAUGGCUGAAUGGUUAAAGCGCCCAACUCAUAAUUGGCGAACUCGCGGGUUCAAUUCCUGCUGGAUGCA
>2
GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUGCCACGCGGGAGGCCCGGGUUCGAUUCCCGGCCAAUGCA
>3
UGGGCUAUGGUGUAAUUGGCAGCACGACUGAUUCUGGUUCAGUUAGUCUAGGUUCGAGUCCUGGUAGCCCAG
>4
GAAGAUCGUCGUCUCCGGUGAGGCGGCUGGACUUCAAAUCCAGUUGGGGCCGCCAGCGGUCCCGGGCAGGUUCGACUCCUGUGAUCUUCCG
>5
CUAAAUAUAUUUCAAUGGUUAGCAAAAUACGCUUGUGGUGCGUUAAAUCUAAGUUCGAUUCUUAGUAUUUACC
** *
1 GCAUCCAUGGCUGAAU-GGUU-AAAGCGCCCAACUCAUAAUUGGCGAA--
2 GCAUUGGUGGUUCAGU-GGU--AGAAUUCUCGCCUGCCACGCGG-GAG--
3 UGGGCUAUGGUGUAAUUGGC--AGCACGACUGAUUCUGGUUCAG-UUA--
4 GAAGAUCGUCGUCUCC-GGUG-AGGCGGCUGGACUUCAAAUCCA-GU-UG
5 CUAAAUAUAUUUCAAU-GGUUAGCAAAAUACGCUUGUGGUGCGU-UAA--
**** * **
1 ------------------CUCGCGGGUUCAAUUCCUGCUGGAUGC-A
2 ------------------G-CCCGGGUUCGAUUCCCGGCCAAUGC-A
3 ------------------G-UCUAGGUUCGAGUCCUGGUAGCCCA-G
4 GGGCCGCCAGCGGUCCCG--GGCAGGUUCGACUCCUGUGAUCUUCCG
5 ------------------A-UCUAAGUUCGAUUCUUAGUAUUUAC-C
[Figure: consensus secondary structure of the alignment, annotated with IUPAC codes.]
Comparative sequence analysis
Evolution of RNA sequences
Base-pairs that covary have strong evolutionary support
[Figure: (a) four aligned sequences sharing one secondary structure, together with the MIS, ancestral and consensus sequences; (b), (c) the structure drawn with these sequences. Transitions between base-pairs (G·U, A·U, G·C, U·G, C·G, U·A) are labelled "fast" or "slow".]
(((..(((....)))..)))
UACAAGAGUGCGCUUAAGUA
UGCAAAAGUCCGUUUAAGCA
UAUAACCUUUCGAGGAAAUA
CAUAAUAAUGCGUUGAAGUG
MIS: YAUAANADUGCGUUGAAGUR
Ancestral: UACAAGAGUGCGUUUAAGUA
Consensus: YRYAASMGUSCGYKKAAGYR
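To make covariation concrete, the four-sequence alignment and shared structure from this slide can be scanned column pair by column pair. This is an illustrative sketch: the helper names are hypothetical, and counting the distinct valid pair types per paired column is only a crude indicator of covariation, not a published metric.

```python
from collections import Counter

structure = "(((..(((....)))..)))"
alignment = [
    "UACAAGAGUGCGCUUAAGUA",
    "UGCAAAAGUCCGUUUAAGCA",
    "UAUAACCUUUCGAGGAAAUA",
    "CAUAAUAAUGCGUUGAAGUG",
]
VALID = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus wobble


def paired_columns(dot_bracket: str) -> list[tuple[int, int]]:
    """Return the 0-based column pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


def pair_counts(seqs: list[str], i: int, j: int) -> Counter:
    """Count the base combinations observed at columns i and j."""
    return Counter(seq[i] + seq[j] for seq in seqs)


for i, j in paired_columns(structure):
    counts = pair_counts(alignment, i, j)
    n_types = sum(1 for pair in counts if pair in VALID)
    print(i + 1, j + 1, dict(counts), "distinct valid pairs:", n_types)
```

Every paired column in this alignment holds a valid pair in all four sequences while the bases themselves change, which is the pattern the slide describes as strong evolutionary support.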
Alignment Folding: RNAalifold
Generate an alignment (e.g. with ClustalW)
Find a consensus structure that is both energetically stable in
all sequences and has covariation support
[Figure: RNAalifold output, labelled with the sequence GCGGAAUUAGCUCAGUU_GGGAGAGCGCCAGACUGAAAAUCUGGAGGUCCCC_GGUUCGAAUCCCGGAAUCCGCA, alongside the predicted consensus structure annotated with IUPAC codes.]
Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
Alignment Folding: RNAalifold
RNAalifold: energy + covariation.
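$\beta_{i,j} = \frac{1}{N} \sum_{\alpha}^{N} Z^{\alpha}_{i,j} - \mathrm{Cov}$
$C_{i,j} = \frac{2}{N(N-1)} \sum_{b^{\alpha}_i b^{\alpha}_j,\; b^{\beta}_i b^{\beta}_j} D_H\!\left(b^{\alpha}_i b^{\alpha}_j,\, b^{\beta}_i b^{\beta}_j\right) \Pi^{\alpha}_{ij}\, \Pi^{\beta}_{ij}$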
Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
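The covariation term $C_{i,j}$ can be sketched in a few lines. The slide does not define its symbols, so this example assumes that $D_H$ is the Hamming distance between two base-pairs, that $\Pi^{\alpha}_{ij}$ is 1 when sequence $\alpha$ can form a Watson-Crick or G·U pair at columns $i, j$ and 0 otherwise, and that the sum runs over unordered pairs of sequences. The function names are hypothetical and this is not the RNAalifold source code.

```python
from itertools import combinations

VALID = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed pairing indicator Pi


def hamming(p: str, q: str) -> int:
    """Number of positions at which two base-pairs differ (0, 1 or 2)."""
    return sum(a != b for a, b in zip(p, q))


def covariation(alignment: list[str], i: int, j: int) -> float:
    """C_ij: summed Hamming distance between the pairs at columns i, j over
    all sequence pairs in which both sequences can pair, normalised by the
    number of sequence pairs N(N-1)/2."""
    n = len(alignment)
    if n < 2:
        return 0.0
    pairs = [seq[i] + seq[j] for seq in alignment]
    total = sum(
        hamming(p, q)
        for p, q in combinations(pairs, 2)
        if p in VALID and q in VALID
    )
    return 2.0 * total / (n * (n - 1))


print(covariation(["GC", "AU"], 0, 1))  # 2.0: both positions changed
print(covariation(["GC", "GU"], 0, 1))  # 1.0: one position changed
print(covariation(["GC", "GC"], 0, 1))  # 0.0: fully conserved
```

A fully compensatory change such as G·C to A·U scores highest, while conserved pairs and sequences that cannot pair contribute nothing.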
Covariation metrics
Lindgreen, Gardner & Krogh (2006) Measuring covariation in RNA alignments: physical realism improves
information measures. Bioinformatics.
Rfam: annotation hierarchy
Types, Clans, Families, Sequences
[Figure: Rfam annotation hierarchy, from types through clans and families to sequences. Type labels shown: Gene, Cis-reg., Intron, tRNA, rRNA, snRNA, snoRNA, CD-box_snoRNA, HACA-box_snoRNA, scaRNA, miRNA, sRNA, antisense, CRISPR, ribozyme, splicing, riboswitch, thermoregulator, leader, IRES, frameshift_element.]
Building an Rfam family
A structure from literature
An Rfam family: produced manually from publication figures
An example Rfam entry
Relevant reading
Reviews:
Eddy SR (2004) How do RNA folding algorithms work?
Nature Biotechnology.
Methods:
Hofacker et al. (2002) Secondary Structure Prediction for
Aligned RNA Sequences, J.Mol.Biol.
The End
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaEGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of Cancer
 
AICTE activity on Water Conservation spreading awareness
AICTE activity on Water Conservation spreading awarenessAICTE activity on Water Conservation spreading awareness
AICTE activity on Water Conservation spreading awareness
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
 
Efficient Fourier Pricing of Multi-Asset Options: Quasi-Monte Carlo & Domain ...
Efficient Fourier Pricing of Multi-Asset Options: Quasi-Monte Carlo & Domain ...Efficient Fourier Pricing of Multi-Asset Options: Quasi-Monte Carlo & Domain ...
Efficient Fourier Pricing of Multi-Asset Options: Quasi-Monte Carlo & Domain ...
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Think Science: What Are Eclipses (101), by Craig Bobchin
Think Science: What Are Eclipses (101), by Craig BobchinThink Science: What Are Eclipses (101), by Craig Bobchin
Think Science: What Are Eclipses (101), by Craig Bobchin
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx
 
Introduction of Organ-On-A-Chip - Creative Biolabs
Introduction of Organ-On-A-Chip - Creative BiolabsIntroduction of Organ-On-A-Chip - Creative Biolabs
Introduction of Organ-On-A-Chip - Creative Biolabs
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
 

BIOL335: RNA bioinformatics

  • 1. RNA bioinformatics. Paul Gardner, April 2, 2015.
  • 2. Main questions: How can we predict RNA structure?
  • 3. Why do we care about RNA? RNA is important for translation and gene regulation. 2/3 of the ribosome is RNA. Ribosomal function is preserved even after amino-acid residues are deleted from the active site! Current estimates indicate that the number of ncRNA genes is comparable to the number of protein-coding genes. [Figure: RNAs involved in gene expression. Labels: mDNA, uDNA, rDNA, tDNA, pre-mRNA, mRNA, nascent protein, localised protein, spliceosome, ribosome, tRNA, RNase P, RNase MRP, snoRNP, SRP, tmRNA, RISC (miRNA). Steps: transcription, splicing, translation, transport.]
  • 4. RNA: why is this stuff interesting? The RNA world was an essential step towards modern protein-DNA based life (using current reasonable models). Which came first, DNA or protein? RNA has catalytic potential (like protein) and carries hereditary information (like DNA). Image by James W. Brown, www.mbio.ncsu.edu/JWB/soup.html
  • 5. RNA interference. Image lifted from: http://en.wikipedia.org/wiki/RNA_interference
  • 6. RNA: structure. [Figure: a tRNA shown as (A) primary structure, (B) secondary structure and (C) tertiary structure, with positions numbered 5 to 75. Labelled regions: Acceptor Stem, D Loop, Anticodon Loop, Variable Loop, TΨC Loop.]
      Primary structure: 5’ GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3’
  • 7. RNA: base-pairing. Canonical (Watson-Crick) base-pairs: C · G, A · U. Non-canonical (wobble) base-pair: G · U. Note: other non-canonical base-pairs do occur, but these are “rare” and generally re-defined as “tertiary” interactions. Central dogma of structural biology: structure is important for function. Images lifted from: http://en.wikipedia.org/wiki/Base_pair
  • 8. RNA: base-pairing. Images lifted from: http://eternawiki.org/wiki/index.php5/Base_Pair
  • 9. RNA: base-pairing.
      bpC     C:G     U:A     U:G     G:A     C:A    U:C    A:A    C:C    G:G    U:U    Total
      WC      49.8%   14.4%   0.01%   1.2%    0.1%   0.5%   -      -      -      -      66.1%
      Wb      0.06%   0.06%   7.1%    -       0.2%   -      0.3%   0.5%   0.2%   0.9%   9.6%
      Other   0.8%    5.8%    1.5%    9.4%    2.3%   0.6%   2.6%   0.5%   0.7%   0.3%   24.3%
      Total   50.7%   20.3%   8.7%    10.6%   2.6%   1.0%   2.9%   1.0%   0.9%   1.3%   100.0%
    Just 71.3% of rRNA contacts are canonical or G:U wobble! Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs. J Mol Biol.
  • 10. RNA stacking. Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome. Nature. Image lifted from: http://rna.ucsc.edu/pdbrestraints/index.html
  • 11. RNA: number of structures. $A_N$ is the number of possible sequences of length $N$: $A_N \sim 4^N$. $S_N$ is the number of possible secondary structures of length $N$: $S_0 = S_1 = 1$, $S_{N+1} = S_N + \sum_{j=1}^{N} S_{j-1} S_{N-j+1}$, $S_N \sim 1.8^N$. Hofacker et al. (1998) Combinatorics of RNA Secondary Structures, Discrete Applied Mathematics.
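The counting recursion on slide 11 can be explored numerically. The sketch below is illustrative only and rests on an assumption: it uses the commonly quoted form with an explicit minimum hairpin-loop size `min_loop` (a parameter introduced here), $S_{n+1} = S_n + \sum_{j=1}^{n-m} S_{j-1} S_{n-j}$, whose index convention may differ from the one on the slide. The function name `count_structures` is hypothetical.

```python
def count_structures(n_max, min_loop=3):
    """Return [S_0, ..., S_n_max], the number of secondary structures
    on n bases when a hairpin must enclose at least `min_loop` unpaired bases.

    Assumed recursion (index convention is an assumption, see lead-in):
        S_{n+1} = S_n + sum_{j=1}^{n-min_loop} S_{j-1} * S_{n-j}
    """
    s = [1]  # S_0 = 1: the empty structure
    for n in range(n_max):
        total = s[n]  # base n+1 is unpaired
        # base n+1 pairs with base j; j-1 bases lie outside, n-j bases inside
        for j in range(1, n - min_loop + 1):
            total += s[j - 1] * s[n - j]
        s.append(total)
    return s


if __name__ == "__main__":
    counts = count_structures(30)
    for n in (10, 20, 30):
        # compare structure count with the 4^n possible sequences
        print(n, counts[n], 4 ** n)
```

Comparing the printed columns shows how much more slowly the number of structures grows than the number of sequences.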
  • 12. How can we make a secondary structure prediction algorithm? Maximize the number of base-pairs in an RNA sequence? Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
  • 13. Structure prediction: Nussinov. Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math. Image from: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology.
  • 14. Structure prediction: Nussinov. Maximize the number of base-pairs in an RNA sequence.
      $\mathrm{Seq} = s_1 s_2 \cdots s_n$
      $N_{i,j} = 0, \quad \forall\, j - i < 3$
      $N_{i,j} = \max \begin{cases} N_{i+1,j-1} + \rho(i,j), & i, j \text{ pair} \\ N_{i+1,j}, & i \text{ unpaired} \\ N_{i,j-1}, & j \text{ unpaired} \\ \max_{i<k<j} \left[ N_{i,k} + N_{k+1,j} \right], & \text{bifurcation} \end{cases}$
    $O(n^3)$ in CPU, $O(n^2)$ in memory. $\rho(i,j) = 1$ if $s_i$ and $s_j$ are complementary, otherwise $\rho(i,j) = 0$. $N_{1,n} = BP_{max}$. Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
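The recursion on slide 14 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original implementation: the function name `nussinov` and the `min_sep` parameter are introduced here, G·U wobble pairs are counted as complementary alongside the Watson-Crick pairs, and only the maximal pair count is returned (no traceback of the structure).

```python
# Pairs treated as complementary (assumption: Watson-Crick plus G.U wobble)
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_sep=3):
    """Maximal number of base-pairs in `seq`.

    N[i][j] stays 0 whenever j - i < min_sep (min_sep must be >= 1),
    mirroring the boundary condition N_{i,j} = 0 for j - i < 3.
    """
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    # fill the matrix by increasing span so sub-problems are ready
    for span in range(min_sep, n):
        for i in range(n - span):
            j = i + span
            rho = 1 if (seq[i], seq[j]) in COMPLEMENTARY else 0
            best = max(N[i + 1][j - 1] + rho,  # i and j pair
                       N[i + 1][j],            # i unpaired
                       N[i][j - 1])            # j unpaired
            for k in range(i + 1, j):          # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))
```

The three nested loops give the cubic running time and the matrix gives the quadratic memory use stated on the slide.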
  • 15. Structure prediction: Nussinov. There are a few problems with this approach. The solution to Nussinov is frequently not unique: for example, the 77-nucleotide tRNA-His has 22 base-pairs in the phylogenetic structure, yet there are 149,126 structures with the maximal number of 26 base-pairs! The method also ignores stacking interactions. Fontana (2002) Modelling ‘evo-devo’ with RNA. BioEssays.
  • 16. Structure prediction: Zuker. Nearest neighbour model: a modified Nussinov algorithm to find minimal free energy (most stable) structures. [Figure: a hairpin with three stacks S1, S2, S3 and a loop L.]
      Free energy $= L + S_1 + S_2 + S_3 = 5.00 - 2.11 - 2.35 - 2.24 = -1.70$ kcal/mol
      $\Delta G_{stack} = \Delta H_{37,stack} - T \Delta S_{37,stack}$
      $\Delta G_{loop} = -T \Delta S_{37,loop}$
    Tinoco et al. (1971) Estimation of secondary structure in RNA. Nature.
  • 17. Structure prediction: Zuker.
      WY \ XZ   CG      GC      AU      UA      GU      UG
      CG        -3.26   -2.36   -2.11   -2.08   -1.41   -2.11
      GC        -3.42   -3.26   -2.35   -2.24   -1.53   -2.51
      AU        -2.24   -2.08   -0.93   -1.10   -0.55   -1.36
      UA        -2.35   -2.11   -1.33   -0.93   -1.00   -1.27
      GU        -2.51   -2.11   -1.27   -1.36   +0.47   +1.29
      UG        -1.53   -1.41   -1.00   -0.55   +0.30   +0.47
    Energies (ΔG in kcal/mol) of 5’-WX-3’ / 3’-YZ-5’ stacked base-pairs. Note that ΔG of 5’-WX-3’ / 3’-YZ-5’ stacks is the same as 5’-ZY-3’ / 3’-XW-5’ stacks. Mathews et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. JMB.
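The worked example on slide 16 can be reproduced from the stacking table on slide 17. The sketch below is hedged: it assumes that table rows give the first base-pair of a stack (5’ base and its partner) and columns the second, and it assumes a helix 5’-GUCA...UGAC-3’ chosen so that its three stacks match the terms on slide 16; the loop penalty of 5.00 kcal/mol is taken from that slide. The names `STACK` and `hairpin_energy` are hypothetical.

```python
# Stacking energies in kcal/mol, copied from the table on slide 17.
# Assumed orientation: STACK[first_pair][second_pair], each pair written
# as 5'-strand base followed by its partner on the 3' strand.
COLS = ["CG", "GC", "AU", "UA", "GU", "UG"]
ROWS = {
    "CG": [-3.26, -2.36, -2.11, -2.08, -1.41, -2.11],
    "GC": [-3.42, -3.26, -2.35, -2.24, -1.53, -2.51],
    "AU": [-2.24, -2.08, -0.93, -1.10, -0.55, -1.36],
    "UA": [-2.35, -2.11, -1.33, -0.93, -1.00, -1.27],
    "GU": [-2.51, -2.11, -1.27, -1.36, +0.47, +1.29],
    "UG": [-1.53, -1.41, -1.00, -0.55, +0.30, +0.47],
}
STACK = {row: dict(zip(COLS, values)) for row, values in ROWS.items()}


def hairpin_energy(five_prime, partners, loop_energy):
    """Loop penalty plus the sum of nearest-neighbour stacking terms.

    five_prime: helix bases on the 5' strand, read 5'->3'
    partners:   the base paired with each of them, in the same order
    """
    pairs = [a + b for a, b in zip(five_prime, partners)]
    total = loop_energy
    for first, second in zip(pairs, pairs[1:]):
        total += STACK[first][second]
    return total


if __name__ == "__main__":
    # Assumed helix: G-C, U-A, C-G, A-U closing a loop costing 5.00 kcal/mol
    print(round(hairpin_energy("GUCA", "CAGU", 5.00), 2))
```

Each stack contributes a term that depends on two adjacent base-pairs, which is what distinguishes the nearest neighbour model from simply counting pairs.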
  • 18. Suboptimal structures. “There is an embarrassing abundance of structures having a free energy near that of the optimum.” (McCaskill 1990) [Figure: free energy ΔG (kcal/mol, −22 to −20) plotted against base-pair distance $d_{BP}(S_i, S_{mfe})$ (−5 to 35) for suboptimal structures, with drawings of the Biological, Suboptimal and MFE structures.] Wuchty et al. (1999) Complete suboptimal folding of RNA and the stability of secondary structures, Biopolymers.
  • 19. Accuracy of MFE predictions.
      Non-independent benchmarks:
      • Walter et al. (1994): mean sensitivity 63.6%
      • Mathews et al. (1999): mean sensitivity 72.9%
      Independent benchmarks:
      • Doshi et al. (2004): mean sensitivity 41%
      • Dowell & Eddy (2004): mean sensitivity 56%, mean PPV 48%
      • Gardner & Giegerich (2004): mean sensitivity 56%, mean PPV 46%
    Data-sets: tRNA, SSU rRNA, LSU rRNA, SRP, RNase P, tmRNA.
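The sensitivity and PPV figures on slide 19 compare predicted base-pairs with a reference structure. The slides do not define the two measures, so the sketch below assumes the usual definitions: sensitivity is the fraction of reference pairs that were predicted, and PPV is the fraction of predicted pairs that are in the reference. Structures are given in dot-bracket notation; the function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def sensitivity_ppv(reference, predicted):
    """Return (sensitivity, PPV) of `predicted` against `reference`.

    Assumed definitions: TP / (reference pairs) and TP / (predicted pairs).
    """
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_pos = len(ref & pred)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    reference = "(((..(((....)))..)))"
    predicted = "(((..((......))..)))"
    print(sensitivity_ppv(reference, predicted))
```

In this toy case every predicted pair is correct but one reference pair is missed, so PPV is higher than sensitivity.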
  • 20. Limitations of MFE predictions.
      • Energy parameters: estimated at constant salt concentrations and temperatures.
      • Energy model: models of loop energies are extrapolated from relatively few experiments, no pseudoknots, ...
      • Cellular environment: contains proteins, RNAs, DNAs, sugars, etc.
      • Post-transcriptional modifications: many functional RNAs have been covalently modified.
      • Folding kinetics: RNAs fold along “pathways”, perhaps becoming trapped in sub-optimal conformations.
      • Co-transcriptional folding: RNAs fold during transcription; the transcriptional apparatus occludes 3’ portions of the sequence.
      • Transcription is jerky: transcriptional pausing can influence folding.
  • 21. Comparative sequence analysis. Input: a set of sequences with the same biological function which are assumed to have approximately the same structure. Output: the common structural elements, aligned sequences and a phylogeny which best explains the observed data.
      >1 GCAUCCAUGGCUGAAUGGUUAAAGCGCCCAACUCAUAAUUGGCGAACUCGCGGGUUCAAUUCCUGCUGGAUGCA
      >2 GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUGCCACGCGGGAGGCCCGGGUUCGAUUCCCGGCCAAUGCA
      >3 UGGGCUAUGGUGUAAUUGGCAGCACGACUGAUUCUGGUUCAGUUAGUCUAGGUUCGAGUCCUGGUAGCCCAG
      >4 GAAGAUCGUCGUCUCCGGUGAGGCGGCUGGACUUCAAAUCCAGUUGGGGCCGCCAGCGGUCCCGGGCAGGUUCGACUCCUGUGAUCUUCCG
      >5 CUAAAUAUAUUUCAAUGGUUAGCAAAAUACGCUUGUGGUGCGUUAAAUCUAAGUUCGAUUCUUAGUAUUUACC

      1 GCAUCCAUGGCUGAAU-GGUU-AAAGCGCCCAACUCAUAAUUGGCGAA--
      2 GCAUUGGUGGUUCAGU-GGU--AGAAUUCUCGCCUGCCACGCGG-GAG--
      3 UGGGCUAUGGUGUAAUUGGC--AGCACGACUGAUUCUGGUUCAG-UUA--
      4 GAAGAUCGUCGUCUCC-GGUG-AGGCGGCUGGACUUCAAAUCCA-GU-UG
      5 CUAAAUAUAUUUCAAU-GGUUAGCAAAAUACGCUUGUGGUGCGU-UAA--

      1 ------------------CUCGCGGGUUCAAUUCCUGCUGGAUGC-A
      2 ------------------G-CCCGGGUUCGAUUCCCGGCCAAUGC-A
      3 ------------------G-UCUAGGUUCGAGUCCUGGUAGCCCA-G
      4 GGGCCGCCAGCGGUCCCG--GGCAGGUUCGACUCCUGUGAUCUUCCG
      5 ------------------A-UCUAAGUUCGAUUCUUAGUAUUUAC-C
    [Figure: phylogeny of sequences 1 to 5 and a drawing of the consensus secondary structure.]
  • 22. Comparative sequence analysis. Evolution of RNA sequences. Base-pairs that covary have strong evolutionary support.
      (a) Alignment with structure annotation:
      (((..(((....)))..)))   UACAAGAGUGCGCUUAAGUA
      (((..(((....)))..)))   UGCAAAAGUCCGUUUAAGCA
      (((..(((....)))..)))   UAUAACCUUUCGAGGAAAUA
      (((..(((....)))..)))   CAUAAUAAUGCGUUGAAGUG
      MIS         YAUAANADUGCGUUGAAGUR
      Ancestral   UACAAGAGUGCGUUUAAGUA
      Consensus   YRYAASMGUSCGYKKAAGYR
    [Figure panels b and c: structure drawings for the Ancestral, Consensus and MIS sequences, and base-pair examples labelled fast, fast, slow.]
  • 23. Alignment Folding: RNAalifold. Generate an alignment (e.g. with ClustalW). Find a consensus structure that is both energetically stable in all sequences and has covariation support. [Figure: consensus secondary structure drawings for the aligned sequences.] Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J. Mol. Biol.
  • 24. Alignment Folding: RNAalifold. RNAalifold: energy + covariation.
      $\beta_{i,j} = \frac{1}{N} \sum_{\alpha}^{N} Z^{\alpha}_{i,j} - \mathrm{Cov}$
      $C_{i,j} = \frac{2}{N(N-1)} \sum_{b^{\alpha}_i b^{\alpha}_j,\; b^{\beta}_i b^{\beta}_j} D_H\!\left(b^{\alpha}_i b^{\alpha}_j,\; b^{\beta}_i b^{\beta}_j\right) \Pi^{\alpha}_{ij} \Pi^{\beta}_{ij}$
    Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J. Mol. Biol.
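The covariation term $C_{i,j}$ on slide 24 can be illustrated on the four-sequence alignment from slide 22. The sketch below is a reading of the formula as transcribed, not RNAalifold's own code: it assumes the sum runs over unordered pairs of sequences, that $D_H$ is the Hamming distance between the two-base strings at columns $i$ and $j$, and that $\Pi^{\alpha}_{ij}$ is 1 when sequence $\alpha$ can form a Watson-Crick or G·U pair at those columns and 0 otherwise. The function name `covariation` is hypothetical.

```python
from itertools import combinations

# Assumption: pairs for which the indicator Pi equals 1
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def covariation(alignment, i, j):
    """C_{i,j} for 0-based columns i and j of a list of aligned sequences."""
    n = len(alignment)
    if n < 2:
        return 0.0
    total = 0
    for seq_a, seq_b in combinations(alignment, 2):
        pair_a = seq_a[i] + seq_a[j]
        pair_b = seq_b[i] + seq_b[j]
        # both sequences must be able to pair, otherwise the term is 0
        if pair_a in PAIRS and pair_b in PAIRS:
            # Hamming distance between the two base-pairs (0, 1 or 2)
            total += (pair_a[0] != pair_b[0]) + (pair_a[1] != pair_b[1])
    return 2.0 * total / (n * (n - 1))


if __name__ == "__main__":
    alignment = [
        "UACAAGAGUGCGCUUAAGUA",
        "UGCAAAAGUCCGUUUAAGCA",
        "UAUAACCUUUCGAGGAAAUA",
        "CAUAAUAAUGCGUUGAAGUG",
    ]
    # outermost base-pair of the annotated structure (((..(((....)))..)))
    print(covariation(alignment, 0, 19))
    # two unpaired columns in the interior loop
    print(covariation(alignment, 3, 4))
```

Columns where both bases change while pairing is preserved score highest, which is the evolutionary support for a base-pair that slide 22 describes.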
  • 25. Covariation metrics. Lindgreen, Gardner & Krogh (2006) Measuring covariation in RNA alignments: physical realism improves information measures. Bioinformatics.
  • 26. Rfam: annotation hierarchy. [Figure: hierarchy of Types, Clans, Families and Sequences. Type labels: Gene, Cis-reg., Intron, ribozyme, tRNA, CD-box_snoRNA, splicing, thermoregulator, leader, HACA-box_snoRNA, scaRNA, IRES, frameshift_element, sRNA, riboswitch, antisense, rRNA, miRNA, CRISPR, snRNA, snoRNA.]
  • 27. Building an Rfam family. A structure from literature. An Rfam family: produced manually from publication figures.
  • 28. An example Rfam entry.
  • 29. Relevant reading. Reviews: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology. Methods: Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J. Mol. Biol.
  • 30. The End.