Homology Search
Paul Gardner
March 24, 2015
Paul Gardner Homology Search
News & Views reminder: 20% of your course grade.
Due March 26, reviewed April 2 (5/20), revisions April 28 (15/20).
Meredith et al. (2014) Evidence for a single loss of
mineralized teeth in the common avian ancestor. Science
Nunez et al. (2015) Integrase-mediated spacer acquisition
during CRISPR-Cas adaptive immunity. Nature
Homology search
In a huge collection of biological sequences, how can you locate
similar sequences?
By using heuristic, very fast sequence alignment methods.
BLAST
Identify all 'hits' (matching words) at least W long
Find any hits on the same diagonal of an alignment matrix
Trigger a full alignment in that region
Basic idea: identify near-identical sub-sequences first, then align any
hits in full
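The seeding steps above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation: it only finds exact word matches, it skips the extension and full alignment step, and the function names and the `min_hits` cut-off are hypothetical.

```python
from collections import defaultdict

def find_word_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where words of length w match exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            hits.append((i, j))
    return hits

def group_by_diagonal(hits):
    """Group hits by diagonal of the alignment matrix (subject_pos - query_pos)."""
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)

def candidate_diagonals(query, subject, w, min_hits=2):
    """Keep diagonals with at least min_hits word hits (hypothetical cut-off).

    These are the regions where a full alignment would be triggered.
    """
    grouped = group_by_diagonal(find_word_hits(query, subject, w))
    return {d: h for d, h in grouped.items() if len(h) >= min_hits}

print(candidate_diagonals("GATTACA", "CCGATTACACC", w=3))
# → {2: [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]}
```

All five word hits fall on diagonal 2, which is the kind of cluster that would be passed on to a full alignment.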
What does that E-value (Expect) mean?
>gb|CP001191.1| Rhizobium leguminosarum bv. trifolii WSM2304, complete genome
Length=4537948
Features in this part of subject sequence:
cold-shock DNA-binding domain protein
Score = 57.2 bits (62), Expect = 2e-05
Identities = 78/106 (74%), Gaps = 6/106 (6%)
Strand=Plus/Plus
Query  1       CTTCGTCAGATTTCCTCTCAATATCGATCATACCGGACTGATATTCGTCCGG----GAAC
               || |||||||| ||||||||| ||||||    |  | | || |||| ||||     ||||
Sbjct  828507  CTCCGTCAGATATCCTCTCAACATCGATACGGCTTGTCGGACATTCTTCCGCAGGCGAAC
Query  57      TCTAGCGATTGAAA-GGAAATCGTTATGAACTCAGGCACCGTAAAG
                | | || |||||| ||| ||||||||||| ||||||  ||| |||
Sbjct  828567  ACAA-CGGTTGAAAAGGAGATCGTTATGAATTCAGGCGTCGTCAAG
How can we evaluate the significance of a score?
Note that a bit-score of 57.2 by itself is not that useful.
Its significance depends on the sequence and database size and composition.
To account for this we can compute an Expect-value (E-value).
This is the number of hits expected by chance with at least the observed
score, for the given query and database sizes.
P-values can also be used.
[Figure: "Separating true from false hits". Number of matches (y-axis, 0 to 10000) against score in bits (x-axis, 0 to 700) for random sequences (negative controls) and true homologs (positive controls). A score threshold splits the hits into true negatives, false negatives, false positives and true positives.]
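The four regions of the plot can be counted directly once a threshold is chosen. The sketch below assumes two lists of bit scores, one from positive controls and one from negative controls; the function name and the example scores are made up for illustration.

```python
def confusion_counts(positive_scores, negative_scores, threshold):
    """Count TP, FN, FP and TN for a given score threshold.

    A hit is called 'found' when its score is at or above the threshold.
    """
    tp = sum(s >= threshold for s in positive_scores)
    fn = len(positive_scores) - tp
    fp = sum(s >= threshold for s in negative_scores)
    tn = len(negative_scores) - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Made-up scores, not taken from the lecture's plot.
true_homologs = [120, 300, 45, 510]
random_seqs = [20, 35, 60, 15]
print(confusion_counts(true_homologs, random_seqs, threshold=50))
# → {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```

Raising the threshold trades false positives for false negatives, which is why the choice of threshold matters.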
$E = \kappa M N \, 2^{-\lambda x}$
E: E-value
M & N: query & database size
κ & λ: fitting parameters
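A minimal sketch of the formula, assuming x is the bit score (the x-axis of the plot). The parameter values in the example are arbitrary placeholders, not fitted values from any real scoring system. The P-value conversion uses the standard Poisson assumption, P = 1 - exp(-E), which the slide mentions only in passing.

```python
import math

def e_value(score, query_len, db_len, kappa, lam):
    """E = kappa * M * N * 2^(-lambda * x), with x taken to be the bit score."""
    return kappa * query_len * db_len * 2.0 ** (-lam * score)

def p_value(e):
    """Probability of at least one chance hit, assuming hits are Poisson distributed."""
    return -math.expm1(-e)

# Placeholder parameters (kappa = lam = 1), chosen only to keep the arithmetic simple.
print(e_value(10, query_len=100, db_len=1000, kappa=1.0, lam=1.0))
# → 97.65625
```

With these settings, doubling the database size doubles E, and each extra bit of score halves it.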
BLAST is not the only, or best tool for the job!
Profile-based homology search
Krogh, A. et al. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol.
Image provided by Eric Nawrocki.
Profile-based homology search – scoring sequences
Image provided by Eric Nawrocki.
Profile HMMs are slightly more complicated
A tree-weighting scheme takes care of unbalanced
alignments
Dirichlet-mixture priors are used to incorporate information
about amino-acid biochemistry
Effective sequence number is used to down-weight priors
when many sequences are available
Transition probabilities to Insert & Delete states are estimated
from the alignment
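To make the idea of scoring a sequence against a profile concrete, here is a deliberately stripped-down sketch. It is not a profile HMM: it has no insert or delete states and no sequence weighting, and a uniform pseudocount stands in for the Dirichlet-mixture priors. The function names, the toy alignment and the uniform background are all assumptions made for illustration.

```python
import math

def build_profile(alignment, alphabet="ACGT", pseudocount=1.0):
    """Build per-column log-odds scores (in bits) from an ungapped alignment."""
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    profile = []
    for col in range(len(alignment[0])):
        counts = {a: pseudocount for a in alphabet}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append(
            {a: math.log2((counts[a] / total) / background) for a in alphabet}
        )
    return profile

def score_sequence(profile, seq):
    """Sum the log-odds scores of each residue in its column."""
    return sum(column[residue] for column, residue in zip(profile, seq))

# Toy seed alignment, invented for this example.
seed = ["ACGT", "ACGT", "ACGA"]
profile = build_profile(seed)
print(score_sequence(profile, "ACGT") > score_sequence(profile, "TGCA"))
# → True
```

A sequence that matches the conserved columns scores higher than one that does not, and the variable last column contributes less than the fully conserved ones.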
Why not just use BLAST?
ACCURACY!
Every benchmark of homology search tools has shown that
profile methods are more accurate than single-sequence
methods.
Eddy (2011) Accelerated Profile HMM Searches. PLoS
Computational Biology.
Why not just use BLAST?
SPEED! To search a single query vs a database of all proteins:
BLAST: searches 42 million UniProt sequences
HMMER: searches 15,000 Pfam profiles
The search space is ~3,000x smaller for profiles
Save Planet Earth, use HMMER3
Eddy (2011) Accelerated Profile HMM Searches. PLoS
Computational Biology.
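As a rough check of the search-space figure, using only the two counts quoted on this slide:

```latex
\[
\frac{42 \times 10^{6}\ \text{UniProt sequences}}{15 \times 10^{3}\ \text{Pfam profiles}} = 2800 \approx 3000
\]
```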
Pfam
What is a Pfam-A Entry?
[Diagram of a Pfam-A entry. Files: SEED, HMM, OUTPUT, ALIGN, DESC. Programs: hmmbuild, hmmsearch, hmmalign.]
Slide borrowed from Rob Finn.
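One plausible way the three HMMER programs connect the files named in the diagram is sketched below. It only assembles command lines and does not run them. The sequence file names (`seqdb.fasta`, `hits.fasta`) are hypothetical, and the exact pipeline Pfam uses is not given on the slide, so treat the ordering as an assumption.

```python
import shlex

def pfam_style_commands(seed="SEED", hmm="HMM", output="OUTPUT", align="ALIGN",
                        seqdb="seqdb.fasta", hits="hits.fasta"):
    """Return HMMER command lines linking SEED -> HMM -> OUTPUT -> ALIGN.

    seqdb and hits are hypothetical file names: the database to search and
    the matching sequences extracted from the search results.
    """
    return [
        ["hmmbuild", hmm, seed],                  # build the profile HMM from the seed alignment
        ["hmmsearch", "-o", output, hmm, seqdb],  # search the sequence database with the HMM
        ["hmmalign", "-o", align, hmm, hits],     # align the matching sequences to the HMM
    ]

for argv in pfam_style_commands():
    print(shlex.join(argv))
```

Extracting the hit sequences from the search output into `hits.fasta` is a separate step that is not shown here.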
But, what about RNA?
[Figure: three RNA secondary-structure diagrams, each drawn from 5' to 3' with a sequence conservation scale from 0 to 1.]
Covariance models
Nawrocki & Eddy (2007) Query-Dependent Banding (QDB) for Faster RNA Similarity Searches. PLoS Computational Biology.
Benchmark
Freyhult, Bollback & Gardner (2007) Exploring genomic dark matter: A critical assessment of the performance of
homology search methods on noncoding RNA. Genome Research.
Rfam
Relevant reading
Reviews:
Eddy SR (2004) What is a hidden Markov model? Nature
Biotechnology.
Methods:
Altschul SF et al. (1997) Gapped BLAST and PSI-BLAST: a
new generation of protein database search programs. Nucleic
acids research.
Eddy (2011) Accelerated Profile HMM Searches. PLoS
Computational Biology.
The End
BIOL335: Homology search

  • 1. Homology Search Paul Gardner March 24, 2015 Paul Gardner Homology Search
  • 2. News & Views reminder (20% of your course grade, due March 26, Reviewed April 2 (5/20), Revisions April 28 (15/20)) Meredith et al. (2014) Evidence for a single loss of mineralized teeth in the common avian ancestor. Science Nunez et al. (2015) Integrase-mediated spacer acquisition during CRISPR-Cas adaptive immunity. Nature Paul Gardner Homology Search
  • 3. Homology search In a huge collection of biological sequences how can you locate similar sequences? by using heuristic, super fast, sequence alignment methods Paul Gardner Homology Search
  • 5. BLAST Identify all ’hits’ of at least W long Find any hits on the same diagonal of an alignment matrix Trigger a full alignment in that region Basic idea: identify near-identical sub-sequences first → align any hits in full Paul Gardner Homology Search
  • 6. What does that E-value (Expect) mean? Example BLAST hit: >gb|CP001191.1| Rhizobium leguminosarum bv. trifolii WSM2304, complete genome; Length=4537948; Features in this part of subject sequence: cold-shock DNA-binding domain protein; Score = 57.2 bits (62), Expect = 2e-05; Identities = 78/106 (74%), Gaps = 6/106 (6%); Strand=Plus/Plus. Aligned rows: Query 1 CTTCGTCAGATTTCCTCTCAATATCGATCATACCGGACTGATATTCGTCCGG----GAAC / Sbjct 828507 CTCCGTCAGATATCCTCTCAACATCGATACGGCTTGTCGGACATTCTTCCGCAGGCGAAC; Query 57 TCTAGCGATTGAAA-GGAAATCGTTATGAACTCAGGCACCGTAAAG / Sbjct 828567 ACAA-CGGTTGAAAAGGAGATCGTTATGAATTCAGGCGTCGTCAAG.
  • 7. How can we evaluate the significance of a score? A bit-score of 57.2 is not very useful by itself: it depends on the sequence and on the database size and composition. To counter this we can compute an Expect-value (E-value), the number of hits scoring at least as high as the observed score that are expected by chance for the given query and database sizes. P-values can also be used. [Figure: Separating true from false hits. Number of matches plotted against score (bits) for random sequences/negative controls and true homologs/positive controls; a threshold divides the hits into false negatives, true positives, false positives and true negatives.]
  • 8. How can we evaluate the significance of a score? [Figure: Separating true from false hits, as on slide 7.] $E = \kappa M N \, 2^{-\lambda x}$, where $E$ is the E-value, $M$ and $N$ are the query and database sizes, $\kappa$ and $\lambda$ are fitting parameters, and $x$ is the score.
  • 9. BLAST is not the only, or the best, tool for the job!
  • 10. Profile-based homology search. Krogh, A. et al. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. Image provided by Eric Nawrocki.
  • 11. Profile-based homology search: scoring sequences. Image provided by Eric Nawrocki.
  • 12. Profile HMMs are slightly more complicated: a tree-weighting scheme takes care of unbalanced alignments; Dirichlet-mixture priors are used to incorporate information about amino-acid biochemistry; the effective sequence number is used to down-weight priors when many sequences are available; transition probabilities to Insert and Delete states are estimated from the alignment.
  • 13. Why not just use BLAST? ACCURACY! Every benchmark of homology search tools has shown that profile methods are more accurate than single-sequence methods. Eddy (2011) Accelerated Profile HMM Searches. PLoS Computational Biology.
  • 14. Why not just use BLAST? SPEED! To search a single query against a database of all proteins, BLAST searches 42 million UniProt sequences, whereas HMMER searches 15,000 Pfam profiles. The search space is ~3,000x smaller for profiles. Save Planet Earth, use HMMER3. Eddy (2011) Accelerated Profile HMM Searches. PLoS Computational Biology.
  • 15. Pfam: What is a Pfam-A entry? [Diagram labels: hmmsearch, hmmbuild, hmmalign; SEED, HMM, OUTPUT, ALIGN, DESC.] Slide borrowed from Rob Finn.
  • 16. But, what about RNA? [Figure: three RNA diagrams, each labeled 5′ to 3′ and shown with a sequence conservation scale from 0 to 1.]
  • 17. Covariance models. Nawrocki & Eddy (2007) Query-Dependent Banding (QDB) for Faster RNA Similarity Searches. PLoS Computational Biology.
  • 18. Benchmark. Freyhult, Bollback & Gardner (2007) Exploring genomic dark matter: A critical assessment of the performance of homology search methods on noncoding RNA. Genome Research.
  • 20. Relevant reading. Reviews: Eddy SR (2004) What is a hidden Markov model? Nature Biotechnology. Methods: Altschul SF et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research; Eddy (2011) Accelerated Profile HMM Searches. PLoS Computational Biology.
  • 21. The End
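
The seed-and-diagonal idea on the BLAST slide (slide 5) can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not BLAST's actual implementation: words must match exactly, the function names are invented for this example, and the `min_hits` cut-off is a hypothetical parameter standing in for whatever rule triggers the full alignment.

```python
from collections import defaultdict


def find_word_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    # Index every length-w word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    # Scan the subject and look each of its words up in the query index.
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


def group_by_diagonal(hits):
    """Group hits by diagonal (subject_pos - query_pos) of the alignment matrix."""
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)


def candidate_diagonals(query, subject, w, min_hits=2):
    """Keep diagonals with at least min_hits word hits (hypothetical trigger rule).

    In a real tool these regions would be passed on to a full alignment step,
    which is deliberately left out of this sketch.
    """
    diagonals = group_by_diagonal(find_word_hits(query, subject, w))
    return {d: h for d, h in diagonals.items() if len(h) >= min_hits}
```

Calling `candidate_diagonals("ACGTACGGA", "TTACGTACGGATT", 4)` returns only the diagonals on which several word hits line up; an isolated hit on another diagonal is discarded, which is what keeps the expensive alignment step confined to promising regions.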
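
The E-value formula on slide 8, E = κMN2^(−λx), can be explored numerically. The sketch below simply evaluates that formula and its inverse; the function names and all parameter values in the usage note are hypothetical placeholders, since the slides do not give fitted values for κ and λ.

```python
import math


def e_value(score, m, n, kappa, lam):
    """Evaluate E = kappa * M * N * 2**(-lam * score), the slide-8 formula.

    m and n are the query and database sizes; kappa and lam are the
    fitting parameters, which the caller must supply.
    """
    return kappa * m * n * 2.0 ** (-lam * score)


def score_for_e_value(target_e, m, n, kappa, lam):
    """Invert the same formula: the score at which the E-value equals target_e."""
    return math.log2(kappa * m * n / target_e) / lam
```

For example, with the placeholder values `kappa=1.0` and `lam=1.0`, `e_value(50, 100, 10**6, 1.0, 1.0)` gives the expected number of chance hits for a 100-residue query against a database of one million residues. Under the formula, doubling the database size doubles the E-value, and raising the score by 1/λ halves it, which is why the same score can be significant in a small database and unremarkable in a large one.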