Homology Search
Paul Gardner
March 24, 2015
Paul Gardner Homology Search
News & Views reminder (20% of your course grade): due March 26, reviewed April 2 (5/20), revisions April 28 (15/20)
Meredith et al. (2014) Evidence for a single loss of mineralized teeth in the common avian ancestor. Science.
Nunez et al. (2015) Integrase-mediated spacer acquisition during CRISPR-Cas adaptive immunity. Nature.
Homology search
In a huge collection of biological sequences, how can you locate similar sequences?
By using heuristic, very fast sequence alignment methods.
BLAST
Identify all 'hits' (word matches) at least W long.
Find any hits on the same diagonal of an alignment matrix.
Trigger a full alignment in that region.
Basic idea: identify near-identical sub-sequences first → align any hits in full.
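The seeding steps above can be sketched as follows. This is a minimal illustration, not BLAST itself: it assumes exact word matches only, counts overlapping hits, and stops before the extension/alignment step. The function names and the `min_hits` parameter are hypothetical.

```python
from collections import defaultdict


def word_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where an exact word of length w matches."""
    # Index every word of length w in the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    # Look up every query word in the index.
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits):
    """Group hits by diagonal, i.e. by (subject_pos - query_pos)."""
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)


def candidate_diagonals(query, subject, w, min_hits=2):
    """Diagonals with at least min_hits word hits; these would trigger a full alignment."""
    grouped = hits_by_diagonal(word_hits(query, subject, w))
    return {d: h for d, h in grouped.items() if len(h) >= min_hits}


if __name__ == "__main__":
    print(candidate_diagonals("GATTACA", "CCGATTACACC", w=3))
```

Indexing the subject once and looking up query words is what keeps the seeding step fast: only regions with clustered hits are passed on to the expensive alignment.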
What does that E-value (Expect) mean?
>gb|CP001191.1| Rhizobium leguminosarum bv. trifolii WSM2304, complete genome
Length=4537948
Features in this part of subject sequence:
cold-shock DNA-binding domain protein
Score = 57.2 bits (62), Expect = 2e-05
Identities = 78/106 (74%), Gaps = 6/106 (6%)
Strand=Plus/Plus
Query  1       CTTCGTCAGATTTCCTCTCAATATCGATCATACCGGACTGATATTCGTCCGG----GAAC
               || |||||||| ||||||||| ||||||  |  | | || |||| ||||     ||||
Sbjct  828507  CTCCGTCAGATATCCTCTCAACATCGATACGGCTTGTCGGACATTCTTCCGCAGGCGAAC
Query  57      TCTAGCGATTGAAA-GGAAATCGTTATGAACTCAGGCACCGTAAAG
                | | || |||||| ||| ||||||||||| ||||||  ||| |||
Sbjct  828567  ACAA-CGGTTGAAAAGGAGATCGTTATGAATTCAGGCGTCGTCAAG
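As a check on the "Identities" and "Gaps" fields, the two aligned rows can be compared column by column. A small sketch, with the sequences copied from the hit above; the function name is hypothetical.

```python
# Aligned query and subject rows from the BLAST hit, with both blocks joined.
QUERY = (
    "CTTCGTCAGATTTCCTCTCAATATCGATCATACCGGACTGATATTCGTCCGG----GAAC"
    "TCTAGCGATTGAAA-GGAAATCGTTATGAACTCAGGCACCGTAAAG"
)
SBJCT = (
    "CTCCGTCAGATATCCTCTCAACATCGATACGGCTTGTCGGACATTCTTCCGCAGGCGAAC"
    "ACAA-CGGTTGAAAAGGAGATCGTTATGAATTCAGGCGTCGTCAAG"
)


def alignment_stats(a, b):
    """Return (identities, gap_columns, alignment_length) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return identities, gaps, len(a)


if __name__ == "__main__":
    print(alignment_stats(QUERY, SBJCT))  # (78, 6, 106)
```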
How can we evaluate the significance of a score?
Note that a bit-score of 57.2 is not very informative by itself.
Its significance depends on the size and composition of the query sequence and the database.
To account for this we can compute an Expect-value (E-value).
This is the number of hits expected by chance with a score at least as high as the observed score, for the given query and database sizes.
P-values can also be used.
[Figure: "Separating true from false hits". Number of matches (y-axis, 0 to 10000) against score in bits (x-axis, 0 to 700) for random sequences/negative controls and true homologs/positive controls. A score threshold splits the hits into true positives, false positives, true negatives and false negatives.]
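The four outcome classes in the figure can be counted directly once a threshold is chosen. A minimal sketch, assuming hits at or above the threshold are called positive; the scores below are made-up toy values, not data from the lecture.

```python
def confusion_counts(positive_scores, negative_scores, threshold):
    """Count TP/FN/FP/TN for a bit-score threshold (score >= threshold is a called hit)."""
    tp = sum(1 for s in positive_scores if s >= threshold)
    fn = len(positive_scores) - tp
    fp = sum(1 for s in negative_scores if s >= threshold)
    tn = len(negative_scores) - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


if __name__ == "__main__":
    # Toy scores (hypothetical): true homologs and shuffled/negative controls.
    homologs = [35, 60, 120, 300, 450]
    controls = [10, 20, 25, 40, 55]
    counts = confusion_counts(homologs, controls, threshold=50)
    sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
    specificity = counts["TN"] / (counts["TN"] + counts["FP"])
    print(counts, sensitivity, specificity)
```

Raising the threshold trades false positives for false negatives, which is why the overlap between the two score distributions matters.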
$E = \kappa M N \, 2^{-\lambda x}$
E: E-value
M & N: query & database size
κ & λ: fitting parameters
x: score (bits)
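A short numerical sketch of the E-value formula follows. The parameter values are illustrative assumptions, not fitted values. The P-value conversion assumes the number of chance hits is Poisson-distributed, so that the probability of at least one chance hit is 1 − exp(−E).

```python
import math


def e_value(score_bits, m, n, kappa, lam):
    """E = kappa * M * N * 2^(-lambda * x), with x the score in bits."""
    return kappa * m * n * 2.0 ** (-lam * score_bits)


def p_value(e):
    """P(at least one chance hit), assuming a Poisson number of chance hits."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical sizes: a 100-residue query against a 1e9-residue database.
    for x in (20, 40, 60):
        e = e_value(x, m=100, n=1e9, kappa=1.0, lam=1.0)
        print(x, e, p_value(e))
```

With κ = 1 and λ = 1 each extra bit of score halves the E-value, and for small E the P-value is approximately equal to E.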
BLAST is not the only, or the best, tool for the job!
Profile-based homology search
Krogh, A. et al. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol
Biol.
Image provided by Eric Nawrocki.
Profile-based homology search – scoring sequences
Image provided by Eric Nawrocki.
Profile HMMs are slightly more complicated
A tree-weighting scheme takes care of unbalanced alignments.
Dirichlet-mixture priors are used to incorporate information about amino-acid biochemistry.
Effective sequence number is used to down-weight priors when many sequences are available.
Transition probabilities to Insert & Delete states are estimated from the alignment.
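To make the idea of a profile concrete, here is a deliberately simplified sketch: a position-specific log-odds matrix built from an ungapped alignment. It is not a profile HMM. It assumes a DNA alphabet for brevity, a flat pseudocount in place of Dirichlet-mixture priors, no sequence weighting, and no Insert or Delete states. All names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_profile(alignment, pseudocount=1.0):
    """Per-column log2-odds scores against a uniform background."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum of column log-odds for a sequence of the same length as the profile."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile lengths differ")
    return sum(col[c] for col, c in zip(profile, seq))


if __name__ == "__main__":
    seed = ["ACGT", "ACGT", "ACGA"]  # toy seed alignment
    profile = build_profile(seed)
    print(score(profile, "ACGT"), score(profile, "TTTT"))
```

The pseudocount plays the role the priors play in a real profile HMM: it keeps unseen residues from scoring minus infinity, and its influence shrinks as more sequences are added.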
Why not just use BLAST?
ACCURACY!
Every benchmark of homology search tools has shown that
profile methods are more accurate than single-sequence
methods.
Eddy (2011) Accelerated Profile HMM Searches. PLoS
Computational Biology.
Why not just use BLAST?
SPEED! To search a single query vs a database of all proteins:
BLAST: searches 42 million UniProt sequences
HMMER: searches 15,000 Pfam profiles
The search space is ∼3,000x smaller for profiles
Save Planet Earth, use HMMER3
Eddy (2011) Accelerated Profile HMM Searches. PLoS
Computational Biology.
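The "∼3,000x" figure is the ratio of the two database sizes quoted above. A quick check, noting that it compares only the number of targets and assumes nothing about the cost of each individual comparison:

```python
def search_space_ratio(n_sequences, n_profiles):
    """How many times more targets a sequence database has than a profile database."""
    return n_sequences / n_profiles


if __name__ == "__main__":
    # Sizes quoted on the slide: 42 million UniProt sequences, 15,000 Pfam profiles.
    print(search_space_ratio(42_000_000, 15_000))  # 2800.0
```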
Pfam
What is a Pfam-A Entry?
Programs: hmmbuild, hmmsearch, hmmalign
Files: SEED, HMM, OUTPUT, ALIGN, DESC
Slide borrowed from Rob Finn.
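One plausible way the three HMMER programs connect these files is sketched below. This is an assumption about the flow, not a description of the actual Pfam pipeline. The sequence database `seqdb.fa` and the hit-sequence file `hits.fa` are hypothetical placeholders, and the commands are only assembled and printed, not run.

```python
import shlex


def hmmer_commands(seed="SEED", hmm="HMM", seqdb="seqdb.fa",
                   output="OUTPUT", hits="hits.fa", align="ALIGN"):
    """Build argument lists for a build -> search -> align round with HMMER."""
    return [
        # Build a profile HMM from the seed alignment.
        ["hmmbuild", hmm, seed],
        # Search the profile against a sequence database, writing the report to a file.
        ["hmmsearch", "-o", output, hmm, seqdb],
        # Align the (separately extracted) hit sequences to the profile.
        ["hmmalign", "-o", align, hmm, hits],
    ]


if __name__ == "__main__":
    for cmd in hmmer_commands():
        print(shlex.join(cmd))
```

Each list could be handed to `subprocess.run` once HMMER is installed and the input files exist.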
But, what about RNA?
[Figure: three RNA secondary-structure diagrams, each drawn 5' to 3' and annotated with a sequence conservation scale from 0 to 1.]
Covariance models
Nawrocki & Eddy (2007) Query-Dependent Banding (QDB) for Faster RNA Similarity Searches. PLoS Computational Biology.
Benchmark
Freyhult, Bollback & Gardner (2007) Exploring genomic dark matter: A critical assessment of the performance of
homology search methods on noncoding RNA. Genome Research.
Rfam
Relevant reading
Reviews:
Eddy SR (2004) What is a hidden Markov model? Nature Biotechnology.
Methods:
Altschul SF et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research.
Eddy (2011) Accelerated Profile HMM Searches. PLoS Computational Biology.
The End