The Smith-Waterman algorithm

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations which can only be seen by
downloading it and using ‘View Slide Show’ in PowerPoint
Global versus Local Alignment
• A global alignment covers the entire lengths of the
  sequences involved
  The Needleman-Wunsch algorithm finds the best global alignment
  between 2 sequences
• A local alignment only covers parts of the sequences
  The Smith-Waterman algorithm finds the best local alignment
  between 2 sequences


  Global alignment       Q K E S G P S S S Y C
                         |   | | |           |
                       V Q Q E S G L V R T T C
  Local alignment              E S G
                               | | |
                               E S G
Local alignment
• The concept of ‘local alignment’ was introduced by
  Smith & Waterman in 1981
• A local alignment of 2 sequences is an alignment
  between parts of the 2 sequences
  Two proteins may share one stretch of high sequence
  similarity, but be very dissimilar outside that region
  A global (N-W) alignment of such sequences would have:
   (i) lots of matches in the region of high sequence similarity
  (ii) lots of mismatches & gaps (insertions/deletions) outside the region
          of similarity
  It makes sense to find the best local alignment instead
Real data: fruitfly & human Eyeless
• This is a global alignment of human & fruitfly Eyeless
  Do you think it’s sensible to make a global alignment of
  these two sequences?
Real data: fruitfly & human Eyeless
  There are 2 short regions of high similarity
  Outside those regions, there are many mismatches and gaps
  It might be more sensible to make local alignments of one or
  both of the regions of high similarity
Real data: fruitfly & human Eyeless
• This is a local alignment of human & fruitfly Eyeless
  What parts of the sequences were used in the local alignment?
The Smith-Waterman algorithm
• S-W is mathematically proven to find the best
  (highest-scoring) local alignment of 2 sequences
  The best local alignment is the best alignment of all possible
  subsequences (parts) of sequences S1 and S2
  The 0th row and 0th column of T are first filled with zeroes
  The recurrence relation used to fill table T is:
  \[
  T(i, j) = \max
  \begin{cases}
  T(i-1, j-1) + \sigma(S_1(i), S_2(j)) \\
  T(i-1, j) + \text{gap penalty} \\
  T(i, j-1) + \text{gap penalty} \\
  0 & \text{(a 4th possibility, unlike N-W)}
  \end{cases}
  \]
  The traceback starts at the highest-scoring cell in the matrix T, and travels
  up/left while the score is still positive
  (In N-W, by contrast, the traceback starts at the bottom right and ends at the
  top left, which ensures it’s a global alignment)
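The recurrence above can be sketched in Python as follows. This is a minimal illustration rather than code from the talk: the function name `fill_matrix` and its parameter names are assumptions, and the default scores (+2 match, -1 mismatch, -2 gap) are borrowed from the worked example in this talk.

```python
def fill_matrix(s1, s2, match=2, mismatch=-1, gap=-2):
    """Fill the Smith-Waterman table T; s1 labels the rows, s2 the columns."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # The 0th row and 0th column are filled with zeroes.
    T = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            # sigma is the match/mismatch score for S1(i) and S2(j).
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            T[i][j] = max(
                T[i - 1][j - 1] + sigma,  # align S1(i) with S2(j)
                T[i - 1][j] + gap,        # S1(i) against a gap
                T[i][j - 1] + gap,        # S2(j) against a gap
                0,                        # the 4th possibility (not in N-W)
            )
    return T


T = fill_matrix("ACCTAAGG", "GGCTCAATCA")
best = max(max(row) for row in T)
print(best)  # 6, the highest score anywhere in T
```

Because of the fourth option, no cell of T can go below zero, which is what lets a local alignment start afresh anywhere in the table.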
• E.g., to find the best local alignment of sequences
  “ACCTAAGG” and “GGCTCAATCA”, using +2 for a
  match, -1 for a mismatch, and -2 for a gap:
  We first make matrix T (as in N-W):
  The 0th row and 0th column of T are filled with zeroes
  The recurrence relation is then used to fill the matrix T
                     G   G   C   T   C   A   A   T   C   A
                0    0   0   0   0   0   0   0   0   0   0
            A   0
            C   0
            C   0
            T   0
            A   0
            A   0
            G   0
            G   0
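We first calculate T(1,1) using the recurrence relation. S1(1) = A and
S2(1) = G do not match, so σ(S1(1), S2(1)) = -1:
  \[
  T(i, j) = \max
  \begin{cases}
  T(i-1, j-1) + \sigma(S_1(i), S_2(j)) = 0 - 1 = -1 \\
  T(i-1, j) + \text{gap penalty} = 0 - 2 = -2 \\
  T(i, j-1) + \text{gap penalty} = 0 - 2 = -2 \\
  0
  \end{cases}
  \]
  The maximum value is 0, so we set T(1,1) to 0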
        G   G   C   T   C   A   A   T   C   A
    0   0   0   0   0   0   0   0   0   0   0
A   0   0
        ?   ?
C   0
C   0
T   0
A   0
A   0
G   0
G   0
We next calculate T(2,1)…
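The single-cell calculation can be checked with a short Python sketch that lists the four candidate values before taking the maximum. The helper name `candidates` and the dictionary keys are assumptions made for illustration; the sequences and scores are those of the example.

```python
def candidates(T, s1, s2, i, j, match=2, mismatch=-1, gap=-2):
    """Return the four values the recurrence compares for cell T(i, j)."""
    sigma = match if s1[i - 1] == s2[j - 1] else mismatch
    return {
        "diagonal": T[i - 1][j - 1] + sigma,
        "up": T[i - 1][j] + gap,
        "left": T[i][j - 1] + gap,
        "zero": 0,
    }


s1, s2 = "ACCTAAGG", "GGCTCAATCA"
# Start from a table that holds only the zeroes of the 0th row and column.
T = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]

c11 = candidates(T, s1, s2, 1, 1)
T[1][1] = max(c11.values())
print(c11)      # {'diagonal': -1, 'up': -2, 'left': -2, 'zero': 0}
print(T[1][1])  # 0

# The first non-zero cell in row 1 is T(1,6), where A matches A.
c16 = candidates(T, s1, s2, 1, 6)
print(max(c16.values()))  # 2
```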
You fill in the whole of T, recording the previous cell (if any)   used
to calculate the value of each T(i, j):
                 G   G   C   T   C   A   A   T   C   A
             0   0   0   0   0   0   0   0   0   0   0
         A   0   0   0   0   0   0   2   2   0   0   2
         C   0   0   0   2   0   2   0   1   1   2   0
         C   0   0   0   2   1   2   1   0   0   3   1
         T   0   0   0   0   4   2   1   0   2   1   2
         A   0   0   0   0   2   3   4   3   1   1   3
         A   0   0   0   0   0   1   5   6   4   2   3
         G   0   2   2   0   0   0   3   4   5   3   1
         G   0   2   4   2   0   0   1   2   3   4   2

You work out the best local alignment from the traceback (just like in N-W):
                             C T C A A
                             | |   | |
                             C T - A A
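The fill and traceback steps can be combined in one Python sketch. It is an illustration under stated assumptions, not the talk's own code: the name `smith_waterman`, the `back` table of previous cells, and the tie-breaking rule (diagonal first, then up, then left) are all choices made here.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of s1, aligned part of s2)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    T = [[0] * cols for _ in range(rows)]
    # back[i][j] records the previous cell used for T(i, j), if any.
    back = [[None] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            options = [
                (T[i - 1][j - 1] + sigma, (i - 1, j - 1)),
                (T[i - 1][j] + gap, (i - 1, j)),
                (T[i][j - 1] + gap, (i, j - 1)),
            ]
            score, prev = max(options, key=lambda o: o[0])
            if score > 0:  # otherwise the cell stays 0 with no previous cell
                T[i][j], back[i][j] = score, prev
            if T[i][j] > best:
                best, best_pos = T[i][j], (i, j)
    # Traceback: start at the highest-scoring cell, stop when the score hits 0.
    a1, a2 = [], []
    i, j = best_pos
    while back[i][j] is not None:
        pi, pj = back[i][j]
        a1.append(s1[i - 1] if pi < i else "-")
        a2.append(s2[j - 1] if pj < j else "-")
        i, j = pi, pj
    return best, "".join(reversed(a1)), "".join(reversed(a2))


score, part1, part2 = smith_waterman("ACCTAAGG", "GGCTCAATCA")
print(score)  # 6
print(part2)  # CTCAA
print(part1)  # CT-AA
```

When several cells or moves tie, a different tie-breaking rule may return a different alignment with the same score.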
Software for making alignments
• For Smith-Waterman pairwise alignment
  pairwiseAlignment() in the “Biostrings” R library
  the EMBOSS (emboss.sourceforge.net/) water program
Problem
• Find the best local alignment between
  “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2
  for a mismatch, and -2 for a gap.
Answer
• Find the best local alignment between
  “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2
  for a mismatch, and -2 for a gap
  Matrix T looks like this, with the pink traceback:
           T   C   A   G   T   T   G   C   C
       0   0   0   0   0   0   0   0   0   0
   A   0   0   0   1   0   0   0   0   0   0
   G   0   0   0   0   2   0   0   1   0   0
   G   0   0   0   0   1   0   0   1   0   0
   T   0   1   0   0   0   2   1   0   0   0
   T   0   1   0   0   0   1   3   1   0   0
   G   0   0   0   0   1   0   1   4   2   0
  The pink traceback starts at the highest score, T(6,7) = 4, and runs
  diagonally back through T(5,6) = 3, T(4,5) = 2 and T(3,4) = 1
  Alignment:
           G T T G
           | | | |
           G T T G
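The answer matrix can be reproduced with a short Python sketch that fills T using the problem's scores and prints it in the same layout as the slide. The helper names `sw_table` and `format_table` are assumptions made for illustration.

```python
def sw_table(s1, s2, match, mismatch, gap):
    """Fill the Smith-Waterman table; s1 labels the rows, s2 the columns."""
    T = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            T[i][j] = max(
                T[i - 1][j - 1] + sigma,
                T[i - 1][j] + gap,
                T[i][j - 1] + gap,
                0,
            )
    return T


def format_table(T, s1, s2):
    """Lay the table out with s2 along the top and s1 down the side."""
    lines = ["        " + "   ".join(s2)]
    for label, row in zip(" " + s1, T):
        lines.append(label + "   " + "   ".join(str(v) for v in row))
    return "\n".join(lines)


rows_seq, cols_seq = "AGGTTG", "TCAGTTGCC"
T = sw_table(rows_seq, cols_seq, match=1, mismatch=-2, gap=-2)
print(format_table(T, rows_seq, cols_seq))

best = max(max(row) for row in T)
best_cells = [(i, j) for i, row in enumerate(T)
              for j, v in enumerate(row) if v == best]
print(best, best_cells)  # 4 [(6, 7)]
```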
Further Reading
•   Chapter 3 in Introduction to Computational Genomics, Cristianini & Hahn
•   Chapter 6 in Deonier et al., Computational Genome Analysis
•   Practical on pairwise alignment in R in the Little Book of R for
    Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html

The Smith Waterman algorithm

  • 1. The Smith-Waterman algorithm Dr Avril Coghlan alc@sanger.ac.uk Note: this talk contains animations which can only be seen by downloading and using ‘View Slide show’ in Powerpoint
  • 2. Global versus Local Alignment • A global alignment covers the entire lengths of the sequences involved The Needleman-Wunsch algorithm finds the best global alignment between 2 sequences • A local alignment only covers parts of the sequences The Smith-Waterman algorithm finds the best local alignment between 2 sequences Global alignment Q K E S G P S S S Y C | | | | | V Q Q E S G L V R T T C Local alignment E S G | | | E S G
  • 3. Local alignment • The concept of ‘local alignment’ was introduced by Smith & Waterman in 1981 • A local alignment of 2 sequences is an alignment between parts of the 2 sequences Two proteins may one share one stretch of high sequence similarity, but be very dissimilar outside that region A global (N-W) alignment of such sequences would have: (i) lots of matches in the region of high sequence similarity (ii) lots of mismatches & gaps (insertions/deletions) outside the region of similarity It makes sense to find the best local alignment instead
  • 4. Real data: fruitfly & human Eyeless • This is a global alignment of human & fruitfly Eyeless Do you think it’s sensible to make a global alignment of these two sequences?
  • 5. Real data: fruitfly & human Eyeless There are 2 short regions of high similarity Outside those regions, there are many mismatches and gaps It might be more sensible to make local alignments of one or both of the regions of high similarity
  • 6. Real data: fruitfly & human Eyeless • This is a local alignment of human & fruitfly Eyeless What parts of the sequences were used in the local alignment?
  • 7. The Smith-Waterman algorithm • S-W is mathematically proven to find the best (highest-scoring) local alignment of 2 sequences The best local alignment is the best alignment of all possible subsequences (parts) of sequences S1 and S2 The 0th row and 0th column of T are first filled with zeroes The recurrence relation used to fill table T is: T(i-1, j-1) + σ(S1(i), S2(j)) T(i, j) = max T(i-1, j) + gap penalty T(i, j-1) + gap penalty A 4th possibility (unlike 0 N-W) The traceback starts at the highest scoring cell in the matrix T, and travels up/left while the score is still positive (While in N-W, traceback starts at the bottom right, & ends at the top left, which ensures it’s a global alignment)
  • 8. • eg., to find the best local alignment of sequences “ACCTAAGG” and “GGCTCAATCA”, using +2 for a match, -1 for a mismatch, and -2 for a gap: We first make matrix T (as in N-W): The 0th row and 0th column of T are filled with zeroes The recurrence relation is then used to fill the matrix T G G C T C A A T C A 0 0 0 0 0 0 0 0 0 0 0 A 0 C 0 C 0 T 0 A 0 A 0 G 0 G 0
  • 9. We first calculate T(1,1) using the recurrence relation: T(i-1, j-1) + σ(S1(i), S2(j)) = 0 – 1 = -1 T(i, j) = max T(i-1, j) + gap penalty = 0 -2 = -2 T(i, j-1) + gap penalty = 0 -2 = -2 0 The maximum value is 0, so we set T(1,1) to 0 G G C T C A A T C A 0 0 0 0 0 0 0 0 0 0 0 We next calculate T(2,1)… A 0 0 ? ? C 0 C 0 T 0 A 0 A 0 G 0 G 0
  • 10. You fill in the whole of T, recording the previous cell (if any) used to calculate the value of each T(i, j): G G G G C C T T C C A A A A T T C C A A 0 0 0 0 0 0 0 0 0 0 0 A 0 0 0 0 0 0 2 2 0 0 2 C 0 0 0 2 0 2 0 1 1 2 0 C 0 0 0 2 1 2 1 0 0 3 1 T 0 0 0 0 4 2 1 0 2 1 2 A A 0 0 0 0 2 3 4 3 1 1 3 A A 0 0 0 0 0 1 5 6 4 2 3 G G 0 2 2 0 0 0 3 4 5 3 1 G G 0 2 4 2 0 0 1 2 3 4 2
  • 11. The filled matrix T:
               G  G  C  T  C  A  A  T  C  A
            0  0  0  0  0  0  0  0  0  0  0
         A  0  0  0  0  0  0  2  2  0  0  2
         C  0  0  0  2  0  2  0  1  1  2  0
         C  0  0  0  2  1  2  1  0  0  3  1
         T  0  0  0  0  4  2  1  0  2  1  2
         A  0  0  0  0  2  3  4  3  1  1  3
         A  0  0  0  0  0  1  5  6  4  2  3
         G  0  2  2  0  0  0  3  4  5  3  1
         G  0  2  4  2  0  0  1  2  3  4  2
    You work out the best local alignment from the traceback (just like in N-W). Here the traceback starts at the highest score, 6, and passes through the scores 4, 2, 4 and 2 before reaching a 0:
         C T C A A
         | |   | |
         C T - A A
  • 12. Software for making alignments • For Smith-Waterman pairwise alignment: pairwiseAlignment() in the “Biostrings” R library; the water program in EMBOSS (emboss.sourceforge.net/)
  • 13. Problem • Find the best local alignment between “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2 for a mismatch, and -2 for a gap.
  • 14. Answer • Find the best local alignment between “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2 for a mismatch, and -2 for a gap. Matrix T looks like this; the traceback (pink on the slide) runs diagonally back from the highest score, 4, through 3, 2 and 1:
               T  C  A  G  T  T  G  C  C
            0  0  0  0  0  0  0  0  0  0
         A  0  0  0  1  0  0  0  0  0  0
         G  0  0  0  0  2  0  0  1  0  0
         G  0  0  0  0  1  0  0  1  0  0
         T  0  1  0  0  0  2  1  0  0  0
         T  0  1  0  0  0  1  3  1  0  0
         G  0  0  0  0  1  0  1  4  2  0
    Alignment:
         G T T G
         | | | |
         G T T G
  • 15. Further Reading • Chapter 3 in Introduction to Computational Genomics, Cristianini & Hahn • Chapter 6 in Deonier et al, Computational Genome Analysis • Practical on pairwise alignment in R in the Little Book of R for Bioinformatics: https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html
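
The recurrence and traceback described on slides 7 to 11 can be sketched in Python as follows. This is a minimal illustration rather than the author's code: the function name `smith_waterman` and its parameters are assumptions made for this example, it uses a single linear gap penalty as on the slides, and when several moves tie it prefers the diagonal, then up, then left, so it reports only one of possibly several best local alignments.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_s1, aligned_s2) for one best local alignment.

    Hypothetical helper for illustration; s1 indexes the rows of T and
    s2 indexes the columns, as in the slides' worked example.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # The 0th row and 0th column of T stay filled with zeroes.
    T = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            T[i][j] = max(
                T[i - 1][j - 1] + sigma,  # align s1[i] with s2[j]
                T[i - 1][j] + gap,        # gap in s2
                T[i][j - 1] + gap,        # gap in s1
                0,                        # 4th possibility: start afresh
            )
            if T[i][j] > best:
                best, best_pos = T[i][j], (i, j)

    # Traceback: start at the highest-scoring cell and move up/left
    # while the score is still positive.
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and T[i][j] > 0:
        sigma = match if s1[i - 1] == s2[j - 1] else mismatch
        if T[i][j] == T[i - 1][j - 1] + sigma:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif T[i][j] == T[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:  # the score came from the cell to the left
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1

    return best, "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = smith_waterman("ACCTAAGG", "GGCTCAATCA")
print(score)   # 6
print(bottom)  # CTCAA
print(top)     # CT-AA
```

With the slides' scoring (+2 match, -1 mismatch, -2 gap) this reproduces the alignment on slide 11.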

Editor's notes

  1. Image credit (Temple Smith): http://www.modulargenetics.com/Temple%20Smith.jpg Image credit (Michael Waterman): http://www.iscb.org/cms_addon/conferences/ismb2003/images/watterman.jpg
  2. Made alignment of human.fa and fly.fa using Needleman-Wunsch with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/needle (EMBOSS needle). Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1 D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5 Viewed in Jalview, and saved as humanfly_needlemanwunsch.png
  3. Made alignment of human.fa and fly.fa using Smith-Waterman with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/water (EMBOSS water). Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1 D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5 Viewed in Jalview, and saved as humanfly_smithwaterman.png
  4. In R:
       > library("Biostrings")
       > seq1 <- "GGCTCAATCA"
       > seq2 <- "ACCTAAGG"
       > sigma <- nucleotideSubstitutionMatrix(match = 2, mismatch = -1, baseOnly = TRUE)
       > pairwiseAlignment(seq1, seq2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -2, scoreOnly = FALSE, type = "local")
       Local PairwiseAlignedFixedSubject (1 of 1)
       pattern: [3] CTCAA
       subject: [3] CT-AA
       score: 6
     Also:
       > source("C:/Documents and Settings/Avril Coughlan/My Documents/Rfunctions.R")
       > dnasmithwaterman(seq1, seq2, gapopen = 0, gapextend = -2, mymatch = 2, mymismatch = -1)
       [1] "maxT= 6"
          NA G     G     C     T     C     A     A     T     C     A
       NA NA NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
       A  NA "0 +" "0 +" "0 +" "0 +" "0 +" "2 >" "2 >" "0 -" "0 +" "2 >"
       C  NA "0 +" "0 +" "2 >" "0 -" "2 >" "0 L" "1 >" "1 >" "2 >" "0 L"
       C  NA "0 +" "0 +" "2 >" "1 >" "2 >" "1 >" "0 +" "0 >" "3 >" "1 Z"
       T  NA "0 +" "0 +" "0 |" "4 >" "2 -" "1 >" "0 >" "2 >" "1 |" "2 >"
       A  NA "0 +" "0 +" "0 +" "2 |" "3 >" "4 >" "3 >" "1 -" "1 >" "3 >"
       A  NA "0 +" "0 +" "0 +" "0 |" "1 V" "5 >" "6 >" "4 -" "2 -" "3 >"
       G  NA "2 >" "2 >" "0 -" "0 +" "0 +" "3 |" "4 V" "5 >" "3 Z" "1 *"
       G  NA "2 >" "4 >" "2 -" "0 -" "0 +" "1 |" "2 V" "3 V" "4 >" "2 Z"
     NOTE: there seems to be a mistake in the Deonier book for this example, on page 157. It has “... 2 3 4 3 2 1 3” on one row, but should have “... 2 3 4 3 1 1 3” on that row (row i = 5).
  5. In R:
       > library("Biostrings")
       > seq1 <- "TCAGTTGCC"
       > seq2 <- "AGGTTG"
       > sigma <- nucleotideSubstitutionMatrix(match = 1, mismatch = -2, baseOnly = TRUE)
       > pairwiseAlignment(seq1, seq2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -2, scoreOnly = FALSE, type = "local")
       Local PairwiseAlignedFixedSubject (1 of 1)
       pattern: [4] GTTG
       subject: [3] GTTG
       score: 4
     Also:
       > source("C:/Documents and Settings/Avril Coughlan/My Documents/Rfunctions.R")
       > dnasmithwaterman(seq1, seq2, gapopen = 0, gapextend = -2, mymatch = 1, mymismatch = -2)
       [1] "maxT= 4"
          NA T     C     A     G     T     T     G     C     C
       NA NA NA    NA    NA    NA    NA    NA    NA    NA    NA
       A  NA "0 +" "0 +" "1 >" "0 +" "0 +" "0 +" "0 +" "0 +" "0 +"
       G  NA "0 +" "0 +" "0 +" "2 >" "0 -" "0 +" "1 >" "0 +" "0 +"
       G  NA "0 +" "0 +" "0 +" "1 >" "0 >" "0 +" "1 >" "0 +" "0 +"
       T  NA "1 >" "0 +" "0 +" "0 +" "2 >" "1 >" "0 +" "0 +" "0 +"
       T  NA "1 >" "0 +" "0 +" "0 +" "1 >" "3 >" "1 -" "0 +" "0 +"
       G  NA "0 +" "0 +" "0 +" "1 >" "0 +" "1 |" "4 >" "2 -" "0 -"
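
As a cross-check on note 5, the matrix for the practice problem can also be filled in Python while recording which neighbouring cell produced each score, as slide 10 describes. This is a sketch under stated assumptions: `fill_with_pointers` and the labels "diag", "up", "left" and "none" are names invented for this example. They are not the symbols printed by `dnasmithwaterman`, whose meaning the notes do not define, and ties are broken here in the order diagonal, up, left.

```python
def fill_with_pointers(s1, s2, match, mismatch, gap):
    """Fill the local-alignment table T and a table P of predecessor labels.

    Hypothetical helper for illustration; s1 indexes rows, s2 columns.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    T = [[0] * cols for _ in range(rows)]
    P = [["none"] * cols for _ in range(rows)]  # "none" = no previous cell

    for i in range(1, rows):
        for j in range(1, cols):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            candidates = [
                (T[i - 1][j - 1] + sigma, "diag"),
                (T[i - 1][j] + gap, "up"),
                (T[i][j - 1] + gap, "left"),
            ]
            # max() keeps the first of any tied candidates.
            score, move = max(candidates, key=lambda c: c[0])
            if score > 0:  # otherwise the cell stays 0 with no predecessor
                T[i][j], P[i][j] = score, move
    return T, P


T, P = fill_with_pointers("AGGTTG", "TCAGTTGCC", match=1, mismatch=-2, gap=-2)

# Print T with row labels; the leading space labels the 0th row.
print(" ", " ", *"TCAGTTGCC")
for letter, row in zip(" AGGTTG", T):
    print(letter, *row)

print(max(max(row) for row in T))  # 4
```

The printed rows should agree with the matrix on slide 14, and the highest score, 4, with the `maxT= 4` line in the R output.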