Multiple Alignment

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations that can only be seen by
downloading it and using ‘View Slide Show’ in PowerPoint
Pairwise versus Multiple Alignment
• So far we have considered the alignment of two
  sequences (‘pairwise alignment’)
           Q K E S G P S S S Y C
           |   | | |           |
         V Q Q E S G L V R T T C
• Alignment can be performed between three or more
  sequences (‘multiple alignment’)
           Q K E S G   P S S S Y C
           |   | | |             |
         V Q Q E S G   L V R T T C
           |   | |     | | |   | |
         V Q K E S L   L V R S T C
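One way to read the three-sequence alignment above is column by column. The following is a minimal sketch, not part of the original slides: it stores the three rows as strings, assumes that the unpaired leading position of the first sequence is a gap ('-'), drops the empty spacer column, and lists the columns in which all three sequences carry the same residue. The name `conserved_columns` is hypothetical.

```python
def conserved_columns(rows):
    """Return 0-based indices of columns where every row has the same residue."""
    assert len({len(r) for r in rows}) == 1, "aligned rows must have equal length"
    return [
        i
        for i, column in enumerate(zip(*rows))
        if "-" not in column and len(set(column)) == 1
    ]


# Rows transcribed from the slide; '-' for the leading gap is an assumption.
alignment = [
    "-QKESGPSSSYC",
    "VQQESGLVRTTC",
    "VQKESLLVRSTC",
]
print(conserved_columns(alignment))  # [1, 3, 4, 11]
```

The four reported columns are the Q, E, S and C positions that are joined by bars in both halves of the slide's alignment.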
Multiple alignment
• Multiple alignments are useful for comparing many
  homologous sequences at once



 Multiple alignment of part of Eyeless from different animals
• Multiple alignments can be global or local
  Most widely used programs for making multiple alignments (e.g. CLUSTAL,
        T-COFFEE) create global multiple alignments, not local ones
  If the sequences share one stretch of high sequence similarity, it might
        make sense to make a multiple alignment of just that region of
        similarity, e.g. for Eyeless
  You can “cut out” the region of similarity from each sequence and make a
        multiple alignment of just that region, e.g. using CLUSTAL
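The “cut out” step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sequences and coordinates below are made-up toy values (not the real Eyeless proteins), coordinates are taken to be 1-based and inclusive, and the helper names `cut_region` and `to_fasta` are hypothetical. The resulting FASTA text could then be given to a multiple alignment program.

```python
def cut_region(sequences, regions):
    """Extract one region per sequence; regions maps name -> (start, end), 1-based inclusive."""
    return {
        name: sequences[name][start - 1:end]
        for name, (start, end) in regions.items()
    }


def to_fasta(sequences, width=60):
    """Format a name -> sequence mapping as FASTA text."""
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy sequences and coordinates, chosen only for illustration.
sequences = {"seqA": "MKTAYIAKQRQISFVK", "seqB": "GGMKTAYLAKQR"}
regions = {"seqA": (1, 8), "seqB": (3, 10)}

similar_parts = cut_region(sequences, regions)
print(to_fasta(similar_parts))
```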
Real data: Eyeless proteins




Do you think it’s sensible to make a global multiple alignment of these
sequences?
The alignment is not very reliable in regions of low similarity: for
example, look at the alignment of fly Eyeless to the other proteins here
•   Algorithms for aligning 2 sequences (e.g. Needleman-Wunsch (N-W),
    Smith-Waterman (S-W)) can be extended to multiple sequences
    For aligning 3 sequences using N-W, we fill in a table T that is a 3D cube,
    using the recurrence relation:

\[
T(i,j,k) = \max
\begin{cases}
T(i-1,j-1,k-1) + \sigma(S_1(i),S_2(j)) + \sigma(S_1(i),S_3(k)) + \sigma(S_2(j),S_3(k)) \\
T(i-1,j,k) + \text{gap penalty} + \text{gap penalty} \\
T(i,j-1,k) + \text{gap penalty} + \text{gap penalty} \\
T(i,j,k-1) + \text{gap penalty} + \text{gap penalty} \\
T(i-1,j,k-1) + \sigma(S_1(i),S_3(k)) + \text{gap penalty} + \text{gap penalty} \\
T(i,j-1,k-1) + \sigma(S_2(j),S_3(k)) + \text{gap penalty} + \text{gap penalty} \\
T(i-1,j-1,k) + \sigma(S_1(i),S_2(j)) + \text{gap penalty} + \text{gap penalty}
\end{cases}
\]
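The recurrence above can be turned into a small program. The sketch below is a minimal illustration, not the lecturer's code: it fills the 3D table and returns only the best score (no traceback). The match, mismatch and gap values are assumed toy numbers, a gap paired with a gap is assumed to score nothing, and the names `sigma`, `column_score` and `align3_score` are hypothetical. Scoring each column as the sum over its three pairs reproduces all seven cases of the recurrence.

```python
from itertools import product


def sigma(a, b, match=1, mismatch=-1):
    """Toy substitution score (assumed values; a real run would use a matrix such as BLOSUM62)."""
    return match if a == b else mismatch


def column_score(a, b, c, gap):
    """Sum-of-pairs score of one alignment column; '-' marks a gap."""
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue  # a gap paired with a gap contributes nothing
        total += gap if "-" in (x, y) else sigma(x, y)
    return total


def align3_score(s1, s2, s3, gap=-2):
    """Best global alignment score of three sequences under the recurrence above."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    T = [[[float("-inf")] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    T[0][0][0] = 0
    # The seven predecessor moves: every non-empty choice of sequences to advance.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if min(pi, pj, pk) < 0:
                continue
            a = s1[pi] if di else "-"
            b = s2[pj] if dj else "-"
            c = s3[pk] if dk else "-"
            T[i][j][k] = max(T[i][j][k], T[pi][pj][pk] + column_score(a, b, c, gap))
    return T[n1][n2][n3]


print(align3_score("MQTIF", "LHIW", "LQSW"))
```

The three nested index ranges are what make the table a cube, and the seven moves correspond to the seven lines of the recurrence.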
• The run-time increases exponentially with the
  number of sequences you want to align
  For N sequences of length L, the table has about L^N cells
  Aligning 4 sequences of 100 amino acids takes ~3 days!
• Heuristic algorithms for multiple alignment are
  generally used, as they are fast
  e.g. CLUSTAL, T-COFFEE
  ‘Heuristic’ means they’re not guaranteed to find the best solution (here,
  the best alignment)
  (whereas N-W and S-W are guaranteed to find the best-scoring pairwise alignment)
• A popular heuristic algorithm is CLUSTAL, by Des
  Higgins and Paul Sharp at Trinity College Dublin
  (1988)
  Uses a ‘progressive alignment’ approach, i.e. aligns the 2 most similar
  sequences first; adds the next most similar sequence to that
  alignment; adds the next most similar sequence; and so on
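To make the exponential growth mentioned earlier on this slide concrete, the sketch below counts how much work the exact dynamic programming approach does. It is a rough estimate under stated assumptions: the table is taken to have (L + 1)^N cells for N sequences of length L, and each cell looks at 2^N - 1 predecessor cells. It does not attempt to reproduce the ~3 days figure, which depends on the hardware and implementation. The name `dp_work` is hypothetical.

```python
def dp_work(length, n_sequences):
    """Number of (cell, predecessor) evaluations for exact N-way alignment."""
    cells = (length + 1) ** n_sequences
    predecessors = 2 ** n_sequences - 1
    return cells * predecessors


for n in (2, 3, 4):
    print(f"{n} sequences of 100 residues: {dp_work(100, n):,} evaluations")
```

Each extra sequence of 100 residues multiplies the count by a factor of more than 200, which is why heuristic methods are preferred for more than a few sequences.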
CLUSTAL
• A popular heuristic algorithm is CLUSTAL, by Des
  Higgins and Paul Sharp at TCD (1988)
  Cited >37,000 times; D. Higgins is Ireland’s most cited scientist
• CLUSTAL makes a global multiple alignment using a
  ‘progressive alignment’ approach
• First computes all pairwise alignments and calculates
  sequence similarity between pairs
• These similarities are used to build a rough ‘guide
  tree’
  [Figure: guide tree with leaves S1, S2, S3 and S4]
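The step from pairwise similarities to a guide tree can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise distances are invented numbers (in practice they would come from the pairwise alignments), and average-linkage clustering is used as a simple stand-in, since the slides do not say which clustering method is used. The name `build_guide_tree` is hypothetical.

```python
from itertools import combinations


def build_guide_tree(names, distances):
    """Average-linkage clustering; returns (tree as nested tuples, list of joins in order)."""
    # Each cluster is (subtree, list of member sequence names).
    clusters = [(name, [name]) for name in names]
    joins = []

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(distances[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        merged = ((c1[0], c2[0]), c1[1] + c2[1])
        joins.append(merged[0])
        clusters.append(merged)
    return clusters[0][0], joins


# Hypothetical pairwise distances (smaller means more similar).
distances = {
    frozenset(("S1", "S2")): 0.2,
    frozenset(("S3", "S4")): 0.3,
    frozenset(("S1", "S3")): 0.6,
    frozenset(("S1", "S4")): 0.7,
    frozenset(("S2", "S3")): 0.5,
    frozenset(("S2", "S4")): 0.6,
}
tree, joins = build_guide_tree(["S1", "S2", "S3", "S4"], distances)
print(tree)   # (('S1', 'S2'), ('S3', 'S4'))
print(joins)
```

With these assumed distances the order of joins is S1 with S2, then S3 with S4, then the two pairs together, which is the order in which a progressive aligner would perform its alignments.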
• 1 Then aligns the most similar pair of sequences
  This gives us an alignment of 2 sequences (called a ‘profile’)
  e.g. alignment of sequences S1 and S2

• 2 Aligns the next closest pair of sequences (or pair of
  profiles, or sequence and profile)
  e.g. alignment of sequences S3 and S4

• 3 Aligns the next closest pair of seqs/profiles
  e.g. alignment of profiles S1-S2 and S3-S4

  Sequences:   S1 MQTIF    S2 LHIW    S3 LQSW    S4 LSF

  Step 1 (S1 with S2):          MQTIF
                                LH-IW

  Step 2 (S3 with S4):          LQSW
                                L-SF

  Step 3 (S1-S2 with S3-S4):    MQTIF
                                LH-IW
                                LQS-W
                                L-S-F
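The third step above, aligning two profiles, can be illustrated with a small sketch. It assumes that the column pairing between the two profiles is already known; here it is written out by hand to match the slide, whereas a real program would find it by dynamic programming over profile columns. The name `merge_profiles` and the `path` representation are hypothetical.

```python
def merge_profiles(profile_a, profile_b, path):
    """Merge two profiles given a column path.

    path is a list of (col_a, col_b) pairs; None means that a gap column
    is inserted into every row of that profile at this position.
    """
    merged = []
    for row in profile_a:
        merged.append("".join(row[a] if a is not None else "-" for a, _ in path))
    for row in profile_b:
        merged.append("".join(row[b] if b is not None else "-" for _, b in path))
    return merged


profile_s1_s2 = ["MQTIF", "LH-IW"]   # result of step 1
profile_s3_s4 = ["LQSW", "L-SF"]     # result of step 2
# Hand-written column pairing (an assumption): a gap column is placed in the
# S3-S4 profile opposite column 3 of the S1-S2 profile.
path = [(0, 0), (1, 1), (2, 2), (3, None), (4, 3)]

for row in merge_profiles(profile_s1_s2, profile_s3_s4, path):
    print(row)
```

The printed rows are MQTIF, LH-IW, LQS-W and L-S-F. Whole columns are paired or gapped, so the rows inside each profile stay aligned with one another exactly as they were.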
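• A property of this method is that gap creation is
  irreversible: ‘once a gap, always a gap’
  e.g. the gap placed in S2 (LH-IW) at step 1 and the gap placed in S4
  (L-SF) at step 2 are both still present after step 3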


• This is a ‘heuristic algorithm’, i.e. it is not guaranteed to
  give the best alignment
  However, it is very fast and works well in most cases
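The ‘once a gap, always a gap’ property can be checked mechanically. The sketch below is a simple per-row check written for illustration, not part of any alignment program: a row from an earlier step must be recoverable from the corresponding row of a later step by deleting columns only, so every earlier gap must still be in place relative to the residues around it. The name `gaps_preserved` is hypothetical.

```python
def gaps_preserved(earlier_row, later_row):
    """True if earlier_row (gaps included) is obtained from later_row by deleting columns."""
    remaining = iter(later_row)
    # Each symbol of the earlier row must be found, in order, in the later row.
    return all(symbol in remaining for symbol in earlier_row)


print(gaps_preserved("LH-IW", "LH-IW"))  # True: S2 is unchanged by step 3
print(gaps_preserved("L-SF", "L-S-F"))   # True: S4 keeps its gap and gains another
print(gaps_preserved("L-SF", "LS--F"))   # False: the original gap would have moved
```

The third call shows the kind of rearrangement that a progressive method never makes, even in cases where moving an early gap would give a better final alignment.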
Software for making alignments
• For multiple alignment (heuristic programs)
  CLUSTAL http://www.ebi.ac.uk/Tools/msa/clustalw2/
  T-COFFEE http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
  MUSCLE http://www.ebi.ac.uk/Tools/msa/muscle/
  MAFFT http://mafft.cbrc.jp/alignment/software/
Further Reading
•   Chapter 3 in Introduction to Computational Genomics by Cristianini & Hahn
•   Chapter 6 in Computational Genome Analysis by Deonier et al.
•   Practical on multiple alignment in R in the Little Book of R for
    Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter5.html

Editor's notes

  1. Eyeless sequences: mouse from http://www.treefam.org/cgi-bin/TFseq.pl?id=ENSMUST00000111083.1; chicken from http://www.treefam.org/cgi-bin/TFseq.pl?id=ENSGALT00000019805.3; sea squirt from http://www.treefam.org/cgi-bin/TFseq.pl?id=ENSCINT00000013350.2; human Eyeless (PAX6) from http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1; D. melanogaster Eyeless from http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5. Aligned using clustalw. Viewed in Jalview. Saved as humanflyothers_clustal.png
  2. Image from www.cs.iastate.edu/~cs544/.../Multiple_Sequence_Alignment.ppt slide 12. For the recurrence relation, see page 189 in Jones & Pevzner, ‘An introduction to bioinformatics algorithms’
  3. Image credit (Des Higgins): http://www.idaireland.com/_internal/cimg!0/52302eob2zw6kiy4ed60bl5ugmuau17 Image credit (Paul Sharp): http://www.biology.ed.ac.uk/people/homepages/images/pmsharp.jpg
  4. Image credit: http://www.biomedcentral.com/content/figures/1471-2105-5-113-1-l.jpg