Short introduction to Bioinformatics
What are the Probabilistic Models?
Sequence Alignment
Pairwise Alignment
Multiple Sequence Alignment Models
What is Phylogenetics?
Building Phylogenetic Trees
Other Models
Contact Us




Introduction to Probabilistic Models for Bioinformatics

              Igor Bogicevic (igor.bogicevic@sbgenomics.com)




                                          July 3, 2011




              Seven Bridges Genomics, LLC




Short introduction to Bioinformatics




       Bioinformatics is the application of statistics and computer science to the field of
       molecular biology.
       Major research efforts in the field include sequence alignment, gene finding,
       genome assembly, drug design, drug discovery, protein structure alignment,
       protein structure prediction, prediction of gene expression and protein-protein
       interactions, genome-wide association studies and the modeling of evolution.
       Today, given the enormous volume of sequence data, one of the biggest
       challenges is not producing the data but understanding it.




What are the Probabilistic Models?

       There are two basic definitions:
       A statistical analysis tool that estimates, on the basis of past (historical) data, the
       probability of an event occurring again.
       A system that simulates the object under consideration and produces different
       outcomes with different probabilities.
       Simple example: rolling a die.
       A more relevant example: the random sequence model for DNA.
       Biological sequences are strings over a finite alphabet of residues, most
       commonly either four nucleotides or twenty amino acids.
       Suppose residue $a$ occurs with probability $q_a$, independently at each position.
       If a protein or DNA sequence is denoted $x_1 \ldots x_n$, then the probability of the
       whole sequence is:

       \[
       q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i}
       \]
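
The product above can be sketched in a few lines of Python. This is only an illustration: the function name and the uniform nucleotide frequencies are assumptions, not part of the slides. The sum is taken in log space because a product of many small probabilities underflows quickly for long sequences.

```python
import math


def sequence_log_probability(sequence, residue_probs):
    """Return log P(x_1..x_n) = sum_i log q_{x_i} under the random sequence model."""
    return sum(math.log(residue_probs[residue]) for residue in sequence)


# Hypothetical uniform nucleotide frequencies (an assumption for illustration).
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

log_p = sequence_log_probability("ACGT", uniform_dna)
probability = math.exp(log_p)  # equals 0.25 ** 4 up to floating-point rounding
```

With real data the values of q would be estimated from observed residue frequencies rather than fixed at 0.25.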
Sequence Alignment




       Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein
       to identify regions of similarity that may be a consequence of functional,
       structural, or evolutionary relationships between the sequences.
       A variety of computational approaches have been applied to the sequence
       alignment problem, including dynamic programming, heuristic algorithms, and
       probabilistic methods.
       Common file formats read and written by alignment tools include FASTA and
       GenBank.
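
As a rough illustration of the FASTA format: each record is a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal reader under that assumption only. The function name and the record names are hypothetical, and real files are better handled by an established library.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # header text without the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in records.items()}


# Hypothetical two-record input.
example = """>seq1 hypothetical record
ACACAC
TA
>seq2 hypothetical record
AGCACACA
"""
records = parse_fasta(example)
```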




Pairwise Alignment


       Pairwise sequence alignment methods are used to find the best-matching
       piecewise (local) or global alignments of two query sequences.
       The three primary methods of producing pairwise alignments are dot-matrix
       methods, dynamic programming, and word methods.
       Needleman-Wunsch algorithm (global alignment)
       Smith-Waterman algorithm (local alignment)
       FASTA/BLAST algorithms (k-tuple heuristic methods, often combined with
       dynamic programming)
       Gap penalties: model the cost of a gap in the aligned sequences (linear, affine,
       etc.)
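
The two gap models named in the last item can be sketched as follows. The function names and the cost values are hypothetical, and the affine formula follows one common convention (the opening cost covers the first gap position); other tools count the extension cost for every position.

```python
def linear_gap_penalty(length, gap_cost=1):
    """Linear model: every gap position costs the same."""
    return gap_cost * length


def affine_gap_penalty(length, gap_open=5, gap_extend=1):
    """Affine model: opening a gap costs more than extending it."""
    if length == 0:
        return 0
    # The first position pays gap_open, each further position pays gap_extend.
    return gap_open + gap_extend * (length - 1)


# One gap of length 3 versus three separate gaps of length 1.
one_long_gap = affine_gap_penalty(3)
three_short_gaps = 3 * affine_gap_penalty(1)
```

Under the affine model a single long gap is cheaper than several short ones of the same total length, whereas the linear model does not distinguish the two cases.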



Example - Smith-Waterman: a matrix H is built as follows:

\[
H(i, 0) = 0, \quad 0 \le i \le m
\]
\[
H(0, j) = 0, \quad 0 \le j \le n
\]
\[
w(a_i, b_j) =
\begin{cases}
w(\text{match}) & \text{if } a_i = b_j \\
w(\text{mismatch}) & \text{if } a_i \ne b_j
\end{cases}
\]
\[
H(i, j) = \max \left\{
\begin{array}{ll}
0 & \\
H(i-1, j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\
H(i-1, j) + w(a_i, -) & \text{Deletion} \\
H(i, j-1) + w(-, b_j) & \text{Insertion}
\end{array}
\right\}, \quad 1 \le i \le m, \; 1 \le j \le n
\]
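
A minimal sketch of this recurrence in Python, assuming a single gap weight for both w(a, -) and w(-, b). The function name is hypothetical; the default weights and the two sequences are the ones used in the worked example on the following slide.

```python
def smith_waterman_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the Smith-Waterman matrix H for sequences a (rows) and b (columns)."""
    m, n = len(a), len(b)
    # H(i, 0) = H(0, j) = 0: the first row and first column stay at zero.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + w,    # match/mismatch
                H[i - 1][j] + gap,      # deletion: a_i aligned to a gap
                H[i][j - 1] + gap,      # insertion: b_j aligned to a gap
            )
    return H


H = smith_waterman_matrix("AGCACACA", "ACACACTA")
best_score = max(max(row) for row in H)
```

The zero in the maximum is what makes the alignment local: a cell never carries a negative score forward, so an alignment can start anywhere in the matrix.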



Sequence 1 = ACACACTA, Sequence 2 = AGCACACA
w(match) = +2
w(a,-) = w(-,b) = w(mismatch) = -1

Scoring matrix H (columns: Sequence 1, rows: Sequence 2):

         -    A    C    A    C    A    C    T    A
    -    0    0    0    0    0    0    0    0    0
    A    0    2    1    2    1    2    1    0    2
    G    0    1    1    1    1    1    1    0    1
    C    0    0    3    2    3    2    3    2    1
    A    0    2    2    5    4    5    4    3    4
    C    0    1    4    4    7    6    7    6    5
    A    0    2    3    6    6    9    8    7    8
    C    0    1    4    5    8    8   11   10    9
    A    0    2    3    6    7   10   10   10   12

In the example, the highest value, 12, is in cell (8,8). The walk back from that cell
visits (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0),
which gives the alignment:
Sequence 1 = A-CACACTA
Sequence 2 = AGCACAC-A
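
The walk back can be sketched as below. This is an illustrative implementation, not necessarily the one behind the slides: the function name is hypothetical, ties are broken in the order diagonal, up, left, and only one optimal alignment is returned. Rows correspond to Sequence 2 and columns to Sequence 1, as in the matrix.

```python
def smith_waterman_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal local alignment."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + w,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Walk back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:    # diagonal: residues are paired
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:    # up: gap in b
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:                                 # left: gap in a
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Sequence 2 indexes the rows and Sequence 1 the columns.
score, aligned_seq2, aligned_seq1 = smith_waterman_align("AGCACACA", "ACACACTA")
```

For these inputs the sketch returns a score of 12 with aligned_seq1 = "A-CACACTA" and aligned_seq2 = "AGCACAC-A".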
Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is a sequence alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.
       We usually want to do multiple alignments to find a homologous sequences that
       point to a shared evolutionary origins that can be used for further phylogenetic
       analysis.
       Progressive Alignment Methods - constructing succession of a pairwise alignment.




                                                                                                                    EVEN BRIDGES
                                                                                                                        G E N O M I C S, LLC




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   Introduction to Probabilistic Models for Bioinformatics
Short introduction to Bioinformatics
                        What are the Probabilistic Models?
                                       Sequence Alignment
                                        Pairwise Alignment
                       Multiple Sequence Alignment Models
                                    What is Phylogenetics?
                                Building Phylogenetic Trees
                                              Other Models
                                               Conctact Us


Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is a sequence alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.
       We usually build multiple alignments to find homologous sequences that point
       to a shared evolutionary origin and can be used for further phylogenetic
       analysis.
       Progressive alignment methods: build the MSA through a succession of pairwise
       alignments.
       Hidden Markov models: represent the MSA as a directed acyclic graph (DAG); the
       observed states are the individual alignment columns and the hidden states
       represent the presumed ancestral sequence.




What is Phylogenetics?



       Phylogenetics is the study of evolutionary relatedness among groups of organisms
       (e.g. species, populations), which is discovered through molecular sequencing
       data and morphological data matrices.
       Evolution is regarded as a branching process, whereby populations are altered
       over time and may speciate into separate branches, hybridize together, or
       terminate by extinction. This may be visualized in a phylogenetic tree.
       Ernst Haeckel’s recapitulation theory (“ontogeny recapitulates phylogeny”) is a
       hypothesis that in developing from embryo to adult, animals go through stages
       resembling or representing successive stages in the evolution of their remote
       ancestors.



Building Phylogenetic Trees


       Phylogenetic trees for a nontrivial number of input sequences are constructed
       using computational phylogenetics methods.
       A common method is maximum-likelihood search, often within a Bayesian
       framework, which applies an explicit model of evolution to phylogenetic tree
       estimation.
       Identifying the optimal tree using many of these techniques is NP-hard, so
       heuristic search and optimization methods are used in combination with
       tree-scoring functions to identify a reasonably good tree that fits the data.
       The resulting trees do not necessarily represent the species' evolutionary
       history accurately, because the data on which they are based are noisy; the
       analysis can be confounded by horizontal gene transfer, hybridisation between
       species that were not nearest neighbors on the tree before the hybridisation
       took place, convergent evolution, and conserved sequences.
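
To get a feel for why exhaustive search is out of reach, the sketch below counts the distinct binary tree topologies for n labeled sequences, using the standard double-factorial formulas: (2n-5)!! unrooted trees and (2n-3)!! rooted trees. The function names are illustrative only. The count shows how fast the search space grows; it is not by itself a proof of NP-hardness.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted binary topologies on n labeled leaves: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted binary tree")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted binary topologies on n labeled leaves: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted binary tree")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Ten sequences already give 2,027,025 unrooted topologies, which is why heuristic search combined with a tree-scoring function is the practical choice.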

Other Models




       Transformational grammars (Chomsky hierarchy)
       RNA structure analysis models (RNA conserves its base-pairing interactions
       rather than its sequence)
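
One way to make the idea of modeling interactions concrete is a base-pair maximization in the style of the Nussinov algorithm. The slides do not name a specific method, so this is only an illustrative sketch: the function name `max_base_pairs`, the restriction to Watson-Crick pairs, and the minimum loop length of 3 are all assumptions.

```python
def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    Assumptions: only A-U and G-C pairs count, and at least `min_loop`
    unpaired bases must separate the two partners of a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence rna[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case 1: j stays unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (rna[k], rna[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # 3: a three-pair stem closing an AAA loop
```

Two sequences with different letters but the same pairing pattern get the same score here, which is the sense in which such models track interactions rather than sequence identity.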




Contact Us




       We are Hiring!




Introduction to Probabilistic Models for Bioinformatics

  • 1. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Introduction to Probabilistic Models for Bioinformatics Igor Bogicevic (igor.bogicevic@sbgenomics.com) July 3, 2011 EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 2-4. Short introduction to Bioinformatics. Bioinformatics is the application of statistics and computer science to the field of molecular biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies, and the modeling of evolution. At present, given the enormous volumes of sequenced data, one of the biggest challenges is not producing the data but actually understanding it.
  • 5-8. What are the Probabilistic Models? There are 2 basic definitions: (1) a statistical analysis tool that estimates, on the basis of past (historical) data, the probability of an event occurring again; (2) a system that simulates the object under consideration and produces different outcomes with different probabilities. A simple example is rolling a die. A more relevant example is the random sequence model for DNA. Biological sequences are strings over a finite alphabet of residues, most commonly either four nucleotides or twenty amino acids. Suppose a residue a occurs with probability $q_a$, independently of the other positions. If a protein or DNA sequence is denoted $x_1 \ldots x_n$, then the probability of the whole sequence is:
    \[ q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i} \]
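    The product formula for the random sequence model can be sketched in a few lines of Python. This is only an illustration: the function names and the uniform residue probabilities in `q` are assumptions chosen for the example, not values from the slides. Summing logarithms instead of multiplying avoids numerical underflow on long sequences.

    ```python
    import math

    def sequence_log_prob(seq, q):
        """Log-probability of seq under the random sequence model:
        log prod_i q[x_i] = sum_i log q[x_i]."""
        return sum(math.log(q[x]) for x in seq)

    def sequence_prob(seq, q):
        """Probability of seq; may underflow to 0.0 for very long sequences."""
        return math.exp(sequence_log_prob(seq, q))

    # Hypothetical residue probabilities (uniform over the four nucleotides).
    q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    p = sequence_prob("ACGT", q)  # equals 0.25 ** 4 up to rounding
    ```

    With non-uniform `q` (for example, a GC-rich genome) the same function assigns different probabilities to sequences of equal length, which is what makes the model useful as a background distribution.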
  • 9-11. Sequence Alignment. Sequence alignment is a way of arranging sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. A variety of computational approaches have been applied to the sequence alignment problem, including dynamic programming, heuristic algorithms, and probabilistic methods. Common formats for representing alignments are FASTA and GenBank.
  • 13-18. Pairwise Alignment. Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or global alignments of two query sequences. The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming, and word methods. Examples: the Needleman-Wunsch algorithm (global alignment); the Smith-Waterman algorithm (local alignment); the FASTA/BLAST algorithms (k-tuple heuristic methods, often combined with dynamic programming). Gap penalties model the cost of a gap in the matched sequences (linear, affine, etc.).
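    As a rough illustration of the dynamic-programming approach and of gap penalties, the sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty, and shows one common convention for an affine gap cost. The function names and the default scores (+2 match, -1 mismatch, -1 per gap position) are assumptions made for this example; the slides do not fix them for global alignment.

    ```python
    def linear_gap(length, d=1):
        """Linear gap cost: every gap position costs d."""
        return -d * length

    def affine_gap(length, gap_open=2, gap_extend=1):
        """Affine gap cost (one common convention): opening a gap costs
        gap_open, and each further position costs gap_extend."""
        if length == 0:
            return 0
        return -(gap_open + gap_extend * (length - 1))

    def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-1):
        """Optimal global alignment score of a and b with a linear gap penalty."""
        m, n = len(a), len(b)
        F = [[0] * (n + 1) for _ in range(m + 1)]
        # Unlike local alignment, leading gaps are penalised in the borders.
        for i in range(1, m + 1):
            F[i][0] = i * gap
        for j in range(1, n + 1):
            F[0][j] = j * gap
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              F[i - 1][j] + gap,     # gap in b
                              F[i][j - 1] + gap)     # gap in a
        return F[m][n]
    ```

    The affine form makes one long gap cheaper than several short ones of the same total length, which is often a better fit for biological insertions and deletions.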
  • 19. Example - Smith-Waterman. A matrix H is built as follows:
    \[ H(i, 0) = 0,\ 0 \le i \le m \qquad H(0, j) = 0,\ 0 \le j \le n \]
    If $a_i = b_j$ then $w(a_i, b_j) = w(\text{match})$; if $a_i \ne b_j$ then $w(a_i, b_j) = w(\text{mismatch})$.
    \[ H(i, j) = \max \left\{ \begin{array}{ll} 0 & \\ H(i-1, j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\ H(i-1, j) + w(a_i, -) & \text{Deletion} \\ H(i, j-1) + w(-, b_j) & \text{Insertion} \end{array} \right\}, \quad 1 \le i \le m,\ 1 \le j \le n \]
  • 20-22. Sequence 1 = ACACACTA, Sequence 2 = AGCACACA. w(match) = +2, w(a,-) = w(-,b) = w(mismatch) = -1. Columns correspond to Sequence 1 and rows to Sequence 2:
    \[
    H = \begin{array}{c|ccccccccc}
      & - & A & C & A & C & A & C & T & A \\ \hline
    - & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    A & 0 & 2 & 1 & 2 & 1 & 2 & 1 & 0 & 2 \\
    G & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
    C & 0 & 0 & 3 & 2 & 3 & 2 & 3 & 2 & 1 \\
    A & 0 & 2 & 2 & 5 & 4 & 5 & 4 & 3 & 4 \\
    C & 0 & 1 & 4 & 4 & 7 & 6 & 7 & 6 & 5 \\
    A & 0 & 2 & 3 & 6 & 6 & 9 & 8 & 7 & 8 \\
    C & 0 & 1 & 4 & 5 & 8 & 8 & 11 & 10 & 9 \\
    A & 0 & 2 & 3 & 6 & 7 & 10 & 10 & 10 & 12
    \end{array}
    \]
    In the example, the highest value corresponds to the cell in position (8,8). The walk back corresponds to (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0). The resulting alignment is Sequence 1 = A-CACACTA, Sequence 2 = AGCACAC-A.
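    The worked example can be reproduced with a short Python sketch of Smith-Waterman with traceback. The function name and the tie-breaking order in the traceback (diagonal, then up, then left) are choices made for this illustration; the matrix is laid out as in the example, with rows for Sequence 2 and columns for Sequence 1, and the scores are the ones given in the example.

    ```python
    def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-1):
        """Local alignment of seq1 (columns) and seq2 (rows), linear gap penalty.
        Returns (best score, aligned seq1, aligned seq2, score matrix H)."""
        rows, cols = len(seq2) + 1, len(seq1) + 1
        H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
        best, best_pos = 0, (0, 0)
        for i in range(1, rows):
            for j in range(1, cols):
                s = match if seq2[i - 1] == seq1[j - 1] else mismatch
                H[i][j] = max(0,
                              H[i - 1][j - 1] + s,  # match/mismatch
                              H[i - 1][j] + gap,    # gap in seq1
                              H[i][j - 1] + gap)    # gap in seq2
                if H[i][j] > best:
                    best, best_pos = H[i][j], (i, j)

        # Walk back from the highest-scoring cell until a zero cell is reached.
        i, j = best_pos
        out1, out2 = [], []
        while i > 0 and j > 0 and H[i][j] > 0:
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                out1.append(seq1[j - 1]); out2.append(seq2[i - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + gap:
                out1.append("-"); out2.append(seq2[i - 1])
                i -= 1
            else:
                out1.append(seq1[j - 1]); out2.append("-")
                j -= 1
        return best, "".join(reversed(out1)), "".join(reversed(out2)), H

    score, aligned1, aligned2, H = smith_waterman("ACACACTA", "AGCACACA")
    # score is 12, aligned1 is "A-CACACTA", aligned2 is "AGCACAC-A"
    ```

    In this particular example the local alignment happens to span both sequences completely; in general the walk back stops at the first zero cell, so only the best-matching region is reported.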
  • 23-25. Multiple Sequence Alignment Models. A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, commonly protein, DNA, or RNA. We usually perform multiple alignments to find homologous sequences that point to shared evolutionary origins and can be used for further phylogenetic analysis. Progressive Alignment Methods: build the MSA through a succession of pairwise alignments. Hidden Markov Models: represent the MSA as a directed acyclic graph (DAG); the observed states are the individual alignment columns and the hidden states represent the presumed ancestral sequence.
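    To give a feel for how a hidden Markov model assigns probabilities to observed sequences, here is a minimal sketch of the forward algorithm on a generic two-state toy model. It is not the profile HMM used by real MSA tools: the state names ("M" for a conserved column, "I" for an insert-like state) and every probability below are hypothetical values chosen only for illustration.

    ```python
    def forward(obs, states, start_p, trans_p, emit_p):
        """Forward algorithm: probability of obs summed over all hidden paths."""
        if not obs:
            return 1.0
        # alpha[s] = P(observations so far, current hidden state = s)
        alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
        for o in obs[1:]:
            alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                     for s in states}
        return sum(alpha.values())

    # Hypothetical toy model: "M" strongly prefers A, "I" emits uniformly.
    states = ("M", "I")
    start_p = {"M": 0.9, "I": 0.1}
    trans_p = {"M": {"M": 0.8, "I": 0.2},
               "I": {"M": 0.5, "I": 0.5}}
    emit_p = {"M": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
              "I": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

    p_a = forward("A", states, start_p, trans_p, emit_p)  # 0.9*0.7 + 0.1*0.25
    ```

    Because the hidden path is summed out, the probabilities of all observation sequences of a given length add up to one, which is a convenient sanity check for any such model.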
  • 27-29. What is Phylogenetics? Phylogenetics is the study of evolutionary relatedness among groups of organisms (e.g. species, populations), which is discovered through molecular sequencing data and morphological data matrices. Evolution is regarded as a branching process, whereby populations are altered over time and may speciate into separate branches, hybridize together, or terminate by extinction. This may be visualized in a phylogenetic tree. Ernst Haeckel's recapitulation theory ("ontogeny recapitulates phylogeny") is a hypothesis that in developing from embryo to adult, animals go through stages resembling or representing successive stages in the evolution of their remote ancestors.
  • 30-33. Building Phylogenetic Trees. Phylogenetic trees among a nontrivial number of input sequences are constructed using computational phylogenetics methods. A common method is to search for the maximum-likelihood tree, often within a Bayesian framework, applying an explicit model of evolution to phylogenetic tree estimation. Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data. The resulting trees do not necessarily represent the species' evolutionary history accurately, because the data on which they are based is noisy; the analysis can be confounded by horizontal gene transfer, hybridisation between species that were not nearest neighbors on the tree before the hybridisation took place, convergent evolution, and conserved sequences.
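    One way to see why exhaustive tree search is impractical is to count candidate topologies. The sketch below uses the standard result that n taxa admit (2n-5)!! distinct unrooted, fully resolved (binary) tree topologies for n >= 3; the function name is chosen for this illustration and does not come from the slides.

    ```python
    def num_unrooted_binary_trees(n_taxa):
        """Number of unrooted binary tree topologies on n_taxa labelled leaves:
        (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5) for n >= 3."""
        if n_taxa < 3:
            return 1
        count = 1
        for k in range(3, 2 * n_taxa - 5 + 1, 2):
            count *= k
        return count

    for n in (4, 5, 10, 20):
        print(n, num_unrooted_binary_trees(n))
    ```

    Ten taxa already give 2,027,025 topologies, and the count grows faster than exponentially, so practical methods score only a small, heuristically chosen subset of trees.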
  • 35. Other Models. Transformational Grammars (Chomsky Hierarchy). RNA Structure Analysis Models (RNA tends to preserve its base-pairing interactions rather than the sequence itself).
  • 36. Contact Us. We are Hiring!