DNA nucleotide substitution models 1




Running head: DNA NUCLEOTIDE SUBSTITUTION MODELS




  On Some Measures of Genetic Distance Based on Rates of Nucleotide Substitution




                               Justine Leon A. Uro

                             Ph. D. Graduate Student

                Department of Biostatistics, University of Michigan

                                 Ann Arbor, MI


                                       Abstract

We present a general DNA base-nucleotide substitution model and discuss three special

cases: three-substitution-type (3ST), two-substitution-type (2ST), and the Jukes-Cantor

models.


            On Some Measures of Genetic Distance Based on Rates of

                                Nucleotide Substitution




                                       Introduction

      The genetic distance between two populations is defined as a concept related to the

time since the two populations diverged from a common ancestral population (Weir,

1990). A number of methods have been proposed to estimate the genetic distance between

two populations and they are either based on the allele frequencies in the two populations,

the rate of amino acid substitution in protein sequence data from the two populations, or

the rates of base nucleotide substitution in DNA sequence data from the two populations.

      Measures of genetic distance that utilize the allele frequencies are estimates based

on some geometric transformation of the allele frequencies (Cavalli-Sforza and Edwards,

1967; Cavalli-Sforza and Bodmer, 1971; Edwards, 1971; Nei, 1977, 1978; Li and Nei, 1977;

Smith, 1977). Some of these measures are purely geometric and do not involve any genetic

concept at all, e.g., the measure proposed by Cavalli-Sforza and Bodmer (Weir, 1990). On

the other hand, the ones proposed by Edwards (1971) and by Nei (1977) can be shown to

be related to the concept of fixation index (Hartl and Clark, 1989).

      A measure of genetic distance based on amino acid substitution from protein

sequence data was proposed by Jukes and Cantor in 1969. This method was partly due to

the abundance of amino acid sequence data available then. Some geneticists argue that

this measure should be preferred since proteins are the subject of mutations.

      The development of DNA sequencing methods by Maxam and Gilbert and by Sanger et al. in 1977

brought about more methods for measuring genetic distance. The estimates from these

methods are based on the rates of nucleotide substitution in DNA sequence data. These

are the methods which we will consider in this paper. We will formulate the general


model, examine some special cases, give some numerical examples, and finally, examine

the validity of these models based on their assumptions.


                                   The General Model

      We now start by formulating the general model. Let S1 and S2 be two nucleotide

sequences with a common ancestral sequence. We consider a pair of homologous sites from

S1 and S2 and examine how much they have diverged from each other during their descent

from the ancestral sequence T years back (Figure 1).

      The evolutionary base substitution model we are going to use is shown in Figure 2.

We have used RNA codes for the nucleotides so that the pyrimidines are uracil (U) and

cytosine (C), and the purines are adenine (A) and guanine (G). The types and rates of

base substitution are summarized in Table 1. A substitution of a purine by a purine or a

pyrimidine by a pyrimidine is called a transition (TS). If a pyrimidine is substituted by a

purine or vice-versa then the substitution is called a transversion (TV). We distinguish

between two types of transversion, TV1 and TV2, and each type is shown in Table 1. The

classification of the TV as to type becomes easier if we look at Figure 2. The TV which go

either vertically up or down are TV1 and those which go diagonally are TV2.
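
The classification rule above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the TV1 pairs (U and A, C and G) and TV2 pairs (U and G, C and A) are taken from the pairings that appear in relations (13) and (14), since Figure 2 and Table 1 are not reproduced here.

```python
# RNA alphabet as used in the text: pyrimidines U, C and purines A, G.
PYRIMIDINES = {"U", "C"}
PURINES = {"A", "G"}
BASES = PYRIMIDINES | PURINES

# Assumed TV1 pairs (vertical arrows in Figure 2): U-A and C-G.
# Every other purine-pyrimidine pair (U-G, C-A) is treated as TV2.
TV1_PAIRS = {frozenset("UA"), frozenset("CG")}


def classify_substitution(old: str, new: str) -> str:
    """Return 'TS', 'TV1' or 'TV2' for the substitution old -> new."""
    if old not in BASES or new not in BASES:
        raise ValueError("bases must be one of U, C, A, G")
    if old == new:
        raise ValueError("no substitution: the two bases are identical")
    same_class = ({old, new} <= PYRIMIDINES) or ({old, new} <= PURINES)
    if same_class:
        return "TS"  # purine <-> purine or pyrimidine <-> pyrimidine
    return "TV1" if frozenset((old, new)) in TV1_PAIRS else "TV2"


for pair in [("U", "C"), ("A", "G"), ("U", "A"), ("C", "G"), ("U", "G"), ("C", "A")]:
    print(pair, classify_substitution(*pair))
```

The same function can label a mismatched pair of homologous sites, since a mismatch type depends only on which two bases are involved.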

      When comparing the homologous sites of S1 and S2 at any time t > 0, there are 16

possible nucleotide base pairings, 12 of which involve mismatched base pairs. If the

mismatch looks like a transition pair in Table 1, we call the mismatch a TS-type

mismatch. We have a TV1-type mismatch if the mismatch looks like a Type 1 transversion

listed in Table 1. The TV2-type mismatch is defined in the same manner. We summarize

these in Table 2. In Table 2, for t > 0,




\begin{align}
S(t) &= \sum_{i=1}^{4} S_i(t) = \text{probability of no difference at a site} \tag{1} \\
P(t) &= \sum_{i=1}^{4} P_i(t) = \text{probability of a TS-type difference at a site} \tag{2} \\
Q(t) &= \sum_{i=1}^{4} Q_i(t) = \text{probability of a TV1-type difference at a site} \tag{3} \\
R(t) &= \sum_{i=1}^{4} R_i(t) = \text{probability of a TV2-type difference at a site} \tag{4}
\end{align}

Hence,

\begin{align}
Q(t) + R(t) &= \sum_{i=1}^{4} \left( Q_i(t) + R_i(t) \right) \tag{5} \\
&= \text{probability of a TV-type difference at a site.} \notag
\end{align}


      We sometimes refer to the probabilities above as the match probabilities.

      We also define the following probabilities which we sometimes refer to as the base

probabilities.


\begin{align}
U(t) &= \text{relative frequency of uracil,} \tag{6} \\
C(t) &= \text{relative frequency of cytosine,} \tag{7} \\
A(t) &= \text{relative frequency of adenine,} \tag{8} \\
G(t) &= \text{relative frequency of guanine in a strand,} \tag{9}
\end{align}

so that

\[ U(t) + C(t) + A(t) + G(t) = 1. \tag{10} \]


Note that the probabilities in (1) - (4) and (6) - (9) are all time-dependent. We also have


the following relations:


\begin{align}
S(t) &= U^2(t) + C^2(t) + A^2(t) + G^2(t) \tag{11} \\
P(t) &= 2U(t)C(t) + 2A(t)G(t) \tag{12} \\
Q(t) &= 2U(t)A(t) + 2C(t)G(t) \tag{13} \\
R(t) &= 2U(t)G(t) + 2C(t)A(t) \tag{14}
\end{align}
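
As a quick numerical check of relations (11) – (14), the following sketch computes the four match probabilities from a set of base probabilities and confirms that they sum to 1, as they must because the right-hand sides add up to the square of (10). The function name and the example frequencies are illustrative assumptions, not values from the paper.

```python
def match_from_base(u: float, c: float, a: float, g: float):
    """Return (S, P, Q, R) from base probabilities using relations (11)-(14)."""
    if abs(u + c + a + g - 1.0) > 1e-12:
        raise ValueError("base probabilities must sum to 1, see (10)")
    s = u * u + c * c + a * a + g * g  # no difference
    p = 2 * u * c + 2 * a * g          # TS-type difference
    q = 2 * u * a + 2 * c * g          # TV1-type difference
    r = 2 * u * g + 2 * c * a          # TV2-type difference
    return s, p, q, r


# Hypothetical base frequencies chosen only for illustration.
s, p, q, r = match_from_base(0.1, 0.2, 0.3, 0.4)
print(s, p, q, r, s + p + q + r)
```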


Using the rates of substitution and the match probabilities, the mean rate of substitution

at a specific site over the time interval (0,T] is given by
\[ k = \sum_{i=1}^{4} \frac{\alpha_i + \beta_i + \gamma_i}{T} \int_0^T B_i(t)\, dt \tag{15} \]

where B1(t) = U(t), B2(t) = C(t), B3(t) = A(t), and B4(t) = G(t), and each integral,

divided by T, is the average probability of finding the given base at the site during the

time interval (0, T].

           A measure of genetic distance is therefore given by


                                            K = 2T k                                       (16)


where k is as defined in (15), T is the time since the two sequences started diverging from

the ancestral sequence and the factor of 2 is due to the fact that we are considering two

branches that diverged.

           We now formulate the general model and proceed in a manner similar to that of

Takahata and Kimura (1981). At any time t ∈ [0, T ], consider a short time interval ∆t,

short enough so that if the mutation rate is small then higher order terms of ∆t and the

occurrence of a double substitution at a specific site may be neglected. We have


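\begin{align}
U(t + \Delta t) = {} & U(t) - \alpha_1 (\Delta t) U(t) + \alpha_2 (\Delta t) C(t) + \beta_2 (\Delta t) A(t) \notag \\
& + \gamma_2 (\Delta t) G(t) - \gamma_1 (\Delta t) U(t) - \beta_1 (\Delta t) U(t) \tag{17}
\end{align}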


which we can rewrite as

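\[ \frac{U(t + \Delta t) - U(t)}{\Delta t} = -(\alpha_1 + \beta_1 + \gamma_1) U(t) + \alpha_2 C(t) + \beta_2 A(t) + \gamma_2 G(t). \tag{18} \]

Taking the limit as ∆t approaches zero, (18) gives

\[ \frac{dU(t)}{dt} = -(\alpha_1 + \beta_1 + \gamma_1) U(t) + \alpha_2 C(t) + \beta_2 A(t) + \gamma_2 G(t). \tag{19} \]

Doing this for the other three probabilities we get the following system of differential

equations:

\begin{align}
\frac{dU(t)}{dt} &= -(\alpha_1 + \beta_1 + \gamma_1) U(t) + \alpha_2 C(t) + \beta_2 A(t) + \gamma_2 G(t) \tag{20} \\
\frac{dC(t)}{dt} &= \alpha_1 U(t) - (\alpha_2 + \beta_3 + \gamma_3) C(t) + \gamma_4 A(t) + \beta_4 G(t) \tag{21} \\
\frac{dA(t)}{dt} &= \beta_1 U(t) + \gamma_3 C(t) - (\alpha_3 + \beta_2 + \gamma_4) A(t) + \alpha_4 G(t) \tag{22} \\
\frac{dG(t)}{dt} &= \gamma_1 U(t) + \beta_3 C(t) + \alpha_3 A(t) - (\alpha_4 + \beta_4 + \gamma_2) G(t). \tag{23}
\end{align}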
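Writing (20) – (23) in matrix form gives

\[
\frac{d}{dt}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}
=
\begin{pmatrix}
-(\alpha_1 + \beta_1 + \gamma_1) & \alpha_2 & \beta_2 & \gamma_2 \\
\alpha_1 & -(\alpha_2 + \beta_3 + \gamma_3) & \gamma_4 & \beta_4 \\
\beta_1 & \gamma_3 & -(\alpha_3 + \beta_2 + \gamma_4) & \alpha_4 \\
\gamma_1 & \beta_3 & \alpha_3 & -(\alpha_4 + \beta_4 + \gamma_2)
\end{pmatrix}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}.
\tag{24}
\]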
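
To make the structure of the general system concrete, here is a minimal Python sketch that builds the 4 × 4 coefficient matrix with rows and columns ordered U, C, A, G, and integrates the base probabilities with a simple forward Euler scheme, which mirrors the small-∆t argument used to derive (17). The coefficients are arranged so that every column sums to zero, which is what conservation of total probability in (10) requires. The rate values, the starting base, and the function names are hypothetical and chosen only for illustration.

```python
def rate_matrix(alpha, beta, gamma):
    """Coefficient matrix of the general model, rows/columns ordered U, C, A, G.

    Each argument is a 4-tuple of rates carrying the subscripts 1..4.
    """
    a1, a2, a3, a4 = alpha
    b1, b2, b3, b4 = beta
    g1, g2, g3, g4 = gamma
    return [
        [-(a1 + b1 + g1), a2, b2, g2],
        [a1, -(a2 + b3 + g3), g4, b4],
        [b1, g3, -(a3 + b2 + g4), a4],
        [g1, b3, a3, -(a4 + b4 + g2)],
    ]


def euler_integrate(matrix, start, t_end, steps=20000):
    """Forward Euler integration of dx/dt = matrix * x from 0 to t_end."""
    dt = t_end / steps
    x = list(start)
    for _ in range(steps):
        dx = [sum(m * xj for m, xj in zip(row, x)) for row in matrix]
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Hypothetical rates (arbitrary time units), not taken from the paper.
ALPHA = (0.30, 0.25, 0.20, 0.35)
BETA = (0.10, 0.12, 0.08, 0.11)
GAMMA = (0.05, 0.07, 0.06, 0.04)

M = rate_matrix(ALPHA, BETA, GAMMA)
column_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
final = euler_integrate(M, (1.0, 0.0, 0.0, 0.0), t_end=2.0)  # site starts as U
print(column_sums)
print(final, sum(final))
```

Forward Euler is used only because it matches the derivation; a matrix exponential or a higher-order solver would be the usual choice for accurate work.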


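Using the fact that the sum of the base probabilities is equal to 1, so that G(t) = 1 − U(t) − C(t) − A(t), the matrix equation
reduces to

\[
\frac{d}{dt}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix}
=
\begin{pmatrix}
-(\alpha_1 + \beta_1 + \gamma_1 + \gamma_2) & \alpha_2 - \gamma_2 & \beta_2 - \gamma_2 \\
\alpha_1 - \beta_4 & -(\alpha_2 + \beta_3 + \gamma_3 + \beta_4) & \gamma_4 - \beta_4 \\
\beta_1 - \alpha_4 & \gamma_3 - \alpha_4 & -(\alpha_3 + \beta_2 + \gamma_4 + \alpha_4)
\end{pmatrix}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix}
+
\begin{pmatrix} \gamma_2 \\ \beta_4 \\ \alpha_4 \end{pmatrix},
\tag{25}
\]

which can be written as

\[ \frac{d}{dt} \mathbf{B}_1(t) = \mathbf{Q}_1 \mathbf{B}_1(t) + \mathbf{C}_1, \tag{26} \]

where B1(t) is the vector of the three base probabilities U(t), C(t), and A(t), Q1 is the 3 × 3 matrix in (25), and C1 is the constant vector in (25).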

      Solving this system of differential equations entails solving for the eigenvalues of Q1.
Although it is easy to get the eigenvalues of the 3 × 3 matrix Q1, the matrix equation in
(26) is still difficult to solve since only the final conditions of the base probabilities can be
approximated and the initial conditions are unknown. One way to avoid this problem is to


express the base probabilities in terms of the match probabilities. The matrix equation
involving the match probabilities is easier to solve since the initial conditions for the
match probabilities are Pi (0) = Qi (0) = Ri (0) = 0, i = 1, . . . , 4 and S(0) = 1. After the
expressions for the match probabilities have been solved, we can solve for the mean rate of
base substitution k and hence the estimate of genetic distance K.
      Inherent in these models of evolutionary base nucleotide substitutions are the
following four assumptions:
     (1) The two sequences diverged from a common ancestor, that is, Pi (0) = Qi (0) =
Ri (0) = 0, i = 1, . . . , 4 and S(0) = 1.
     (2) The two sequences are stochastically identical and independent, and within each
sequence, a substitution at one site in no way affects a substitution at any other site.
     (3) The homologous sites chosen from the two sequences are of the same fixed length
during their descent from the common ancestor.
    (4) (The fourth assumption reduces the number of parameters in the model by
assuming that some of the rates are equal. Since this differs among the three models that
we are going to consider, rather than stating it here, it will be stated as each model is
being considered.)

                                         The 3ST Model

      The first special case that we are going to consider is the three-substitution-type
(3ST) model. This model is due to Kimura (1981) and is the most general of the three
models we are going to consider in detail in this paper. The two other models we
consider later are special cases of this model. The fourth assumption in the 3ST model is
that the TS-type substitutions all have rates α, and that the TV-type substitutions have
rates β and γ depending on the specific type as shown in Figure 3. Under the 3ST model,
Tables 1 and 2 can be simplified and their simplified forms are given below as Tables 3
and 4, respectively.


      The system of differential equations in (20) – (23) simplifies to

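\begin{align}
\frac{dU(t)}{dt} &= -(\alpha + \beta + \gamma) U(t) + \alpha C(t) + \beta A(t) + \gamma G(t) \tag{27} \\
\frac{dC(t)}{dt} &= \alpha U(t) - (\alpha + \beta + \gamma) C(t) + \gamma A(t) + \beta G(t) \tag{28} \\
\frac{dA(t)}{dt} &= \beta U(t) + \gamma C(t) - (\alpha + \beta + \gamma) A(t) + \alpha G(t) \tag{29} \\
\frac{dG(t)}{dt} &= \gamma U(t) + \beta C(t) + \alpha A(t) - (\alpha + \beta + \gamma) G(t), \tag{30}
\end{align}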
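and its corresponding matrix form is

\[
\frac{d}{dt}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}
=
\begin{pmatrix}
-(\alpha + \beta + \gamma) & \alpha & \beta & \gamma \\
\alpha & -(\alpha + \beta + \gamma) & \gamma & \beta \\
\beta & \gamma & -(\alpha + \beta + \gamma) & \alpha \\
\gamma & \beta & \alpha & -(\alpha + \beta + \gamma)
\end{pmatrix}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix},
\tag{31}
\]
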


which again can be written in the form of (25). Considering the fact that the sum of the
base probabilities is 1, we can simplify (31) to

                                                                                                    
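\[
\frac{d}{dt}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix}
=
\begin{pmatrix}
-(\alpha + \beta + 2\gamma) & \alpha - \gamma & \beta - \gamma \\
\alpha - \beta & -(\alpha + 2\beta + \gamma) & \gamma - \beta \\
\beta - \alpha & \gamma - \alpha & -(2\alpha + \beta + \gamma)
\end{pmatrix}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix}
+
\begin{pmatrix} \gamma \\ \beta \\ \alpha \end{pmatrix}.
\tag{32}
\]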




      We can also rewrite (32) in the form of (25). The matrix equation in (32) is not
difficult to solve since the eigenvalues are easily obtainable. The problem here is that we
do not know the initial conditions for the base probabilities since we do not know the base
frequencies of the ancestral sequence. As we have mentioned before, a way to avoid this
problem is to consider the match probabilities instead. It is easier to use the match
probabilities since we have the initial conditions for this set of probabilities given by the
first assumption (A1) of our model.
      Using the relationships between the base probabilities and the match probabilities
given in (11) – (14) it can be shown that

                                                                                                   
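\[
\frac{d}{dt}
\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
=
\begin{pmatrix}
-2(2\alpha + \beta + \gamma) & -2(\alpha - \gamma) & -2(\alpha - \beta) \\
-2(\beta - \gamma) & -2(\alpha + 2\beta + \gamma) & -2(\beta - \alpha) \\
-2(\gamma - \beta) & -2(\gamma - \alpha) & -2(\alpha + \beta + 2\gamma)
\end{pmatrix}
\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
+
\begin{pmatrix} 2\alpha \\ 2\beta \\ 2\gamma \end{pmatrix},
\tag{33}
\]

which in matrix form is

\[ \frac{d}{dt} \mathbf{T}(t) = \mathbf{Q}_2 \mathbf{T}(t) + \mathbf{C}_2, \tag{34} \]

where T(t) is the vector of the match probabilities P(t), Q(t), and R(t).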
                                               dt


We now derive the expression for P (t) in (33). The expressions for Q(t) and R(t) can be
obtained in very much the same manner.
      Recall that in (11) – (14) we have

\begin{align}
P(t) &= \text{probability of a TS-type difference at a homologous site} \tag{35} \\
&= 2C(t)U(t) + 2A(t)G(t). \tag{36}
\end{align}

Using the product rule,

\[
\frac{dP(t)}{dt} = 2 \left[ C(t) \frac{dU(t)}{dt} + U(t) \frac{dC(t)}{dt} \right] + 2 \left[ A(t) \frac{dG(t)}{dt} + G(t) \frac{dA(t)}{dt} \right].
\tag{37}
\]

If we substitute the expressions for the derivatives of the base probabilities given
in (27) – (30) we have

\begin{align}
\frac{dP(t)}{dt} = 2 \Big\{ & -2 \left( C(t)U(t) + A(t)G(t) \right) (\alpha + \beta + \gamma) + 2\beta \left( A(t)C(t) + G(t)U(t) \right) \notag \\
& + 2\gamma \left( A(t)U(t) + G(t)C(t) \right) + \alpha \left( A^2(t) + C^2(t) + U^2(t) + G^2(t) \right) \Big\}. \tag{38}
\end{align}

      Using the fact that A²(t) + C²(t) + U²(t) + G²(t) = 1 − P(t) − Q(t) − R(t), together with (12) – (14), we can
simplify (38) to obtain

\[
\frac{dP(t)}{dt} = 2 \left\{ -(2\alpha + \beta + \gamma) P(t) + (\beta - \alpha) R(t) + (\gamma - \alpha) Q(t) + \alpha \right\}
\tag{39}
\]

which is what we want.
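
For completeness, the same steps applied to Q(t) can be sketched as follows. This is a worked outline based only on (13), (27) – (30), and the identity S(t) = 1 − P(t) − Q(t) − R(t); it is offered as a check on the second row of the match-probability system rather than as a derivation quoted from Kimura (1981).

```latex
\begin{align*}
Q(t) &= 2U(t)A(t) + 2C(t)G(t), \\
\frac{dQ(t)}{dt}
  &= 2\left[ A\,\frac{dU}{dt} + U\,\frac{dA}{dt} \right]
   + 2\left[ G\,\frac{dC}{dt} + C\,\frac{dG}{dt} \right] \\
  &= 2\left\{ -2(\alpha+\beta+\gamma)(UA + CG)
     + \beta\,(U^2 + C^2 + A^2 + G^2)
     + 2\alpha\,(CA + UG) + 2\gamma\,(UC + AG) \right\} \\
  &= 2\left\{ -(\alpha+\beta+\gamma)\,Q + \beta\,(1 - P - Q - R)
     + \alpha\,R + \gamma\,P \right\} \\
  &= 2\left\{ (\gamma-\beta)\,P - (\alpha+2\beta+\gamma)\,Q
     + (\alpha-\beta)\,R + \beta \right\}.
\end{align*}
```

Interchanging the roles of β and γ, and of Q(t) and R(t), gives the corresponding expression for dR(t)/dt.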
      We now solve the matrix equation in (34). Define the following Laplace transform:
                                                  
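\[
\mathcal{L}[\mathbf{T}(t)] = \mathcal{L} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
= \begin{pmatrix} p(s) \\ q(s) \\ r(s) \end{pmatrix} = \mathcal{T}(s).
\tag{40}
\]

      Applying the Laplace transform to (34), we get

\[ s \mathcal{T}(s) - \mathbf{T}(0) = \mathbf{Q}_2 \mathcal{T}(s) + \frac{1}{s} \mathbf{C}_2 \tag{41} \]

which we can rewrite as

\[ -\frac{1}{s} \mathbf{C}_2 = (\mathbf{Q}_2 - s I_3) \mathcal{T}(s), \tag{42} \]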


where we have used the fact that T(0)= 0 and I3 is the 3 × 3 identity matrix. The
problem of solving the system of differential equations in (34) is now reduced to solving a
system of algebraic equations in the three unknowns p(s), q(s), and r(s). We now solve for
these three unknowns and then apply the inverse Laplace transform to get the solutions
for P (t), Q(t), and R(t). Using Cramer’s rule, we get

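\begin{align}
p(s) &= \frac{1}{\Delta}
\begin{vmatrix}
-2\alpha/s & -2(\alpha - \gamma) & -2(\alpha - \beta) \\
-2\beta/s & -2(\alpha + 2\beta + \gamma) - s & -2(\beta - \alpha) \\
-2\gamma/s & -2(\gamma - \alpha) & -2(\alpha + \beta + 2\gamma) - s
\end{vmatrix} \tag{43} \\
q(s) &= \frac{1}{\Delta}
\begin{vmatrix}
-2(2\alpha + \beta + \gamma) - s & -2\alpha/s & -2(\alpha - \beta) \\
-2(\beta - \gamma) & -2\beta/s & -2(\beta - \alpha) \\
-2(\gamma - \beta) & -2\gamma/s & -2(\alpha + \beta + 2\gamma) - s
\end{vmatrix} \tag{44} \\
r(s) &= \frac{1}{\Delta}
\begin{vmatrix}
-2(2\alpha + \beta + \gamma) - s & -2(\alpha - \gamma) & -2\alpha/s \\
-2(\beta - \gamma) & -2(\alpha + 2\beta + \gamma) - s & -2\beta/s \\
-2(\gamma - \beta) & -2(\gamma - \alpha) & -2\gamma/s
\end{vmatrix} \tag{45}
\end{align}

where Δ is the determinant of the coefficient matrix in (42),

\[
\Delta =
\begin{vmatrix}
-2(2\alpha + \beta + \gamma) - s & -2(\alpha - \gamma) & -2(\alpha - \beta) \\
-2(\beta - \gamma) & -2(\alpha + 2\beta + \gamma) - s & -2(\beta - \alpha) \\
-2(\gamma - \beta) & -2(\gamma - \alpha) & -2(\alpha + \beta + 2\gamma) - s
\end{vmatrix}.
\tag{46}
\]

      Upon simplifying and expressing the results in partial fractions we get

\begin{align}
p(s) &= \frac{1}{4s} - \frac{1/4}{s + 4(\alpha + \beta)} - \frac{1/4}{s + 4(\alpha + \gamma)} + \frac{1/4}{s + 4(\beta + \gamma)} \tag{47} \\
q(s) &= \frac{1}{4s} - \frac{1/4}{s + 4(\alpha + \beta)} + \frac{1/4}{s + 4(\alpha + \gamma)} - \frac{1/4}{s + 4(\beta + \gamma)} \tag{48} \\
r(s) &= \frac{1}{4s} + \frac{1/4}{s + 4(\alpha + \beta)} - \frac{1/4}{s + 4(\alpha + \gamma)} - \frac{1/4}{s + 4(\beta + \gamma)}. \tag{49}
\end{align}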


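        Applying the inverse Laplace transform to (47) – (49), we get the following as solutions to the
system in (34),

\begin{align}
P(t) &= \mathcal{L}^{-1}\{p(s)\} = \frac{1}{4} \left( 1 - e^{\lambda_1 t} - e^{\lambda_2 t} + e^{\lambda_3 t} \right) \tag{50} \\
Q(t) &= \mathcal{L}^{-1}\{q(s)\} = \frac{1}{4} \left( 1 - e^{\lambda_1 t} + e^{\lambda_2 t} - e^{\lambda_3 t} \right) \tag{51} \\
R(t) &= \mathcal{L}^{-1}\{r(s)\} = \frac{1}{4} \left( 1 + e^{\lambda_1 t} - e^{\lambda_2 t} - e^{\lambda_3 t} \right), \tag{52}
\end{align}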

where λ1 = −4(α+β), λ2 = −4(α+γ), λ3 = −4(β+γ).
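
The closed-form solutions can be evaluated directly. The sketch below is a minimal illustration with a hypothetical function name and hypothetical rate values; it prints the match probabilities at t = 0, where they vanish in agreement with the first assumption, and at a large t, where each approaches 1/4.

```python
import math


def match_probabilities_3st(alpha: float, beta: float, gamma: float, t: float):
    """Return (P, Q, R) at time t under the 3ST model, following (50)-(52)."""
    e1 = math.exp(-4.0 * (alpha + beta) * t)   # exp(lambda_1 * t)
    e2 = math.exp(-4.0 * (alpha + gamma) * t)  # exp(lambda_2 * t)
    e3 = math.exp(-4.0 * (beta + gamma) * t)   # exp(lambda_3 * t)
    p = 0.25 * (1.0 - e1 - e2 + e3)
    q = 0.25 * (1.0 - e1 + e2 - e3)
    r = 0.25 * (1.0 + e1 - e2 - e3)
    return p, q, r


# Hypothetical rates (arbitrary time units), chosen only for illustration.
RATES = (0.3, 0.1, 0.05)
print(match_probabilities_3st(*RATES, t=0.0))
print(match_probabilities_3st(*RATES, t=0.5))
print(match_probabilities_3st(*RATES, t=100.0))
```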
        Under the 3ST model, the equation for k in (15) can be expressed as
\[
k = \sum_{i=1}^{4} \frac{\alpha + \beta + \gamma}{T} \int_0^T B_i(t)\, dt = \alpha + \beta + \gamma,
\tag{53}
\]

where we have used the fact that the sum of the base probabilities is equal to 1. Note that
the assumption on some of the rates being equal played a crucial role in being able to
factor α+β+γ out of the summation to get a simple expression for k. For K, we obtain

                                       K = 2T (α + β + γ).                                      (54)

        We can solve (50) – (52) for λ1, λ2, and λ3 to get

\begin{align}
4(\alpha + \beta)t &= -\ln(1 - 2P(t) - 2Q(t)) \tag{55} \\
4(\alpha + \gamma)t &= -\ln(1 - 2P(t) - 2R(t)) \tag{56} \\
4(\beta + \gamma)t &= -\ln(1 - 2Q(t) - 2R(t)), \tag{57}
\end{align}

and hence, adding (55) – (57), for any time t ∈ [0, T],

\begin{align}
8(\alpha + \beta + \gamma)t &= -\ln \left\{ [1 - 2P(t) - 2Q(t)][1 - 2P(t) - 2R(t)][1 - 2Q(t) - 2R(t)] \right\} \tag{58} \\
K &= 2kt \tag{59} \\
&= -\frac{1}{4} \ln \left\{ [1 - 2P(t) - 2Q(t)][1 - 2P(t) - 2R(t)][1 - 2Q(t) - 2R(t)] \right\}. \tag{60}
\end{align}
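
A short round-trip check illustrates the estimator: starting from hypothetical rates and a hypothetical divergence time, the match probabilities are generated from the closed-form 3ST solutions and then fed into the logarithmic formula, which should recover K = 2t(α + β + γ). The function name and numerical values are assumptions made for this sketch.

```python
import math


def distance_3st(p: float, q: float, r: float) -> float:
    """Genetic distance K from match probabilities P, Q, R under the 3ST model."""
    x = 1.0 - 2.0 * p - 2.0 * q
    y = 1.0 - 2.0 * p - 2.0 * r
    z = 1.0 - 2.0 * q - 2.0 * r
    if min(x, y, z) <= 0.0:
        raise ValueError("logarithm undefined: mismatch proportions are too large")
    return -0.25 * math.log(x * y * z)


# Hypothetical rates and time, used only to generate test inputs.
alpha, beta, gamma, t = 0.3, 0.1, 0.05, 0.5
e1 = math.exp(-4.0 * (alpha + beta) * t)
e2 = math.exp(-4.0 * (alpha + gamma) * t)
e3 = math.exp(-4.0 * (beta + gamma) * t)
p = 0.25 * (1.0 - e1 - e2 + e3)
q = 0.25 * (1.0 - e1 + e2 - e3)
r = 0.25 * (1.0 + e1 - e2 - e3)

k_true = 2.0 * t * (alpha + beta + gamma)
k_est = distance_3st(p, q, r)
print(k_true, k_est)
```

In practice P(t), Q(t), and R(t) would be replaced by the observed proportions of TS-type, TV1-type, and TV2-type mismatches between the two aligned sequences.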



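        The variance for this estimate of K is also given in the paper of Kimura (1981). We
have,

\[
\sigma_K^2 = \frac{1}{n} \left[ a^2 P(t) + b^2 Q(t) + c^2 R(t) - \left( aP(t) + bQ(t) + cR(t) \right)^2 \right]
\tag{61}
\]

where n is the number of nucleotide sites compared and

\begin{align}
a &= \frac{1}{2} \left[ \frac{1}{1 - 2P(t) - 2Q(t)} + \frac{1}{1 - 2P(t) - 2R(t)} \right] \tag{62} \\
b &= \frac{1}{2} \left[ \frac{1}{1 - 2P(t) - 2Q(t)} + \frac{1}{1 - 2Q(t) - 2R(t)} \right] \tag{63} \\
c &= \frac{1}{2} \left[ \frac{1}{1 - 2P(t) - 2R(t)} + \frac{1}{1 - 2Q(t) - 2R(t)} \right]. \tag{64}
\end{align}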
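
The variance formula is easy to mis-transcribe, so a small sketch may help. Here a, b, and c are computed as the partial derivatives of the 3ST estimate of K with respect to P, Q, and R, and n is assumed to be the number of nucleotide sites compared. The function name and the example proportions are hypothetical.

```python
def variance_3st(p: float, q: float, r: float, n: int) -> float:
    """Approximate sampling variance of K under the 3ST model.

    p, q, r: proportions of TS-, TV1- and TV2-type mismatches.
    n: number of sites compared (assumed meaning of n in the variance formula).
    """
    x = 1.0 - 2.0 * p - 2.0 * q
    y = 1.0 - 2.0 * p - 2.0 * r
    z = 1.0 - 2.0 * q - 2.0 * r
    if min(x, y, z) <= 0.0 or n <= 0:
        raise ValueError("inputs outside the range where the formula applies")
    a = 0.5 * (1.0 / x + 1.0 / y)  # dK/dP
    b = 0.5 * (1.0 / x + 1.0 / z)  # dK/dQ
    c = 0.5 * (1.0 / y + 1.0 / z)  # dK/dR
    mean = a * p + b * q + c * r
    return (a * a * p + b * b * q + c * c * r - mean * mean) / n


# Hypothetical observed proportions over 1000 compared sites.
var_k = variance_3st(0.10, 0.05, 0.05, n=1000)
print(var_k, var_k ** 0.5)
```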

                                     The 2ST Model

      We now proceed to a special case of the 3ST model, which is again due to Kimura
(1980) and was published a year before the 3ST model. The original paper does not name
this model; we call it the two-substitution-type (2ST) model for convenience. Since the
2ST model is a special case of the 3ST model, we just give the results and do not go into
the details. The fourth assumption here is that the transition rate is α and the
transversion rate is β. Under this
assumption the diagram in Figure 3 simplifies further to the diagram in Figure 4.
      The tables for the base substitution and the match probabilities are given as Tables
5 and 6 below. The probability of a TS-type mismatch is given by P (t) and the
probability of a TV-type mismatch is given by QR(t) = Q(t)+ R(t). That is, we have
lumped together the TV1-type and TV2-type mismatches.
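      The matrix equation in (24) under the 2ST model is

\[
\frac{d}{dt}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}
=
\begin{pmatrix}
-(\alpha + 2\beta) & \alpha & \beta & \beta \\
\alpha & -(\alpha + 2\beta) & \beta & \beta \\
\beta & \beta & -(\alpha + 2\beta) & \alpha \\
\beta & \beta & \alpha & -(\alpha + 2\beta)
\end{pmatrix}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}
\tag{65}
\]

and the corresponding matrix equation involving the match probabilities is

\[
\frac{d}{dt}
\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
=
\begin{pmatrix}
-2(2\alpha + 2\beta) & -2(\alpha - \beta) & -2(\alpha - \beta) \\
0 & -2(\alpha + 3\beta) & -2(\beta - \alpha) \\
0 & -2(\beta - \alpha) & -2(\alpha + 3\beta)
\end{pmatrix}
\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
+
\begin{pmatrix} 2\alpha \\ 2\beta \\ 2\beta \end{pmatrix}.
\tag{66}
\]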


If we now lump Q(t) and R(t) together as QR(t) we have the matrix equation in (67)
which only involves a 2 × 2 matrix instead of the previous 3 × 3 matrix.
                                                               
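\[
\frac{d}{dt}
\begin{pmatrix} P(t) \\ \mathit{QR}(t) \end{pmatrix}
=
\begin{pmatrix}
-4(\alpha + \beta) & -2(\alpha - \beta) \\
0 & -8\beta
\end{pmatrix}
\begin{pmatrix} P(t) \\ \mathit{QR}(t) \end{pmatrix}
+
\begin{pmatrix} 2\alpha \\ 4\beta \end{pmatrix}
\tag{67}
\]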

To solve (67), we use the initial conditions: P (0) = QR(0) = 0. As solutions we have

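\begin{align}
P(t) &= \frac{1}{4} - \frac{1}{2} e^{\lambda_1 t} + \frac{1}{4} e^{\lambda_2 t} \tag{68} \\
\mathit{QR}(t) &= \frac{1}{2} - \frac{1}{2} e^{\lambda_2 t} \tag{69}
\end{align}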

where λ1 = −4(α+β) and λ2 = −8β.
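        Under the 2ST model k = α + 2β. We can solve (68) and (69) for αt and βt and therefore
obtain our estimate K. We have

\begin{align}
K &= 2kt = 2(\alpha + 2\beta)t \tag{70} \\
&= -\frac{1}{4} \ln \left\{ [1 - 2P(t) - \mathit{QR}(t)]^2 \, [1 - 2\mathit{QR}(t)] \right\}. \tag{71}
\end{align}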

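The variance of this estimate is given by

\[
\sigma_K^2 = \frac{1}{n}\left[ a^2 P(t) + b^2\,\mathit{QR}(t) - \left(a P(t) + b\,\mathit{QR}(t)\right)^2 \right]
\tag{72}
\]

where a and b are the partial derivatives of (71) with respect to P(t) and QR(t):

\begin{align}
a &= \frac{1}{1 - 2P(t) - \mathit{QR}(t)} \tag{73} \\
b &= \frac{1}{2}\left[ \frac{1}{1 - 2P(t) - \mathit{QR}(t)} + \frac{1}{1 - 2\,\mathit{QR}(t)} \right]. \tag{74}
\end{align}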

                                 The Jukes-Cantor Model

        The simplest possible model is due to Jukes and Cantor (1969). The model was
primarily formulated to describe protein evolution by looking at the rate of amino acid
substitution. It turns out that this model can also be used to describe base substitution.
        The fourth assumption here is that all the rates of substitution are equal, i.e., α =
αi = βi = γi , i = 1, . . ., 4. Figure 2 then becomes Figure 5 below. Under the
Jukes-Cantor model, Tables 1 and 2 can be simplified to Tables 7 and 8, respectively.
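      The matrix equation in (24) under the Jukes-Cantor model is

\[
\frac{d}{dt}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}
=
\begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix}
\begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}
\tag{75}
\]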

and the matrix equation involving the match probabilities is

\[
\frac{d}{dt}
\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
=
\begin{pmatrix}
-8\alpha & 0 & 0 \\
0 & -8\alpha & 0 \\
0 & 0 & -8\alpha
\end{pmatrix}
\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix}
+
\begin{pmatrix} 2\alpha \\ 2\alpha \\ 2\alpha \end{pmatrix}
\tag{76}
\]

If we define PQR(t) = P(t) + Q(t) + R(t), we have the differential equation

\[
\frac{d}{dt}\,\mathit{PQR}(t) = -8\alpha\,\mathit{PQR}(t) + 6\alpha
\tag{77}
\]

which has as a solution

\[
\mathit{PQR}(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right).
\tag{78}
\]

Under the Jukes-Cantor model, k = 3α and the estimate K is

\[
K = 2kt = 6\alpha t = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,\mathit{PQR}(t)\right)
\tag{79}
\]

which can be obtained by solving (78) for αt.
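Spelled out, the inversion of (78) that leads to (79) runs as follows (a worked restatement using only the symbols already defined):

```latex
\begin{align*}
\mathit{PQR}(t) &= \tfrac{3}{4}\left(1 - e^{-8\alpha t}\right)
  && \text{equation (78)} \\
e^{-8\alpha t} &= 1 - \tfrac{4}{3}\,\mathit{PQR}(t) \\
8\alpha t &= -\ln\!\left(1 - \tfrac{4}{3}\,\mathit{PQR}(t)\right) \\
K = 6\alpha t &= \tfrac{6}{8}\cdot 8\alpha t
  = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,\mathit{PQR}(t)\right)
  && \text{equation (79)}
\end{align*}
```

The logarithm is defined only when PQR(t) < 3/4, the limiting mismatch probability in (78) as t grows.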
      The variance for K under the Jukes-Cantor model was derived by Kimura and Ohta
(1972) and is given by

\[
\sigma_{JC}^2 = \frac{\left(1 - \mathit{PQR}(t)\right)\mathit{PQR}(t)}{n\left(1 - 4\,\mathit{PQR}(t)/3\right)^2}.
\tag{80}
\]

      We illustrate the three models by comparing the human and mouse protein
kinase inhibitor. These two nucleotide sequences were recently sequenced by Olsen and
Uhler (1991a, 1991b). The sequences are more than a thousand base pairs long, but only 231 of
these are part of the coding region. Our analysis is limited to these 231 base pairs. The
sequences are shown in Figure 6. Of the 231 bp, only 15 show mismatches. These are
summarized in Table 9. Usually, the estimate K is computed separately for each codon
position, because the models assume that substitutions are independent of each other, whereas
there is evidence that adjacent substitutions are actually not independent. This is
not done here, since we have quite a small number of base pairs and the mismatches are
quite far apart (except for the ones occurring at positions 200 and 201).
      The estimate under each model is shown in Table 10. The estimates differ little
from one model to the other (K ranges from 0.0682 to 0.0687), and the standard errors
are also very similar (0.0178 to 0.0181).
      Estimates of genetic distance using some other nucleotide sequences are also
available. Tavaré (1986) obtained estimates using human and mouse α-fetoprotein and
serum albumin nucleotide sequences. The results he obtained for the human-mouse
α-fetoprotein nucleotide sequences are reproduced below as Table 11. The data consist of
1824 base pairs, and hence it was possible for him to compute the estimates by codon
position.
      Note that the estimates are largest for the third codon position and smallest
for the second codon position. Tavaré (1986) showed that the estimates are not
homogeneous if we consider the codon positions as strata. Unfortunately, we cannot do
the same thing in our analysis here since we have only 231 bp and 15 mismatches.
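One simple way to see this heterogeneity, not necessarily the procedure Tavaré used, is an inverse-variance weighted homogeneity statistic: pool the three codon-position estimates with weights 1/SE² and sum the weighted squared deviations from the pooled value. Under homogeneity and approximate normality this statistic is roughly chi-square with 2 degrees of freedom. The sketch below applies it to the Jukes-Cantor row of Table 11; the function name is an assumption introduced here.

```python
def homogeneity_statistic(estimates, std_errors):
    """Return the inverse-variance weighted mean and the homogeneity statistic Q."""
    weights = [1.0 / (se * se) for se in std_errors]
    pooled = sum(w * k for w, k in zip(weights, estimates)) / sum(weights)
    q = sum(w * (k - pooled) ** 2 for w, k in zip(weights, estimates))
    return pooled, q

# Jukes-Cantor row of Table 11: K1, K2, K3 and their standard errors.
k_values = [0.1752, 0.1387, 0.6566]
se_values = [0.0186, 0.0162, 0.0483]

pooled, q = homogeneity_statistic(k_values, se_values)
CHI2_5PCT_2DF = 5.991  # upper 5% point of the chi-square distribution with 2 d.f.
print(round(pooled, 4), round(q, 1), q > CHI2_5PCT_2DF)
```

The statistic is far above the 5% critical value, driven almost entirely by the third codon position, which agrees with the conclusion that the three positions do not share a common K.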
      All three models of evolutionary base substitution that we have discussed here are
far from perfect, and their weaknesses lie in the second and third assumptions made to
formulate the models.
      The second assumption states that the nucleotide sequences are stochastically
identical and independent of each other. It is quite possibly true that nucleotide sequences
evolve in a manner stochastically independent of each other, but there is evidence
that they are in fact not stochastically identical. For example, Wu and Li (1985) noticed
that the substitution rate in rodents is much higher than that in humans. Even within a
sequence, there is evidence that the rates are much higher in some spots (“hot spots”)
than in others (Miyata and Yasunaga, 1981; Brown and Clegg, 1983) and that the rates
differ between the sense and antisense strands (Wu and Maeda, 1987). There is also
evidence that a substitution at one site does affect the rate of substitution at
an adjacent site in phage T4 (Koch, 1971). It would be interesting to know if the same
holds for higher organisms. This last fact is also one of the reasons why substitution rates
are computed by codon sites if the data allow.
      The third assumption is that the diverging nucleotide sequences are both of a
fixed length, and hence it does not take into account mutations resulting from deletions and
insertions. This assumption also does not take into account the possibility of concerted
evolution, which brings about the presence of multigene families, and the duplication and
divergence in multigene families.
      There have been efforts to formulate models that address these shortcomings
while remaining mathematically tractable. Needleman and
Wunsch (1970), for example, proposed a model which assigns weights to substitutions,
insertions and deletions. Unfortunately, the weights assigned were arbitrary and had no
genetic basis.
      The main problem that these models of evolutionary base nucleotide substitution
face is that when all of the mechanisms of evolution are included in the model, the model
becomes mathematically intractable with the present computer technology. Considering
the fact that computer technology is still advancing, it is hoped that a model incorporating
most, if not all, of the mechanisms discussed can be formulated in the near future.

                                         References

Brown, A., & Clegg, M. (1983). Analysis of variation in related DNA sequences. In
     B. Weir (Ed.), Statistical data analysis (pp. 107–132). New York: Marcel-Dekker.

Cavalli-Sforza, L., & Bodmer, W. (1971). The genetics of human populations. San
     Francisco: W. H. Freeman.

Cavalli-Sforza, L., & Edwards, A. (1967). Phylogenetic analysis: models and estimation
     procedures. American Journal of Human Genetics, 19, 233–257.

Edwards, A. (1971). The distance between populations on the basis of gene frequencies.
     Biometrics, 27, 873–881.

Jukes, T., & Cantor, C. (1969). Evolution of protein molecules. In H. N. Munro (Ed.),
     Mammalian protein metabolism (pp. 21–132). New York: Academic Press.

Kimura, M. (1980). A simple method for estimating evolutionary rates of base
     substitutions through comparative studies of nucleotide sequences. Journal of
     Molecular Evolution, 16, 111–120.

Kimura, M. (1981). Estimation of evolutionary distances between homologous nucleotide
     sequences. Proceedings of the National Academy of Sciences USA, 78, 454–458.

Kimura, M., & Ohta, T. (1972). On the stochastic model for estimation of mutational
     distance between homologous proteins. Journal of Molecular Evolution, 2, 87–90.

Koch, R. (1971). The influence of neighbouring base pairs upon base-pair substitution
     mutation rates. Proceedings of the National Academy of Sciences USA, 68, 773–776.

Maxam, A., & Gilbert, W. (1977). A new method for sequencing DNA. Proceedings of the
     National Academy of Sciences USA, 74, 560–564.

Miura, R. (Ed.). (1986). Lectures on mathematics in the life sciences. Rhode Island:
     American Mathematical Society.

Miyata, T., & Yasunaga, T. (1981). Rapidly evolving mouse α-globin-related
     pseudogenes. Proceedings of the National Academy of Sciences USA, 78, 450–453.

Munro, H. N. (Ed.). (1969). Mammalian protein metabolism. New York: Academic Press.

Needleman, S., & Wunsch, C. (1970). A general method applicable to the search for
     similarities in the amino acid sequence of two proteins. Journal of Molecular
     Biology, 48, 443–453.

Nei, M. (1977). F-statistics and analysis of gene diversity in subdivided populations.
     Annals of Human Genetics, 41, 225–233.

Olsen, S., & Uhler, M. (1991a). (nucleotide sequence of the human protein kinase
     inhibitor). Molecular Endocrinology. (manuscript submitted)

Olsen, S., & Uhler, M. (1991b). (nucleotide sequence of the mouse protein kinase
     inhibitor). Journal of Biological Chemistry. (in press)

Sanger, F., Nicklen, S., & Coulson, A. (1977). DNA sequencing with chain-terminating
     inhibitors. Proceedings of the National Academy of Sciences USA, 74, 5463–5467.

Takahata, N., & Kimura, M. (1981). A model of evolutionary base substitutions and its
     application with special reference to rapid change in pseudogenes. Genetics, 98,
     641–657.

Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA
     sequences. In R. Miura (Ed.), Lectures on mathematics in the life sciences (pp.
     57–86). Rhode Island: American Mathematical Society.

Weir, B. (Ed.). (1983). Statistical data analysis. New York: Marcel-Dekker.

Weir, B. (1990). Genetic data analysis: methods for discrete population data. Sunderland,
     Massachusetts: Sinauer Associates.

Wu, C., & Li, W. (1985). Evidence for higher rates of nucleotide substitution in rodents
     than in man. Proceedings of the National Academy of Sciences USA, 82, 1741–1745.

Wu, C., & Maeda, N. (1987). Inequality in mutation rates of the two strands of DNA.
     Nature, 327, 169–170.

Table 1

Types and rates of nucleotide substitution.



                                             Types

                    Transition (TS)          Transversion (TV1)        Transversion (TV2)

  Initial base      U    C    A    G         U    A      C    G        U    G    C    A

  New Base          C    U    G    A         A    U      G    C        G    U    A    C

  Rates             α1   α2   α3   α4        β1   β2    β3    β4       γ1   γ2   γ3   γ4

Table 2

Possible nucleotide base pairings at a specific homologous site for t > 0.



                                                Types



 Sequence              Same               TS-type                TV1-type             TV2-type



      1          U    C    A    G    U    C    A    G        U   A     C    G    U    G     C    A


      2          U    C    A    G    C    U    G    A        A   U     G    C    G    U     A    C



 Probabilities   S1   S2   S3   S4   P1   P2   P3   P4      Q1   Q2   Q3    Q4   R1   R2   R3    R4

Table 3

Types and rates of nucleotide substitution under the 3ST model.



                                         Types

                    Transition (TS)      Transversion (TV1)      Transversion (TV2)

   Initial base     U   C   A    G       U   A   C      G        U   G   C     A

   New Base         C   U   G    A       A   U   G      C        G   U   A     C

   Rates            α   α   α    α       β   β   β      β        γ   γ   γ     γ

Table 4

Possible nucleotide base pairings at a specific homologous site for t > 0 under the 3ST model.



                                                  Types



      Sequence            Same                TS-type                 TV1-type            TV2-type



           1          U   C       A   G   U   C       A   G       U   A       C   G   U    G       C   A


           2          U   C       A   G   C   U       G   A       A   U       G   C   G    U       A   C



      Probabilities           S                   P                       Q                    R

Table 5

Types and rates of nucleotide substitution under the 2ST model.



                                         Types

                    Transition (TS)      Transversion (TV1)      Transversion (TV2)

   Initial base     U   C   A    G       U   A   C      G        U   G   C     A

   New Base         C   U   G    A       A   U   G      C        G   U   A     C

   Rates            α   α   α    α       β   β   β      β        β   β   β     β

Table 6

Possible nucleotide base pairings at a specific homologous site for t > 0 under the 2ST model.



                                                  Types



      Sequence            Same                TS-type                 TV1-type            TV2-type



           1          U   C       A   G   U   C       A   G       U   A    C     G    U    G   C     A


           2          U   C       A   G   C   U       G   A       A   U    G     C    G    U   A     C



      Probabilities           S                   P                              QR

Table 7

Types and rates of nucleotide substitution under the Jukes-Cantor model.



                                         Types

                    Transition (TS)      Transversion (TV1)       Transversion (TV2)

   Initial base     U   C   A    G       U   A   C      G         U   G   C     A

   New Base         C   U   G    A       A   U   G      C         G   U   A     C

   Rates            α   α   α    α       α   α   α      α         α   α   α     α

Table 8

Possible nucleotide base pairings at a specific homologous site for t > 0 under the

Jukes-Cantor model.


                                                  Types



      Sequence            Same                TS-type                 TV1-type           TV2-type



           1          U   C       A   G   U   C    A      G       U   A    C     G   U    G   C     A


           2          U   C       A   G   C   U    G      A       A   U    G     C   G    U   A     C



      Probabilities           S                                   PQR

Table 9

Nucleotide mismatches observed after time T since divergence between human and mouse

protein kinase inhibitor (pki).



                                              Types

                          Transition (TS)       Transversion (TV1)        Transversion (TV2)

 Human pki                U       C   A   G     U     A    C    G         U   G   C     A

 Mouse pki                C       U   G   A     A     U    G     C        G   U   A     C

 Numbers observed         5       0   3   2     0      1   1     6        0   1   1     2

Table 10

Estimates of the genetic distance K under the different models being considered.



                       Model              K             standard error

                   Jukes-Cantor        0.0682288          0.0178312

                        2ST            0.0686475          0.0180611

                        3ST            0.0686535          0.0180644

Table 11

Estimates of the genetic distance Ki, where i = 1, 2, or 3 is the ith codon position, under

the different models considered in Tavaré (1986). The sequence data are those of human and

mouse α-fetoprotein.



            Model                 K1                   K2                   K3

        Jukes-Cantor        0.1752 (.0186)       0.1387 (.0162)        .6566 (.0483)

             3ST            0.1760 (.0188)       0.1389 (.0163)        .7230 (.0642)

        (The parenthesized quantities are standard errors.)

                                    Figure Captions

Figure 1. Divergence of sequences S1 and S2 from some common ancestor.


Figure 2. Types and rates of nucleotide substitutions.


Figure 3. Types and rates of nucleotide substitutions: 3ST Model.


Figure 4. Types and rates of nucleotide substitutions: 2ST Model.


Figure 5. Types and rates of nucleotide substitutions: Jukes-Cantor Model.


Figure 6. The nucleotide sequences of the coding region of the mouse protein kinase

inhibitor (Mpki.M) and the human protein kinase inhibitor (Hpki.2) are shown above.

The 15 mismatches are indicated with bars (Olsen and Uhler, 1991a, 1991b).
[Figure 1: an ancestral sequence diverging over time T into the two sequences S1 and S2.]

[Figure 2: substitution diagram among the bases U, C, A, and G, with transition rates α1 to α4 and transversion rates β1 to β4 and γ1 to γ4.]

[Figure 3: the same diagram under the 3ST model, with rates α, β, and γ.]

[Figure 4: the same diagram under the 2ST model, with rates α and β.]

[Figure 5: the same diagram under the Jukes-Cantor model, with the single rate α.]
DNA Nucleotide Substitution Models Explained

Contenu connexe

Tendances

Fourier Transform
Fourier TransformFourier Transform
Fourier TransformAamir Saeed
 
Linear integral equations
Linear integral equationsLinear integral equations
Linear integral equationsSpringer
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 
Unit 2 analysis of continuous time signals-mcq questions
Unit 2   analysis of continuous time signals-mcq questionsUnit 2   analysis of continuous time signals-mcq questions
Unit 2 analysis of continuous time signals-mcq questionsDr.SHANTHI K.G
 
Introduction to Fourier transform and signal analysis
Introduction to Fourier transform and signal analysisIntroduction to Fourier transform and signal analysis
Introduction to Fourier transform and signal analysis宗翰 謝
 
Boundedness of the Twisted Paraproduct
Boundedness of the Twisted ParaproductBoundedness of the Twisted Paraproduct
Boundedness of the Twisted ParaproductVjekoslavKovac1
 
Doering Savov
Doering SavovDoering Savov
Doering Savovgh
 
Thegeneralizedinverse weibulldistribution ....
Thegeneralizedinverse weibulldistribution ....Thegeneralizedinverse weibulldistribution ....
Thegeneralizedinverse weibulldistribution ....fitriya rizki
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
 
R. Jimenez - Fundamental Physics from Astronomical Observations
R. Jimenez - Fundamental Physics from Astronomical ObservationsR. Jimenez - Fundamental Physics from Astronomical Observations
R. Jimenez - Fundamental Physics from Astronomical ObservationsSEENET-MTP
 
Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...
Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...
Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...VjekoslavKovac1
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsVjekoslavKovac1
 
5. fourier properties
5. fourier properties5. fourier properties
5. fourier propertiesskysunilyadav
 
Estimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliersEstimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliersVjekoslavKovac1
 

Tendances (20)

Fourier Transform
Fourier TransformFourier Transform
Fourier Transform
 
ENFPC 2010
ENFPC 2010ENFPC 2010
ENFPC 2010
 
Properties of Fourier transform
Properties of Fourier transformProperties of Fourier transform
Properties of Fourier transform
 
Linear integral equations
Linear integral equationsLinear integral equations
Linear integral equations
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Unit 2 analysis of continuous time signals-mcq questions
Unit 2   analysis of continuous time signals-mcq questionsUnit 2   analysis of continuous time signals-mcq questions
Unit 2 analysis of continuous time signals-mcq questions
 
Introduction to Fourier transform and signal analysis
Introduction to Fourier transform and signal analysisIntroduction to Fourier transform and signal analysis
Introduction to Fourier transform and signal analysis
 
Tut09
Tut09Tut09
Tut09
 
Ps02 cmth03 unit 1
Ps02 cmth03 unit 1Ps02 cmth03 unit 1
Ps02 cmth03 unit 1
 
Boundedness of the Twisted Paraproduct
Boundedness of the Twisted ParaproductBoundedness of the Twisted Paraproduct
Boundedness of the Twisted Paraproduct
 
Basic concepts and how to measure price volatility
Basic concepts and how to measure price volatility Basic concepts and how to measure price volatility
Basic concepts and how to measure price volatility
 
Doering Savov
Doering SavovDoering Savov
Doering Savov
 
Thegeneralizedinverse weibulldistribution ....
Thegeneralizedinverse weibulldistribution ....Thegeneralizedinverse weibulldistribution ....
Thegeneralizedinverse weibulldistribution ....
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
 
Ch06 2
Ch06 2Ch06 2
Ch06 2
 
R. Jimenez - Fundamental Physics from Astronomical Observations
R. Jimenez - Fundamental Physics from Astronomical ObservationsR. Jimenez - Fundamental Physics from Astronomical Observations
R. Jimenez - Fundamental Physics from Astronomical Observations
 
Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...
Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...
Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
5. fourier properties
5. fourier properties5. fourier properties
5. fourier properties
 
Estimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliersEstimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliers
 

Similaire à DNA Nucleotide Substitution Models Explained

Applications of differential equations by shahzad
Applications of differential equations by shahzadApplications of differential equations by shahzad
Applications of differential equations by shahzadbiotech energy pvt limited
 
Tele4653 l5
Tele4653 l5Tele4653 l5
Tele4653 l5Vin Voro
 
Signal and Systems part i
Signal and Systems part iSignal and Systems part i
Signal and Systems part iPatrickMumba7
 
8fbf4451c622e6efbcf7452222d21ea5_MITRES_6_007S11_hw02.pdf
8fbf4451c622e6efbcf7452222d21ea5_MITRES_6_007S11_hw02.pdf8fbf4451c622e6efbcf7452222d21ea5_MITRES_6_007S11_hw02.pdf
8fbf4451c622e6efbcf7452222d21ea5_MITRES_6_007S11_hw02.pdfTsegaTeklewold1
 
Lesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsLesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsMatthew Leingang
 
Tele3113 wk1tue
Tele3113 wk1tueTele3113 wk1tue
Tele3113 wk1tueVin Voro
 
3. Frequency-Domain Analysis of Continuous-Time Signals and Systems.pdf
3. Frequency-Domain Analysis of Continuous-Time Signals and Systems.pdf3. Frequency-Domain Analysis of Continuous-Time Signals and Systems.pdf
3. Frequency-Domain Analysis of Continuous-Time Signals and Systems.pdfTsegaTeklewold1
 
Pricing Exotics using Change of Numeraire
Pricing Exotics using Change of NumerairePricing Exotics using Change of Numeraire
Pricing Exotics using Change of NumeraireSwati Mital
 
Tele4653 l1
Tele4653 l1Tele4653 l1
Tele4653 l1Vin Voro
 
Métodos computacionales para el estudio de modelos epidemiológicos con incer...
Métodos computacionales para el estudio de modelos  epidemiológicos con incer...Métodos computacionales para el estudio de modelos  epidemiológicos con incer...
Métodos computacionales para el estudio de modelos epidemiológicos con incer...Facultad de Informática UCM
 
Hydrodynamic scaling and analytically solvable models
Hydrodynamic scaling and analytically solvable modelsHydrodynamic scaling and analytically solvable models
Hydrodynamic scaling and analytically solvable modelsGiorgio Torrieri
 
Redundancy in robot manipulators and multi robot systems
Redundancy in robot manipulators and multi robot systemsRedundancy in robot manipulators and multi robot systems
Redundancy in robot manipulators and multi robot systemsSpringer
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
 
Pinning and facetting in multiphase LBMs
Pinning and facetting in multiphase LBMsPinning and facetting in multiphase LBMs
Pinning and facetting in multiphase LBMsTim Reis
 
Testing for Extreme Volatility Transmission
Testing for Extreme Volatility Transmission Testing for Extreme Volatility Transmission
Testing for Extreme Volatility Transmission Arthur Charpentier
 
Analysis of coupled inset dielectric guide structure
Analysis of coupled inset dielectric guide structureAnalysis of coupled inset dielectric guide structure
Analysis of coupled inset dielectric guide structureYong Heui Cho
 

DNA Nucleotide Substitution Models Explained

  • 1. DNA nucleotide substitution models 1 Running head: DNA NUCLEOTIDE SUBSTITUTION MODELS On Some Measures of Genetic Distance Based on Rates of Nucleotide Substitution Justine Leon A. Uro Ph. D. Graduate Student Department of Biostatistics, University of Michigan Ann Arbor, MI
  • 2. Abstract

    We present a general DNA base-nucleotide substitution model and discuss three special cases: the three-substitution-type (3ST), two-substitution-type (2ST), and Jukes-Cantor models.
  • 3. On Some Measures of Genetic Distance Based on Rates of Nucleotide Substitution

    Introduction

    The genetic distance between two populations is defined as a concept related to the time since the two populations diverged from a common ancestral population (Weir, 1990). A number of methods have been proposed to estimate the genetic distance between two populations. They are based on the allele frequencies in the two populations, on the rate of amino acid substitution in protein sequence data from the two populations, or on the rates of base nucleotide substitution in DNA sequence data from the two populations.

    Measures of genetic distance that utilize the allele frequencies are estimates based on some geometric transformation of the allele frequencies (Cavalli-Sforza and Edwards, 1967; Cavalli-Sforza and Bodmer, 1971; Edwards, 1971; Nei, 1977, 1978; Li and Nei, 1977; Smith, 1977). Some of these measures are purely geometric and do not involve any genetic concept at all, e.g., the measure proposed by Cavalli-Sforza and Bodmer (Weir, 1990). On the other hand, the ones proposed by Edwards (1971) and by Nei (1977) can be shown to be related to the concept of fixation index (Hartl and Clark, 1989).

    A measure of genetic distance based on amino acid substitution from protein sequence data was proposed by Jukes and Cantor in 1969. This method was partly due to the abundance of amino acid sequence data available then. Some geneticists argue that this measure should be preferred since proteins are the subject of mutations.

    The discovery of DNA sequencing by Maxam and Gilbert and Sanger et al. in 1977 brought about more methods for measuring genetic distance. The estimates from these methods are based on the rates of nucleotide substitution in DNA sequence data. These are the methods which we will consider in this paper. We will formulate the general
  • 4. model, examine some special cases, give some numerical examples, and finally, examine the validity of these models based on their assumptions.

    The General Model

    We now start by formulating the general model. Let S1 and S2 be two nucleotide sequences with a common ancestral sequence. We consider a pair of homologous sites from S1 and S2 and examine how much they have diverged from each other during their descent from the ancestral sequence T years back (Figure 1). The evolutionary base substitution model we are going to use is shown in Figure 2. We have used RNA codes for the nucleotides so that the pyrimidines are uracil (U) and cytosine (C), and the purines are adenine (A) and guanine (G).

    The types and rates of base substitution are summarized in Table 1. A substitution of a purine by a purine or of a pyrimidine by a pyrimidine is called a transition (TS). If a pyrimidine is substituted by a purine or vice versa, the substitution is called a transversion (TV). We distinguish between two types of transversion, TV1 and TV2, and each type is shown in Table 1. The classification of a transversion by type becomes easier if we look at Figure 2: the transversions that go vertically up or down are TV1 and those that go diagonally are TV2.

    When comparing the homologous sites of S1 and S2 at any time t > 0, there are 16 possible nucleotide base pairings, 12 of which involve mismatched base pairs. If the mismatch looks like a transition pair in Table 1, we call the mismatch a TS-type mismatch. We have a TV1-type mismatch if the mismatch looks like a Type 1 transversion listed in Table 1. The TV2-type mismatch is defined in the same manner. We summarize these in Table 2. In Table 2, for t > 0,
  • 5.
    \[ S(t) = \sum_{i=1}^{4} S_i(t) = \text{probability of no difference at a site} \tag{1} \]
    \[ P(t) = \sum_{i=1}^{4} P_i(t) = \text{probability of a TS-type difference at a site} \tag{2} \]
    \[ Q(t) = \sum_{i=1}^{4} Q_i(t) = \text{probability of a TV1-type difference at a site} \tag{3} \]
    \[ R(t) = \sum_{i=1}^{4} R_i(t) = \text{probability of a TV2-type difference at a site} \tag{4} \]

    Hence,
    \[ Q(t) + R(t) = \sum_{i=1}^{4} \bigl(R_i(t) + Q_i(t)\bigr) = \text{probability of a TV-type difference at a site.} \tag{5} \]

    We sometimes refer to the probabilities above as the match probabilities. We also define the following probabilities, which we sometimes refer to as the base probabilities:
    \[ U(t) = \text{relative frequency of uracil,} \tag{6} \]
    \[ C(t) = \text{relative frequency of cytosine,} \tag{7} \]
    \[ A(t) = \text{relative frequency of adenine,} \tag{8} \]
    \[ G(t) = \text{relative frequency of guanine in a strand} \tag{9} \]
    so that
    \[ U(t) + C(t) + A(t) + G(t) = 1. \tag{10} \]

    Note that the probabilities in (1)–(4) and (6)–(9) are all time-dependent. We also have
  • 6. the following relations:
    \[ S(t) = U^2(t) + C^2(t) + A^2(t) + G^2(t) \tag{11} \]
    \[ P(t) = 2U(t)C(t) + 2A(t)G(t) \tag{12} \]
    \[ Q(t) = 2U(t)A(t) + 2C(t)G(t) \tag{13} \]
    \[ R(t) = 2U(t)G(t) + 2C(t)A(t) \tag{14} \]

    Using the rates of substitution and the base probabilities, the mean rate of substitution at a specific site over the time interval (0, T] is given by
    \[ k = \sum_{i=1}^{4} \frac{\alpha_i + \beta_i + \gamma_i}{T} \int_0^T B_i(t)\,dt \tag{15} \]
    where $B_1(t) = U(t)$, $B_2(t) = C(t)$, $B_3(t) = A(t)$ and $B_4(t) = G(t)$, and the integrals divided by T are the average probabilities of finding a given base at a given site during the time interval (0, T]. A measure of genetic distance is therefore given by
    \[ K = 2Tk \tag{16} \]
    where k is as defined in (15), T is the time since the two sequences started diverging from the ancestral sequence, and the factor of 2 is due to the fact that we are considering two branches that diverged.

    We now formulate the general model and proceed in a manner similar to that of Takahata and Kimura (1981). At any time t ∈ [0, T], consider a time interval ∆t short enough that, if the mutation rate is small, higher-order terms of ∆t and the occurrence of a double substitution at a specific site may be neglected. We have
    \[ U(t + \Delta t) = U(t) - \alpha_1 \Delta t\, U(t) + \alpha_2 \Delta t\, C(t) + \beta_2 \Delta t\, A(t) + \gamma_2 \Delta t\, G(t) - \gamma_1 \Delta t\, U(t) - \beta_1 \Delta t\, U(t) \tag{17} \]
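  • 7. which we can rewrite as
    \[ \frac{U(t + \Delta t) - U(t)}{\Delta t} = -(\alpha_1 + \beta_1 + \gamma_1) U(t) + \alpha_2 C(t) + \beta_2 A(t) + \gamma_2 G(t). \tag{18} \]
    Taking the limit as ∆t approaches zero, (18) gives
    \[ \frac{dU(t)}{dt} = -(\alpha_1 + \beta_1 + \gamma_1) U(t) + \alpha_2 C(t) + \beta_2 A(t) + \gamma_2 G(t). \tag{19} \]
    Doing this for the other three probabilities, we get the following system of differential equations:
    \[ \frac{dU(t)}{dt} = -(\alpha_1 + \beta_1 + \gamma_1) U(t) + \alpha_2 C(t) + \beta_2 A(t) + \gamma_2 G(t) \tag{20} \]
    \[ \frac{dC(t)}{dt} = \alpha_1 U(t) - (\alpha_2 + \beta_3 + \gamma_3) C(t) + \gamma_4 A(t) + \beta_4 G(t) \tag{21} \]
    \[ \frac{dA(t)}{dt} = \beta_1 U(t) + \gamma_3 C(t) - (\alpha_3 + \beta_2 + \gamma_4) A(t) + \alpha_4 G(t) \tag{22} \]
    \[ \frac{dG(t)}{dt} = \gamma_1 U(t) + \beta_3 C(t) + \alpha_3 A(t) - (\alpha_4 + \beta_4 + \gamma_2) G(t). \tag{23} \]
    Writing (20)–(23) in matrix form gives
    \[ \frac{d}{dt} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix} = \begin{pmatrix} -(\alpha_1 + \beta_1 + \gamma_1) & \alpha_2 & \beta_2 & \gamma_2 \\ \alpha_1 & -(\alpha_2 + \beta_3 + \gamma_3) & \gamma_4 & \beta_4 \\ \beta_1 & \gamma_3 & -(\alpha_3 + \beta_2 + \gamma_4) & \alpha_4 \\ \gamma_1 & \beta_3 & \alpha_3 & -(\alpha_4 + \beta_4 + \gamma_2) \end{pmatrix} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}. \tag{24} \]
    Using the fact that the sum of the base probabilities is equal to 1, so that $G(t) = 1 - U(t) - C(t) - A(t)$, the matrix equation reduces to
    \[ \frac{d}{dt} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix} = \begin{pmatrix} -(\alpha_1 + \beta_1 + \gamma_1 + \gamma_2) & \alpha_2 - \gamma_2 & \beta_2 - \gamma_2 \\ \alpha_1 - \beta_4 & -(\alpha_2 + \beta_3 + \gamma_3 + \beta_4) & \gamma_4 - \beta_4 \\ \beta_1 - \alpha_4 & \gamma_3 - \alpha_4 & -(\alpha_3 + \beta_2 + \gamma_4 + \alpha_4) \end{pmatrix} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix} + \begin{pmatrix} \gamma_2 \\ \beta_4 \\ \alpha_4 \end{pmatrix} \tag{25} \]
    which can be written as
    \[ \frac{d}{dt} \mathbf{B}_1(t) = \mathbf{Q}_1 \mathbf{B}_1(t) + \mathbf{C}_1. \tag{26} \]
    Solving this system of differential equations entails solving for the eigenvalues of $\mathbf{Q}_1$. Although it is easy to get the eigenvalues of the 3 × 3 matrix $\mathbf{Q}_1$, the matrix equation in (26) is still difficult to solve since only the final conditions of the base probabilities can be approximated and the initial conditions are unknown. One way to avoid this problem is to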
  • 8. express the base probabilities in terms of the match probabilities. The matrix equation involving the match probabilities is easier to solve since the initial conditions for the match probabilities are $P_i(0) = Q_i(0) = R_i(0) = 0$, i = 1, …, 4, and $S(0) = 1$. After the expressions for the match probabilities have been solved, we can solve for the mean rate of base substitution k and hence the estimate of genetic distance K.

    Inherent in these models of evolutionary base nucleotide substitution are the following four assumptions:

    (1) The two sequences diverged from a common ancestor, that is, $P_i(0) = Q_i(0) = R_i(0) = 0$, i = 1, …, 4, and $S(0) = 1$.
    (2) The two sequences are stochastically identical and independent, and within each sequence, a substitution at one site in no way affects a substitution at some other site.
    (3) The homologous sites chosen from the two sequences are of the same fixed length during their descent from the common ancestor.
    (4) The fourth assumption reduces the number of parameters in the model by assuming that some of the rates are equal. Since this differs among the three models that we are going to consider, it is stated as each model is introduced rather than here.

    The 3ST Model

    The first special case that we are going to consider is the three-substitution-type (3ST) model. This model is due to Kimura (1981) and is the most general of the three models we consider in detail in this paper. The two other models we consider later are special cases of this model. The fourth assumption in the 3ST model is that the TS-type substitutions all have rate α, and that the TV-type substitutions have rates β and γ depending on the specific type, as shown in Figure 3. Under the 3ST model, Tables 1 and 2 can be simplified, and their simplified forms are given below as Tables 3 and 4, respectively.
  • 9. The system of differential equations in (20)–(23) simplifies to
    \[ \frac{dU(t)}{dt} = -(\alpha + \beta + \gamma) U(t) + \alpha C(t) + \beta A(t) + \gamma G(t) \tag{27} \]
    \[ \frac{dC(t)}{dt} = \alpha U(t) - (\alpha + \beta + \gamma) C(t) + \gamma A(t) + \beta G(t) \tag{28} \]
    \[ \frac{dA(t)}{dt} = \beta U(t) + \gamma C(t) - (\alpha + \beta + \gamma) A(t) + \alpha G(t) \tag{29} \]
    \[ \frac{dG(t)}{dt} = \gamma U(t) + \beta C(t) + \alpha A(t) - (\alpha + \beta + \gamma) G(t) \tag{30} \]
    and its corresponding matrix form is
    \[ \frac{d}{dt} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix} = \begin{pmatrix} -(\alpha + \beta + \gamma) & \alpha & \beta & \gamma \\ \alpha & -(\alpha + \beta + \gamma) & \gamma & \beta \\ \beta & \gamma & -(\alpha + \beta + \gamma) & \alpha \\ \gamma & \beta & \alpha & -(\alpha + \beta + \gamma) \end{pmatrix} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix}, \tag{31} \]
    which again can be written in the form of (25). Considering the fact that the sum of the base probabilities is 1, we can simplify (31) to
    \[ \frac{d}{dt} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix} = \begin{pmatrix} -(\alpha + \beta + 2\gamma) & \alpha - \gamma & \beta - \gamma \\ \alpha - \beta & -(\alpha + 2\beta + \gamma) & \gamma - \beta \\ \beta - \alpha & \gamma - \alpha & -(2\alpha + \beta + \gamma) \end{pmatrix} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \end{pmatrix} + \begin{pmatrix} \gamma \\ \beta \\ \alpha \end{pmatrix}. \tag{32} \]
    We can also rewrite (32) in the form of (25). The matrix equation in (32) is not difficult to solve since the eigenvalues are easily obtainable. The problem here is that we do not know the initial conditions for the base probabilities since we do not know the base frequencies of the ancestral sequence. As we have mentioned before, a way to avoid this problem is to consider the match probabilities instead. It is easier to use the match probabilities since the initial conditions for this set of probabilities are given by the first assumption (A1) of our model. Using the relationships between the base probabilities and the match probabilities given in (11)–(14), it can be shown that
    \[ \frac{d}{dt} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} = \begin{pmatrix} -2(2\alpha + \beta + \gamma) & -2(\alpha - \gamma) & -2(\alpha - \beta) \\ -2(\beta - \gamma) & -2(\alpha + 2\beta + \gamma) & -2(\beta - \alpha) \\ -2(\gamma - \beta) & -2(\gamma - \alpha) & -2(\alpha + \beta + 2\gamma) \end{pmatrix} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} + \begin{pmatrix} 2\alpha \\ 2\beta \\ 2\gamma \end{pmatrix}, \tag{33} \]
    which in matrix form is
    \[ \frac{d}{dt} \mathbf{T}(t) = \mathbf{Q}_2 \mathbf{T}(t) + \mathbf{C}_2. \tag{34} \]
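The step from the base-probability equations (27)–(30) to the match-probability system can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names, the rates, and the base frequencies are arbitrary illustrative assumptions. It differentiates P, Q and R by the product rule using (12)–(14) and compares the result with the linear form, whose coefficients are obtained here by substituting S = 1 − P − Q − R.

```python
# Illustrative check only: names and numbers below are assumptions,
# not values taken from the paper.

def base_derivatives(u, c, a, g, alpha, beta, gamma):
    """Right-hand sides of the 3ST base-probability equations (27)-(30)."""
    s = alpha + beta + gamma
    du = -s * u + alpha * c + beta * a + gamma * g
    dc = alpha * u - s * c + gamma * a + beta * g
    da = beta * u + gamma * c - s * a + alpha * g
    dg = gamma * u + beta * c + alpha * a - s * g
    return du, dc, da, dg


def match_probabilities(u, c, a, g):
    """P, Q, R from relations (12)-(14)."""
    p = 2 * u * c + 2 * a * g
    q = 2 * u * a + 2 * c * g
    r = 2 * u * g + 2 * c * a
    return p, q, r


def match_derivatives_product_rule(u, c, a, g, alpha, beta, gamma):
    """dP/dt, dQ/dt, dR/dt via the product rule applied to (12)-(14)."""
    du, dc, da, dg = base_derivatives(u, c, a, g, alpha, beta, gamma)
    dp = 2 * (du * c + u * dc) + 2 * (da * g + a * dg)
    dq = 2 * (du * a + u * da) + 2 * (dc * g + c * dg)
    dr = 2 * (du * g + u * dg) + 2 * (dc * a + c * da)
    return dp, dq, dr


def match_derivatives_linear(p, q, r, alpha, beta, gamma):
    """Linear system in (P, Q, R), valid when U + C + A + G = 1."""
    dp = (-2 * (2 * alpha + beta + gamma) * p - 2 * (alpha - gamma) * q
          - 2 * (alpha - beta) * r + 2 * alpha)
    dq = (-2 * (beta - gamma) * p - 2 * (alpha + 2 * beta + gamma) * q
          - 2 * (beta - alpha) * r + 2 * beta)
    dr = (-2 * (gamma - beta) * p - 2 * (gamma - alpha) * q
          - 2 * (alpha + beta + 2 * gamma) * r + 2 * gamma)
    return dp, dq, dr


# Arbitrary base frequencies summing to 1 and arbitrary rates.
u, c, a, g = 0.1, 0.2, 0.3, 0.4
rates = (0.03, 0.02, 0.01)
lhs = match_derivatives_product_rule(u, c, a, g, *rates)
rhs = match_derivatives_linear(*match_probabilities(u, c, a, g), *rates)
print(max(abs(x - y) for x, y in zip(lhs, rhs)))  # difference at rounding level
```

The two routes agree only when the base frequencies sum to 1, because the linear form relies on S = 1 − P − Q − R.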
  • 10. We now derive the expression for P(t) in (33). The expressions for Q(t) and R(t) can be obtained in very much the same manner. Recall that in (11)–(14) we have
    \[ P(t) = \text{probability of a TS-type difference at a homologous site} \tag{35} \]
    \[ \phantom{P(t)} = 2C(t)U(t) + 2A(t)G(t). \tag{36} \]
    Using the product rule for the derivative of a product,
    \[ \frac{dP(t)}{dt} = 2\left[\frac{dU(t)}{dt} C(t) + \frac{dC(t)}{dt} U(t)\right] + 2\left[\frac{dG(t)}{dt} A(t) + \frac{dA(t)}{dt} G(t)\right]. \tag{37} \]
    If we substitute the expressions for the derivatives of the base probabilities given in (27)–(30), we have
    \[ \frac{dP(t)}{dt} = 2\Bigl\{-2\bigl(C(t)U(t) + A(t)G(t)\bigr)(\alpha + \beta + \gamma) + 2\beta\bigl(A(t)C(t) + G(t)U(t)\bigr) + 2\gamma\bigl(A(t)U(t) + G(t)C(t)\bigr) + \alpha\bigl(A^2(t) + C^2(t) + U^2(t) + G^2(t)\bigr)\Bigr\}. \tag{38} \]
    Using the fact that $A^2(t) + C^2(t) + U^2(t) + G^2(t) = 1 - P(t) - Q(t) - R(t)$, we can simplify (38) to obtain
    \[ \frac{dP(t)}{dt} = 2\bigl\{-(2\alpha + \beta + \gamma) P(t) + (\beta - \alpha) R(t) + (\gamma - \alpha) Q(t) + \alpha\bigr\} \tag{39} \]
    which is what we want.

    We now solve the matrix equation in (34). Define the following Laplace transform:
    \[ \mathcal{L}[\mathbf{T}(t)] = \mathcal{L}\begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} = \begin{pmatrix} p(s) \\ q(s) \\ r(s) \end{pmatrix} = \mathcal{T}(s). \tag{40} \]
    Applying the Laplace transform to (34), we get
    \[ s\mathcal{T}(s) - \mathbf{T}(0) = \mathbf{Q}_2 \mathcal{T}(s) + \frac{1}{s}\mathbf{C}_2 \tag{41} \]
    which we can rewrite as
    \[ -\frac{1}{s}\mathbf{C}_2 = (\mathbf{Q}_2 - s\mathbf{I}_3)\,\mathcal{T}(s), \tag{42} \]
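  • 11. where we have used the fact that $\mathbf{T}(0) = \mathbf{0}$ and $\mathbf{I}_3$ is the 3 × 3 identity matrix. The problem of solving the system of differential equations in (34) is now reduced to solving a system of algebraic equations in the three unknowns p(s), q(s), and r(s). We now solve for these three unknowns and then apply the inverse Laplace transform to get the solutions for P(t), Q(t), and R(t). Using Cramer's rule, we get
    \[ p(s) = \frac{1}{\Delta} \begin{vmatrix} -2\alpha/s & -2(\alpha - \gamma) & -2(\alpha - \beta) \\ -2\beta/s & -2(\alpha + 2\beta + \gamma) - s & -2(\beta - \alpha) \\ -2\gamma/s & -2(\gamma - \alpha) & -2(\alpha + \beta + 2\gamma) - s \end{vmatrix} \tag{43} \]
    \[ q(s) = \frac{1}{\Delta} \begin{vmatrix} -2(2\alpha + \beta + \gamma) - s & -2\alpha/s & -2(\alpha - \beta) \\ -2(\beta - \gamma) & -2\beta/s & -2(\beta - \alpha) \\ -2(\gamma - \beta) & -2\gamma/s & -2(\alpha + \beta + 2\gamma) - s \end{vmatrix} \tag{44} \]
    \[ r(s) = \frac{1}{\Delta} \begin{vmatrix} -2(2\alpha + \beta + \gamma) - s & -2(\alpha - \gamma) & -2\alpha/s \\ -2(\beta - \gamma) & -2(\alpha + 2\beta + \gamma) - s & -2\beta/s \\ -2(\gamma - \beta) & -2(\gamma - \alpha) & -2\gamma/s \end{vmatrix} \tag{45} \]
    where
    \[ \Delta = \begin{vmatrix} -2(2\alpha + \beta + \gamma) - s & -2(\alpha - \gamma) & -2(\alpha - \beta) \\ -2(\beta - \gamma) & -2(\alpha + 2\beta + \gamma) - s & -2(\beta - \alpha) \\ -2(\gamma - \beta) & -2(\gamma - \alpha) & -2(\alpha + \beta + 2\gamma) - s \end{vmatrix}. \tag{46} \]
    Upon simplifying and expressing the results in partial fractions, we get
    \[ p(s) = \frac{1}{4s} - \frac{1/4}{s + 4(\alpha + \beta)} - \frac{1/4}{s + 4(\alpha + \gamma)} + \frac{1/4}{s + 4(\beta + \gamma)} \tag{47} \]
    \[ q(s) = \frac{1}{4s} - \frac{1/4}{s + 4(\alpha + \beta)} + \frac{1/4}{s + 4(\alpha + \gamma)} - \frac{1/4}{s + 4(\beta + \gamma)} \tag{48} \]
    \[ r(s) = \frac{1}{4s} + \frac{1/4}{s + 4(\alpha + \beta)} - \frac{1/4}{s + 4(\alpha + \gamma)} - \frac{1/4}{s + 4(\beta + \gamma)}. \tag{49} \]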
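As a sanity check on the partial fractions, the match-probability system can also be integrated numerically from the initial condition P(0) = Q(0) = R(0) = 0 and compared with the term-by-term inverse transforms of (47)–(49), using the standard pair 1/(s + a) ↔ exp(−at). The sketch below is an illustration under assumed values: the rates, the time horizon, the step count, and the function names are not from the paper, and a hand-written fourth-order Runge-Kutta stepper stands in for any particular ODE library.

```python
import math


def rhs(y, rates):
    """Right-hand side of the 3ST match-probability system for y = (P, Q, R)."""
    p, q, r = y
    alpha, beta, gamma = rates
    dp = (-2 * (2 * alpha + beta + gamma) * p - 2 * (alpha - gamma) * q
          - 2 * (alpha - beta) * r + 2 * alpha)
    dq = (-2 * (beta - gamma) * p - 2 * (alpha + 2 * beta + gamma) * q
          - 2 * (beta - alpha) * r + 2 * beta)
    dr = (-2 * (gamma - beta) * p - 2 * (gamma - alpha) * q
          - 2 * (alpha + beta + 2 * gamma) * r + 2 * gamma)
    return dp, dq, dr


def rk4_solve(rates, t_end, steps):
    """Classical RK4 integration from P(0) = Q(0) = R(0) = 0 up to t_end."""
    h = t_end / steps
    y = (0.0, 0.0, 0.0)
    for _ in range(steps):
        k1 = rhs(y, rates)
        k2 = rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k1)), rates)
        k3 = rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k2)), rates)
        k4 = rhs(tuple(yi + h * ki for yi, ki in zip(y, k3)), rates)
        y = tuple(yi + h * (a + 2 * b + 2 * c + d) / 6
                  for yi, a, b, c, d in zip(y, k1, k2, k3, k4))
    return y


def closed_form(rates, t):
    """Term-by-term inverse Laplace transform of the partial fractions."""
    alpha, beta, gamma = rates
    e1 = math.exp(-4 * (alpha + beta) * t)
    e2 = math.exp(-4 * (alpha + gamma) * t)
    e3 = math.exp(-4 * (beta + gamma) * t)
    return (0.25 * (1 - e1 - e2 + e3),
            0.25 * (1 - e1 + e2 - e3),
            0.25 * (1 + e1 - e2 - e3))


rates = (0.03, 0.02, 0.01)   # assumed alpha, beta, gamma
numeric = rk4_solve(rates, t_end=5.0, steps=500)
exact = closed_form(rates, 5.0)
print(max(abs(x - y) for x, y in zip(numeric, exact)))  # small discretization error
```

Agreement between the two routes supports both the coefficient matrix and the signs in the partial-fraction expansion.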
  • 12. Applying the inverse Laplace transform, we get the following as solutions to the system in (34):
    \[ P(t) = \mathcal{L}^{-1}\{p(s)\} = \tfrac{1}{4}\left(1 - e^{\lambda_1 t} - e^{\lambda_2 t} + e^{\lambda_3 t}\right) \tag{50} \]
    \[ Q(t) = \mathcal{L}^{-1}\{q(s)\} = \tfrac{1}{4}\left(1 - e^{\lambda_1 t} + e^{\lambda_2 t} - e^{\lambda_3 t}\right) \tag{51} \]
    \[ R(t) = \mathcal{L}^{-1}\{r(s)\} = \tfrac{1}{4}\left(1 + e^{\lambda_1 t} - e^{\lambda_2 t} - e^{\lambda_3 t}\right), \tag{52} \]
    where $\lambda_1 = -4(\alpha + \beta)$, $\lambda_2 = -4(\alpha + \gamma)$, $\lambda_3 = -4(\beta + \gamma)$. Under the 3ST model, the equation for k in (15) can be expressed as
    \[ k = \sum_{i=1}^{4} \frac{\alpha + \beta + \gamma}{T} \int_0^T B_i(t)\,dt = \alpha + \beta + \gamma, \tag{53} \]
    where we have used the fact that the sum of the base probabilities is equal to 1. Note that the assumption that some of the rates are equal played a crucial role in being able to factor α + β + γ out of the summation to get a simple expression for k. For K, we obtain
    \[ K = 2T(\alpha + \beta + \gamma). \tag{54} \]
    We can solve (50)–(52) for $\lambda_1$, $\lambda_2$, and $\lambda_3$ to get
    \[ 4(\alpha + \beta)t = -\ln\bigl(1 - 2P(t) - 2Q(t)\bigr) \tag{55} \]
    \[ 4(\alpha + \gamma)t = -\ln\bigl(1 - 2P(t) - 2R(t)\bigr) \tag{56} \]
    \[ 4(\beta + \gamma)t = -\ln\bigl(1 - 2Q(t) - 2R(t)\bigr), \tag{57} \]
    and hence, for any time t ∈ [0, T],
    \[ 8(\alpha + \beta + \gamma)t = -\ln\bigl\{[1 - 2P(t) - 2Q(t)][1 - 2P(t) - 2R(t)][1 - 2Q(t) - 2R(t)]\bigr\} \tag{58} \]
    \[ K = 2kt \tag{59} \]
    \[ \phantom{K} = -\tfrac{1}{4}\ln\bigl\{[1 - 2P(t) - 2Q(t)][1 - 2P(t) - 2R(t)][1 - 2Q(t) - 2R(t)]\bigr\}. \tag{60} \]
    The variance for this estimate of K is also given in the paper of Kimura (1981). We have
    \[ \sigma_K^2 = \frac{1}{n}\left[a^2 P(t) + b^2 Q(t) + c^2 R(t) - \bigl(aP(t) + bQ(t) + cR(t)\bigr)^2\right] \tag{61} \]
  • 13. where
    \[ a = \frac{1}{2}\left[\frac{1}{1 - 2P(t) - 2Q(t)} + \frac{1}{1 - 2P(t) - 2R(t)}\right] \tag{62} \]
    \[ b = \frac{1}{2}\left[\frac{1}{1 - 2P(t) - 2Q(t)} + \frac{1}{1 - 2Q(t) - 2R(t)}\right] \tag{63} \]
    \[ c = \frac{1}{2}\left[\frac{1}{1 - 2P(t) - 2R(t)} + \frac{1}{1 - 2Q(t) - 2R(t)}\right]. \tag{64} \]

    The 2ST Model

    We now proceed to a special case of the 3ST model, which again is due to Kimura (1980). We call it the two-substitution-type (2ST) model. Kimura discussed it in a paper published a year before the paper on the 3ST model. Since the 2ST model is a special case of the 3ST model, we just give the results and do not go into the details. (In the original paper this model is nameless; we call it the 2ST model for convenience.) The fourth assumption here is that the transition rate is α and the transversion rate is β. Under this assumption the diagram in Figure 3 simplifies further to the diagram in Figure 4. The tables for the base substitution and the match probabilities are given as Tables 5 and 6 below. The probability of a TS-type mismatch is given by P(t) and the probability of a TV-type mismatch is given by $\mathit{QR}(t) = Q(t) + R(t)$. That is, we have lumped together the TV1-type and TV2-type mismatches. The matrix equation in (24) under the 2ST model is
    \[ \frac{d}{dt} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix} = \begin{pmatrix} -(\alpha + 2\beta) & \alpha & \beta & \beta \\ \alpha & -(\alpha + 2\beta) & \beta & \beta \\ \beta & \beta & -(\alpha + 2\beta) & \alpha \\ \beta & \beta & \alpha & -(\alpha + 2\beta) \end{pmatrix} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix} \tag{65} \]
    and the corresponding matrix equation involving the match probabilities is
    \[ \frac{d}{dt} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} = \begin{pmatrix} -2(2\alpha + 2\beta) & -2(\alpha - \beta) & -2(\alpha - \beta) \\ 0 & -2(\alpha + 3\beta) & -2(\beta - \alpha) \\ 0 & -2(\beta - \alpha) & -2(\alpha + 3\beta) \end{pmatrix} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} + \begin{pmatrix} 2\alpha \\ 2\beta \\ 2\beta \end{pmatrix}. \tag{66} \]
  • 14. If we now lump Q(t) and R(t) together as $\mathit{QR}(t)$, we have the matrix equation in (67), which only involves a 2 × 2 matrix instead of the previous 3 × 3 matrix:
    \[ \frac{d}{dt} \begin{pmatrix} P(t) \\ \mathit{QR}(t) \end{pmatrix} = \begin{pmatrix} -2(2\alpha + 2\beta) & -2(\alpha - \beta) \\ 0 & -8\beta \end{pmatrix} \begin{pmatrix} P(t) \\ \mathit{QR}(t) \end{pmatrix} + \begin{pmatrix} 2\alpha \\ 4\beta \end{pmatrix}. \tag{67} \]
    To solve (67), we use the initial conditions $P(0) = \mathit{QR}(0) = 0$. As solutions we have
    \[ P(t) = \frac{1}{4} - \frac{1}{2} e^{\lambda_1 t} + \frac{1}{4} e^{\lambda_2 t} \tag{68} \]
    \[ \mathit{QR}(t) = \frac{1}{2} - \frac{1}{2} e^{\lambda_2 t} \tag{69} \]
    where $\lambda_1 = -4(\alpha + \beta)$ and $\lambda_2 = -8\beta$. Under the 2ST model, k = α + 2β. We can solve (68)–(69) for αt and βt and therefore obtain our estimate K. We have
    \[ K = 2kt = 2(\alpha + 2\beta)t \tag{70} \]
    \[ \phantom{K} = -\tfrac{1}{4}\ln\bigl\{[1 - 2P(t) - \mathit{QR}(t)]^2\,[1 - 2\mathit{QR}(t)]\bigr\}. \tag{71} \]
    The variance of this estimate is given by
    \[ \sigma_K^2 = \frac{1}{n}\left[a^2 P(t) + b^2 \mathit{QR}(t) - \bigl(aP(t) + b\,\mathit{QR}(t)\bigr)^2\right] \tag{72} \]
    where
    \[ a = \frac{1}{1 - 2P(t) - \mathit{QR}(t)} \tag{73} \]
    \[ b = \frac{1}{2}\left[\frac{1}{1 - 2P(t) - \mathit{QR}(t)} + \frac{1}{1 - 2\mathit{QR}(t)}\right]. \tag{74} \]

    The Jukes-Cantor Model

    The simplest possible model is due to Jukes and Cantor (1969). The model was primarily formulated to describe protein evolution by looking at the rate of amino acid substitution. It turns out that this model can also be used to describe base substitution. The fourth assumption here is that all the rates of substitution are equal, i.e., $\alpha = \alpha_i = \beta_i = \gamma_i$, i = 1, …, 4. Figure 2 then becomes Figure 5 below. Under the Jukes-Cantor model, Tables 1 and 2 can be simplified to Tables 7 and 8, respectively.
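The 3ST and 2ST distance estimates and their variances described above can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the function names are hypothetical, and the coefficients a, b, c are computed as the partial derivatives of K with respect to the observed mismatch proportions, which is the delta-method reading of the variance formulas. The inputs p, q, r (or p, qr) are observed proportions of TS-, TV1- and TV2-type mismatches among n compared sites.

```python
import math


def distance_3st(p, q, r, n):
    """K and its variance under the 3ST model from proportions p, q, r over n sites."""
    e1 = 1 - 2 * p - 2 * q
    e2 = 1 - 2 * p - 2 * r
    e3 = 1 - 2 * q - 2 * r
    if min(e1, e2, e3) <= 0:
        raise ValueError("mismatch proportions too large for the 3ST estimator")
    k = -0.25 * math.log(e1 * e2 * e3)
    a = 0.5 * (1 / e1 + 1 / e2)   # dK/dP
    b = 0.5 * (1 / e1 + 1 / e3)   # dK/dQ
    c = 0.5 * (1 / e2 + 1 / e3)   # dK/dR
    var = (a * a * p + b * b * q + c * c * r - (a * p + b * q + c * r) ** 2) / n
    return k, var


def distance_2st(p, qr, n):
    """K and its variance under the 2ST model; qr lumps both transversion types."""
    e1 = 1 - 2 * p - qr
    e2 = 1 - 2 * qr
    if min(e1, e2) <= 0:
        raise ValueError("mismatch proportions too large for the 2ST estimator")
    k = -0.25 * math.log(e1 * e1 * e2)
    a = 1 / e1                    # dK/dP
    b = 0.5 * (1 / e1 + 1 / e2)   # dK/dQR
    var = (a * a * p + b * b * qr - (a * p + b * qr) ** 2) / n
    return k, var


# Round trip with assumed rates: generate expected proportions from the
# closed-form 3ST solutions and recover K = 2(alpha + beta + gamma)t.
alpha, beta, gamma, t = 0.03, 0.02, 0.01, 2.0
e1 = math.exp(-4 * (alpha + beta) * t)
e2 = math.exp(-4 * (alpha + gamma) * t)
e3 = math.exp(-4 * (beta + gamma) * t)
p = 0.25 * (1 - e1 - e2 + e3)
q = 0.25 * (1 - e1 + e2 - e3)
r = 0.25 * (1 + e1 - e2 - e3)
k_hat, var_hat = distance_3st(p, q, r, n=1000)
print(k_hat, 2 * (alpha + beta + gamma) * t)
```

When the two transversion proportions are equal, `distance_3st(p, qr / 2, qr / 2, n)` and `distance_2st(p, qr, n)` return the same estimate and variance, which mirrors the statement that the 2ST model is a special case of the 3ST model.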
  • 15. The matrix equation in (24) under the Jukes-Cantor model is
    \[ \frac{d}{dt} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix} = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix} \begin{pmatrix} U(t) \\ C(t) \\ A(t) \\ G(t) \end{pmatrix} \tag{75} \]
    and the matrix equation involving the match probabilities is
    \[ \frac{d}{dt} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} = \begin{pmatrix} -8\alpha & 0 & 0 \\ 0 & -8\alpha & 0 \\ 0 & 0 & -8\alpha \end{pmatrix} \begin{pmatrix} P(t) \\ Q(t) \\ R(t) \end{pmatrix} + \begin{pmatrix} 2\alpha \\ 2\alpha \\ 2\alpha \end{pmatrix}. \tag{76} \]
    If we define $\mathit{PQR}(t) = P(t) + Q(t) + R(t)$, we have the differential equation
    \[ \frac{d}{dt} \mathit{PQR}(t) = -8\alpha\,\mathit{PQR}(t) + 6\alpha \tag{77} \]
    which has as a solution
    \[ \mathit{PQR}(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right). \tag{78} \]
    Under the Jukes-Cantor model, k = 3α and the estimate K is
    \[ K = 2kt = 6\alpha t = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\mathit{PQR}(t)\right) \tag{79} \]
    which can be obtained by solving for αt in (78). The variance for K under the Jukes-Cantor model was derived by Kimura and Ohta (1972) and is given by
    \[ \sigma_{JC}^2 = \frac{1}{n} \cdot \frac{\bigl(1 - \mathit{PQR}(t)\bigr)\mathit{PQR}(t)}{\bigl(1 - 4\mathit{PQR}(t)/3\bigr)^2} = \frac{\bigl(1 - \mathit{PQR}(t)\bigr)\mathit{PQR}(t)}{n\bigl(1 - 4\mathit{PQR}(t)/3\bigr)^2}. \tag{80} \]

    We are going to illustrate the three models by comparing the human and protein kinase inhibitor. These two nucleotide sequences were recently sequenced by Olsen and Uhler (1991). The sequences are more than a thousand base pairs long but only 231 of these are part of the coding region. Our analysis is limited to these 231 base pairs. The sequences are shown in Figure 6. Of the 231 bp, only 15 show mismatches. These are
  • 16. summarized in Table 9. Usually, the estimate K is computed separately for each codon position, because the models assume that substitutions at different sites are independent of each other, whereas there is evidence that substitutions at adjacent sites are not. This is not done here since we have only a small number of base pairs and the mismatches are quite far apart (except for those at positions 200 and 201). The estimate under each model is shown in Table 10. The estimates differ very little from one model to another, and the variances are likewise similar.

Estimates of genetic distance using other nucleotide sequences are also available. Tavaré (1986) obtained estimates using human and mouse α-fetoprotein and serum albumin nucleotide sequences. His results for the human-mouse α-fetoprotein nucleotide sequences are reproduced as Table 11. The data consist of 1824 base pairs, and hence it was possible for him to compute the estimates by codon position. Note that the estimates tend to be largest for the third codon position and smallest for the second. Tavaré showed that the estimates are not homogeneous if we consider the codon positions as strata. Unfortunately, we cannot do the same in our analysis since we have just 231 bp and 15 mismatches.

All three models of evolutionary base substitution that we have discussed here are far from perfect, and their weaknesses lie in the second and third assumptions made in formulating the models. The second assumption states that the nucleotide sequences are stochastically identical and independent of each other. It is quite possibly true that nucleotide sequences evolve in a manner stochastically independent of each other, but there is evidence that they are in fact not stochastically identical.
For example, Wu and Li (1985) found that the substitution rate in rodents is much higher than that in humans. Even within a sequence, there is evidence that rates are much higher in some spots (“hot spots”) than in others (Miyata and Yasunaga, 1981; Brown and Clegg, 1983) and that the rates differ between the sense and antisense strands (Wu and Maeda, 1987). There is also evidence that a substitution at one site affects the rate of substitution at an adjacent site in phage T4 (Koch, 1971). It would be interesting to know if the same
  • 17. holds for higher organisms. This last fact is also one of the reasons why substitution rates are computed by codon position if the data allow.

The third assumption is that the diverging nucleotide sequences are both of a fixed length, and hence it does not take into account mutations resulting from deletions and insertions. This assumption also does not take into account the possibility of concerted evolution, which brings about the presence of multigene families, or the duplication and divergence in multigene families.

There have been efforts to develop models that address these shortcomings while remaining mathematically tractable. Needleman and Wunsch (1970), for example, proposed a model which assigns weights to substitutions, insertions, and deletions. Unfortunately, the weights assigned were arbitrary and had no genetic basis.

The main problem that these models of evolutionary base nucleotide substitution face is that when all of the mechanisms of evolution are included, the model becomes mathematically intractable with present computer technology. Since computer technology is still advancing, it is hoped that a model incorporating most, if not all, of the mechanisms discussed can be formulated in the near future.
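As a concrete recap of the simplest of the three models, the Jukes-Cantor estimate K = −(3/4) ln(1 − 4p/3) and its large-sample variance p(1 − p)/[n(1 − 4p/3)²] can be sketched in a few lines of Python, where p is the observed proportion of mismatched sites. The function name is hypothetical. The inputs are the 15 mismatches in 231 coding-region base pairs quoted in the text, and the exact inputs behind the tabulated estimates are assumed rather than known.

```python
import math


def jukes_cantor_distance(mismatches, n):
    """Return (K, variance) under the Jukes-Cantor model for n compared sites."""
    p = mismatches / n                 # observed proportion of mismatched sites
    w = 1.0 - 4.0 * p / 3.0
    if w <= 0.0:
        raise ValueError("logarithm undefined: sequences too divergent")
    k = -0.75 * math.log(w)            # distance estimate
    var = p * (1.0 - p) / (n * w * w)  # large-sample variance
    return k, var


# Counts quoted in the text: 15 mismatches among 231 base pairs.
k_jc, var_jc = jukes_cantor_distance(15, 231)
se_jc = math.sqrt(var_jc)
print(round(k_jc, 3), round(se_jc, 3))
```

With these inputs the sketch gives K ≈ 0.068 with a standard error of about 0.018.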
  • 18. References
Brown, A., & Clegg, M. (1983). Analysis of variation in related DNA sequences. In B. Weir (Ed.), Statistical data analysis (pp. 107–132). New York: Marcel Dekker.
Cavalli-Sforza, L., & Bodmer, W. (1971). The genetics of human populations. San Francisco: W. H. Freeman.
Cavalli-Sforza, L., & Edwards, A. (1967). Phylogenetic analysis: models and estimation procedures. American Journal of Human Genetics, 19, 233–257.
Edwards, A. (1971). The distance between populations on the basis of gene frequencies. Biometrics, 27, 873–881.
Jukes, T., & Cantor, C. (1969). Evolution of protein molecules. In H. N. Munro (Ed.), Mammalian protein metabolism (pp. 21–132). New York: Academic Press.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Kimura, M. (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences USA, 78, 454–458.
Kimura, M., & Ohta, T. (1972). On the stochastic model for estimation of mutational distance between homologous proteins. Journal of Molecular Evolution, 2, 87–90.
Koch, R. (1971). The influence of neighbouring base pairs upon base-pair substitution mutation rates. Proceedings of the National Academy of Sciences USA, 68, 773–776.
Maxam, A., & Gilbert, W. (1977). A new method for sequencing DNA. Proceedings of the National Academy of Sciences USA, 74, 560–564.
Miura, R. (Ed.). (1986). Lectures on mathematics in the life sciences. Rhode Island: American Mathematical Society.
Miyata, T., & Yasunaga, T. (1981). Rapidly evolving mouse α-globin-related pseudogenes. Proceedings of the National Academy of Sciences USA, 78, 450–453.
  • 19. Munro, H. N. (Ed.). (1969). Mammalian protein metabolism. New York: Academic Press.
Needleman, S., & Wunsch, C. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 443–453.
Nei, M. (1977). F-statistics and analysis of gene diversity in subdivided populations. Annals of Human Genetics, 41, 225–233.
Olsen, S., & Uhler, M. (1991a). (nucleotide sequence of the human protein kinase inhibitor). Molecular Endocrinology. (manuscript submitted)
Olsen, S., & Uhler, M. (1991b). (nucleotide sequence of the mouse protein kinase inhibitor). Journal of Biological Chemistry. (in press)
Sanger, F., Nicklen, S., & Coulson, A. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences USA, 74, 5463–5467.
Takahata, N., & Kimura, M. (1981). A model of evolutionary base substitutions and its application with special reference to rapid change in pseudogenes. Genetics, 98, 641–657.
Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. In R. Miura (Ed.), Lectures on mathematics in the life sciences (pp. 57–86). Rhode Island: American Mathematical Society.
Weir, B. (Ed.). (1983). Statistical data analysis. New York: Marcel Dekker.
Weir, B. (1990). Genetic data analysis: methods for discrete population data. Sunderland, Massachusetts: Sinauer Associates.
Wu, C., & Li, W. (1985). Evidence for higher rates of nucleotide substitution in rodents than in man. Proceedings of the National Academy of Sciences USA, 82, 1741–1745.
Wu, C., & Maeda, N. (1987). Inequality in mutation rates of the two strands of DNA. Nature, 327, 169–170.
  • 20. Table 1
Types and rates of nucleotide substitution.

Types          Transition (TS)     Transversion (TV1)   Transversion (TV2)
Initial base   U   C   A   G       U   A   C   G        U   G   C   A
New base       C   U   G   A       A   U   G   C        G   U   A   C
Rates          α1  α2  α3  α4      β1  β2  β3  β4       γ1  γ2  γ3  γ4
  • 21. Table 2
Possible nucleotide base pairings at a specific homologous site for t > 0.

Types           Same                TS-type             TV1-type            TV2-type
Sequence 1      U   C   A   G       U   C   A   G       U   A   C   G       U   G   C   A
Sequence 2      U   C   A   G       C   U   G   A       A   U   G   C       G   U   A   C
Probabilities   S1  S2  S3  S4      P1  P2  P3  P4      Q1  Q2  Q3  Q4      R1  R2  R3  R4
  • 22. Table 3
Types and rates of nucleotide substitution under the 3ST model.

Types          Transition (TS)     Transversion (TV1)   Transversion (TV2)
Initial base   U   C   A   G       U   A   C   G        U   G   C   A
New base       C   U   G   A       A   U   G   C        G   U   A   C
Rates          α   α   α   α       β   β   β   β        γ   γ   γ   γ
  • 23. Table 4
Possible nucleotide base pairings at a specific homologous site for t > 0 under the 3ST model.

Types           Same                TS-type             TV1-type            TV2-type
Sequence 1      U   C   A   G       U   C   A   G       U   A   C   G       U   G   C   A
Sequence 2      U   C   A   G       C   U   G   A       A   U   G   C       G   U   A   C
Probabilities   S                   P                   Q                   R
  • 24. Table 5
Types and rates of nucleotide substitution under the 2ST model.

Types          Transition (TS)     Transversion (TV1)   Transversion (TV2)
Initial base   U   C   A   G       U   A   C   G        U   G   C   A
New base       C   U   G   A       A   U   G   C        G   U   A   C
Rates          α   α   α   α       β   β   β   β        β   β   β   β
  • 25. Table 6
Possible nucleotide base pairings at a specific homologous site for t > 0 under the 2ST model.

Types           Same                TS-type             TV1-type            TV2-type
Sequence 1      U   C   A   G       U   C   A   G       U   A   C   G       U   G   C   A
Sequence 2      U   C   A   G       C   U   G   A       A   U   G   C       G   U   A   C
Probabilities   S                   P                   QR (TV1-type and TV2-type combined)
  • 26. Table 7
Types and rates of nucleotide substitution under the Jukes-Cantor model.

Types          Transition (TS)     Transversion (TV1)   Transversion (TV2)
Initial base   U   C   A   G       U   A   C   G        U   G   C   A
New base       C   U   G   A       A   U   G   C        G   U   A   C
Rates          α   α   α   α       α   α   α   α        α   α   α   α
  • 27. Table 8
Possible nucleotide base pairings at a specific homologous site for t > 0 under the Jukes-Cantor model.

Types           Same                TS-type             TV1-type            TV2-type
Sequence 1      U   C   A   G       U   C   A   G       U   A   C   G       U   G   C   A
Sequence 2      U   C   A   G       C   U   G   A       A   U   G   C       G   U   A   C
Probabilities   S                   PQR (TS-type, TV1-type, and TV2-type combined)
  • 28. Table 9
Nucleotide mismatches observed after time T since divergence between human and mouse protein kinase inhibitor (pki).

Types              Transition (TS)     Transversion (TV1)   Transversion (TV2)
Human pki          U   C   A   G       U   A   C   G        U   G   C   A
Mouse pki          C   U   G   A       A   U   G   C        G   U   A   C
Numbers observed   5   0   3   2       0   1   1   6        0   1   1   2
  • 29. Table 10
Estimates of the genetic distance K under the different models being considered.

Model          K           Standard error
Jukes-Cantor   0.0682288   0.0178312
2ST            0.0686475   0.0180611
3ST            0.0686535   0.0180644
  • 30. Table 11
Estimates of the genetic distance Ki, where i = 1, 2, or 3 is the codon position, under the different models considered in Tavaré (1986). The sequence data are those of human and mouse α-fetoprotein.

Model          K1               K2               K3
Jukes-Cantor   0.1752 (.0186)   0.1387 (.0162)   .6566 (.0483)
3ST            0.1760 (.0188)   0.1389 (.0163)   .7230 (.0642)

(The parenthesized quantities are standard errors.)
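One informal way to see how strongly the codon positions in Table 11 differ is to compare two estimates through an approximate z-statistic, z = (Ki − Kj) / sqrt(SEi² + SEj²). This is only a rough sketch: it assumes the estimates are independent and approximately normal, and it is not the homogeneity analysis carried out by Tavaré (1986). The function name is hypothetical; the numbers are the Jukes-Cantor row of Table 11.

```python
import math


def z_statistic(k_i, se_i, k_j, se_j):
    """Approximate z for the difference of two independent estimates (assumption)."""
    return (k_i - k_j) / math.sqrt(se_i ** 2 + se_j ** 2)


# Jukes-Cantor estimates and standard errors from Table 11.
k1, se1 = 0.1752, 0.0186
k2, se2 = 0.1387, 0.0162
k3, se3 = 0.6566, 0.0483

z_12 = z_statistic(k1, se1, k2, se2)  # first versus second codon position
z_32 = z_statistic(k3, se3, k2, se2)  # third versus second codon position
print(round(z_12, 2), round(z_32, 2))
```

Under these assumptions the third-position estimate lies roughly ten standard errors above the second-position estimate, while the first and second positions differ by about one and a half, in line with the remark that the third codon position gives the largest estimate.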
  • 31. Figure Captions
Figure 1. Divergence of sequences S1 and S2 from some common ancestor.
Figure 2. Types and rates of nucleotide substitutions.
Figure 3. Types and rates of nucleotide substitutions: 3ST Model.
Figure 4. Types and rates of nucleotide substitutions: 2ST Model.
Figure 5. Types and rates of nucleotide substitutions: Jukes-Cantor Model.
Figure 6. The nucleotide sequences of the coding region of the mouse protein kinase inhibitor (Mpki.M) and the human protein kinase inhibitor (Hpki.2). The 15 mismatches are indicated with bars (Olsen and Uhler, 1991a, 1991b).
  • 32. [Figure 1: an ancestral sequence diverging over time T into sequences S1 and S2.]
  • 33. [Figure 2: diagram of substitutions among the bases U, C, A, and G, with arrows labeled by the rates α1–α4, β1–β4, and γ1–γ4.]
  • 34. [Figure 3: diagram of substitutions among the bases U, C, A, and G under the 3ST model, with arrows labeled by the rates α, β, and γ.]
  • 35. [Figure 4: diagram of substitutions among the bases U, C, A, and G under the 2ST model, with arrows labeled by the rates α and β.]
  • 36. [Figure 5: diagram of substitutions among the bases U, C, A, and G under the Jukes-Cantor model, with every arrow labeled by the single rate α.]