Proof of Kraft-McMillan theorem∗

Nguyen Hung Vu†

February 24, 2004 (平成 16 年 2 月 24 日)

∗ Introduction to Data Compression, Khalid Sayood
† email: vuhung@fedu.uec.ac.jp


1 Kraft-McMillan theorem

Let C be a code with N codewords of lengths l_1, l_2, ..., l_N. If C is uniquely decodable, then

    K(C) = Σ_{i=1}^{N} 2^{-l_i} ≤ 1.    (1)


2 Proof

The proof works by looking at the nth power of K(C). If K(C) is greater than one, then K(C)^n should grow exponentially with n. If it does not grow exponentially with n, then this proves that Σ_{i=1}^{N} 2^{-l_i} ≤ 1.

Let n be an arbitrary integer. Then

    [ Σ_{i=1}^{N} 2^{-l_i} ]^n = ( Σ_{i_1=1}^{N} 2^{-l_{i_1}} ) ( Σ_{i_2=1}^{N} 2^{-l_{i_2}} ) ··· ( Σ_{i_n=1}^{N} 2^{-l_{i_n}} ),

and therefore

    [ Σ_{i=1}^{N} 2^{-l_i} ]^n = Σ_{i_1=1}^{N} Σ_{i_2=1}^{N} ··· Σ_{i_n=1}^{N} 2^{-(l_{i_1} + l_{i_2} + ··· + l_{i_n})}.    (2)

The exponent l_{i_1} + l_{i_2} + ··· + l_{i_n} is simply the length of n codewords from the code C. The smallest value this exponent can take is greater than or equal to n, which would be the case if all codewords were 1 bit long. If

    l = max{l_1, l_2, ..., l_N},

then the largest value the exponent can take is less than or equal to nl. Therefore, we can write this summation as

    K(C)^n = Σ_{k=n}^{nl} A_k 2^{-k},

where A_k is the number of combinations of n codewords that have a combined length of k. Let us take a look at the size of this coefficient. The number of possible distinct binary sequences of length k is 2^k. If this code is uniquely decodable, then each sequence can represent one and only one sequence of codewords. Therefore, the number of possible combinations of codewords whose combined length is k cannot be greater than 2^k. In other words,

    A_k ≤ 2^k.

This means that

    K(C)^n = Σ_{k=n}^{nl} A_k 2^{-k} ≤ Σ_{k=n}^{nl} 2^k 2^{-k} = nl − n + 1.    (3)

But if K(C) is greater than one, K(C)^n grows exponentially with n, while n(l − 1) + 1 can only grow linearly. So if K(C) is greater than one, we can always find an n large enough that inequality (3) is violated. Therefore, for a uniquely decodable code C, K(C) is less than or equal to one.

This part of the Kraft-McMillan inequality provides a necessary condition for uniquely decodable codes. That is, if a code is uniquely decodable, the codeword lengths have to satisfy the inequality.


3 Construction of prefix code

Given a set of integers l_1, l_2, ..., l_N that satisfy the inequality

    Σ_{i=1}^{N} 2^{-l_i} ≤ 1,    (4)

we can always find a prefix code with codeword lengths l_1, l_2, ..., l_N.


4 Proof of the prefix code construction theorem

We will prove this assertion by developing a procedure for constructing a prefix code with codeword lengths l_1, l_2, ..., l_N that satisfy the given inequality. Without loss of generality, we can assume that

    l_1 ≤ l_2 ≤ ··· ≤ l_N.    (5)

Define a sequence of numbers w_1, w_2, ..., w_N as follows:

    w_1 = 0,
    w_j = Σ_{i=1}^{j-1} 2^{l_j − l_i},   j > 1.

The binary representation of w_j for j > 1 takes up ⌈log_2 w_j⌉ bits. We will use this binary representation to construct a prefix code. We first note that the number of bits in the binary representation of w_j is less than or equal to l_j. This is obviously true for w_1. For j > 1,

    ⌈log_2 w_j⌉ = ⌈log_2 ( Σ_{i=1}^{j-1} 2^{l_j − l_i} )⌉
                = ⌈log_2 ( 2^{l_j} Σ_{i=1}^{j-1} 2^{-l_i} )⌉
                = l_j + ⌈log_2 ( Σ_{i=1}^{j-1} 2^{-l_i} )⌉
                ≤ l_j.
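The construction just defined is easy to check numerically. The following Python sketch is not part of the original notes: it computes K(C), the numbers w_j, and the resulting codewords for a hypothetical, already sorted length set {1, 2, 3, 3}. For simplicity it writes each w_j directly as an l_j-bit binary string, which is possible because w_j = 2^{l_j} Σ_{i<j} 2^{-l_i} < 2^{l_j}.

def kraft_sum(lengths):
    # K(C) = sum_i 2^{-l_i}
    return sum(2.0 ** -l for l in lengths)

def construct_prefix_code(lengths):
    # lengths must be sorted as in (5) and satisfy the Kraft inequality (4).
    assert kraft_sum(lengths) <= 1.0, "lengths violate the Kraft inequality"
    codewords = []
    for j, lj in enumerate(lengths):
        wj = sum(2 ** (lj - li) for li in lengths[:j])  # w_1 = 0, w_j = sum_{i<j} 2^{l_j - l_i}
        codewords.append(format(wj, "b").zfill(lj))     # w_j written as an l_j-bit string
    return codewords

# For the hypothetical lengths {1, 2, 3, 3}, K(C) = 1.0 and this prints
# ['0', '10', '110', '111'].
print(construct_prefix_code([1, 2, 3, 3]))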
The last inequality in the derivation of ⌈log_2 w_j⌉ results from the hypothesis of the theorem that Σ_{i=1}^{N} 2^{-l_i} ≤ 1, which implies that Σ_{i=1}^{j-1} 2^{-l_i} < 1. As the logarithm of a number less than one is negative, l_j + ⌈log_2 Σ_{i=1}^{j-1} 2^{-l_i}⌉ cannot exceed l_j.

Using the binary representation of w_j, we can devise a binary code in the following manner: if ⌈log_2 w_j⌉ = l_j, then the jth codeword c_j is the binary representation of w_j. If ⌈log_2 w_j⌉ < l_j, then c_j is the binary representation of w_j with l_j − ⌈log_2 w_j⌉ zeros appended to the right. This is certainly a code, but is it a prefix code? If we can show that the code C = {c_1, c_2, ..., c_N} is a prefix code, then we will have proved the theorem by construction.

Suppose that our claim is not true. Then for some j < k, c_j is a prefix of c_k. This means that the l_j most significant bits of w_k form the binary representation of w_j. Therefore, if we right-shift the binary representation of w_k by l_k − l_j bits, we should get the binary representation of w_j. We can write this as

    w_j = ⌊ w_k / 2^{l_k − l_j} ⌋.

However,

    w_k = Σ_{i=1}^{k-1} 2^{l_k − l_i}.

Therefore,

    w_k / 2^{l_k − l_j} = Σ_{i=1}^{k-1} 2^{l_j − l_i}
                        = w_j + Σ_{i=j}^{k-1} 2^{l_j − l_i}
                        = w_j + 2^0 + Σ_{i=j+1}^{k-1} 2^{l_j − l_i}
                        ≥ w_j + 1.    (6)

That is, the smallest value w_k / 2^{l_k − l_j} can take is w_j + 1. This contradicts the requirement that c_j be a prefix of c_k. Therefore, c_j cannot be a prefix of c_k. As j and k were arbitrary, this means that no codeword is a prefix of another codeword, and the code C is a prefix code.


5 Projects and Problems

1. Suppose X is a random variable that takes on values from an M-letter alphabet. Show that 0 ≤ H(X) ≤ log_2 M.

   Hints: H(X) = −Σ_{i=1}^{n} P(α_i) log P(α_i), where the M letters consist of the distinct letters α_1, α_2, ..., α_n taken with multiplicities m_1, m_2, ..., m_n (α_i ≠ α_j for i ≠ j, Σ_{i=1}^{n} m_i = M), so that P(α_i) = m_i/M. Then

       H(X) = −Σ_{i=1}^{n} (m_i/M) log (m_i/M)
            = −(1/M) Σ_{i=1}^{n} m_i (log m_i − log M)
            = −(1/M) Σ m_i log m_i + (log M / M) Σ m_i
            = −(1/M) Σ m_i log m_i + log M.

2. Show that for the case where the elements of an observed sequence are iid¹, the entropy is equal to the first-order entropy.

   Hints: the entropy is defined by H(S) = lim_{n→∞} (1/n) G_n, where

       G_n = −Σ_{i_1=1}^{m} Σ_{i_2=1}^{m} ··· Σ_{i_n=1}^{m} P(X_1 = i_1, X_2 = i_2, ..., X_n = i_n) log P(X_1 = i_1, X_2 = i_2, ..., X_n = i_n),

   and the first-order entropy is defined by H_1(S) = −Σ_{i_1=1}^{m} P(X_1 = i_1) log P(X_1 = i_1). In the iid case, G_n = −n Σ_{i_1=1}^{m} P(X_1 = i_1) log P(X_1 = i_1). Prove it!

3. Given an alphabet A = {a_1, a_2, a_3, a_4}, find the first-order entropy in the following cases:

   (a) P(a_1) = P(a_2) = P(a_3) = P(a_4) = 1/4.
   (b) P(a_1) = 1/2, P(a_2) = 1/4, P(a_3) = P(a_4) = 1/8.
   (c) P(a_1) = 0.505, P(a_2) = 1/4, P(a_3) = 1/8, P(a_4) = 0.12.

4. Suppose we have a source with a probability model P = {p_0, p_1, ..., p_m} and entropy H_P. Suppose we have another source with probability model Q = {q_0, q_1, ..., q_m} and entropy H_Q, where

       q_i = p_i,   i = 0, 1, ..., j−2, j+1, ..., m,

   and

       q_j = q_{j−1} = (p_j + p_{j−1}) / 2.

   How is H_Q related to H_P (greater, equal, or less)? Prove your answer.

   Hints:

       H_Q − H_P = Σ p_i log p_i − Σ q_i log q_i
                 = p_{j−1} log p_{j−1} + p_j log p_j − 2 · ((p_{j−1} + p_j)/2) log ((p_{j−1} + p_j)/2)
                 = φ(p_{j−1}) + φ(p_j) − 2 φ((p_{j−1} + p_j)/2),

   where φ(x) = x log x and φ''(x) = 1/x > 0 for all x > 0.

5. There are several image and speech files among the accompanying data sets.

   (a) Write a program to compute the first-order entropy of some of the image and speech files.
   (b) Pick one of the image files and compute its second-order entropy. Comment on the difference between the first-order and second-order entropies.
   (c) Compute the entropy of the difference between neighboring pixels for the image you used in part (b). Comment on what you discovered.

6. Conduct an experiment to see how well a model can describe a source.

   (a) Write a program that randomly selects letters from the 26-letter alphabet {a, b, c, ..., z} and forms four-letter words. Form 100 such words and see how many of these words make sense.
   (b) Among the accompanying data sets is a file called 4letter.words, which contains a list of four-letter words. Using this file, obtain a probability model for the alphabet. Now repeat part (a), generating the words using the probability model. To pick letters according to a probability model, construct the cumulative distribution function (cdf)² F_X(x). Using a uniform pseudorandom number generator to generate a value r, where 0 ≤ r < 1, pick the letter x_k if F_X(x_k − 1) ≤ r < F_X(x_k). Compare your results with those of part (a).
   (c) Repeat (b) using a single-letter context.
   (d) Repeat (b) using a two-letter context.

¹ iid: independent and identically distributed.
² cumulative distribution function (cdf): F_X(x) = P(X ≤ x).

7. Determine whether the following codes are uniquely decodable:
   (a) {0, 01, 11, 111}: not uniquely decodable (01 11 = 0 111).
   (b) {0, 01, 110, 111}: not uniquely decodable (01 110 = 0 111 0).
   (c) {0, 10, 110, 111}: yes. Prove it!
   (d) {1, 10, 110, 111}: not uniquely decodable (1 10 = 110).

8. Using a text file, compute the probability p_i of each letter.

   (a) Assume that we need a codeword of length ⌈log_2 (1/p_i)⌉ to encode the letter i. Determine the number of bits needed to encode the file.
   (b) Compute the conditional probability P(i|j) of a letter i given that the previous letter is j. Assume that we need ⌈log_2 (1/P(i|j))⌉ bits to represent a letter i that follows a letter j. Determine the number of bits needed to encode the file.


6 Huffman coding

6.1 Definition

A minimal variable-length character encoding based on the frequency of each character. First, each character becomes a trivial tree, with the character as the only node. The character's frequency is the tree's frequency. The two trees with the least frequencies are joined under a new root, which is assigned the sum of their frequencies. This is repeated until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are encoded with few bits, and rare characters are far from the root and are encoded with many bits.

6.2 Example of Huffman encoding design

Alphabet A = {a_1, a_2, a_3, a_4, a_5} with P(a_1) = P(a_3) = 0.2, P(a_2) = 0.4, and P(a_4) = P(a_5) = 0.1. The entropy of this source is 2.122 bits/symbol. To design the Huffman code, we first sort the letters in descending probability order, as shown in the table below; c(a_i) denotes the codeword for a_i. The two least probable letters are a_4 and a_5, and we assign their codewords as

    c(a_4) = α_1 * 0
    c(a_5) = α_1 * 1,

where α_1 is a binary string and * denotes concatenation.

    The initial five-letter alphabet
    Letter   Probability   Codeword
    a_2      0.4           c(a_2)
    a_1      0.2           c(a_1)
    a_3      0.2           c(a_3)
    a_4      0.1           c(a_4)
    a_5      0.1           c(a_5)

Now we define a new alphabet A' with four letters a_1, a_2, a_3, a_4', where a_4' is composed of a_4 and a_5 and has probability P(a_4') = P(a_4) + P(a_5) = 0.2. We sort this new alphabet in descending order to obtain the following table.

    The reduced four-letter alphabet
    Letter   Probability   Codeword
    a_2      0.4           c(a_2)
    a_1      0.2           c(a_1)
    a_3      0.2           c(a_3)
    a_4'     0.2           α_1

In this alphabet, a_3 and a_4' are the two letters at the bottom of the sorted list. We assign their codewords as

    c(a_3)  = α_2 * 0
    c(a_4') = α_2 * 1,

but c(a_4') = α_1. Therefore,

    α_1 = α_2 * 1,

which means that

    c(a_4) = α_2 * 10
    c(a_5) = α_2 * 11.

At this stage, we again define a new alphabet A'' that consists of three letters a_1, a_2, a_3', where a_3' is composed of a_3 and a_4' and has probability P(a_3') = P(a_3) + P(a_4') = 0.4. We sort this new alphabet in descending order to obtain the following table.

    The reduced three-letter alphabet
    Letter   Probability   Codeword
    a_2      0.4           c(a_2)
    a_3'     0.4           α_2
    a_1      0.2           c(a_1)

In this case, the two least probable symbols are a_1 and a_3'. Therefore,

    c(a_3') = α_3 * 0
    c(a_1)  = α_3 * 1.

But c(a_3') = α_2. Therefore,

    α_2 = α_3 * 0,

which means that

    c(a_3) = α_3 * 00
    c(a_4) = α_3 * 010
    c(a_5) = α_3 * 011.

Again we define a new alphabet, this time with only two letters a_3'', a_2. Here a_3'' is composed of the letters a_3' and a_1 and has probability P(a_3'') = P(a_3') + P(a_1) = 0.6. We now have the following table.

    The reduced two-letter alphabet
    Letter   Probability   Codeword
    a_3''    0.6           α_3
    a_2      0.4           c(a_2)

As we have only two letters, the codeword assignment is straightforward:

    c(a_3'') = 0
    c(a_2)   = 1,

which means that α_3 = 0, which in turn means that

    c(a_1) = 01
    c(a_3) = 000
    c(a_4) = 0010
    c(a_5) = 0011,

and the final Huffman code is given in the table below.
    Huffman code for the original five-letter alphabet
    Letter   Probability   Codeword
    a_2      0.4           1
    a_1      0.2           01
    a_3      0.2           000
    a_4      0.1           0010
    a_5      0.1           0011

6.3 Huffman code is optimal

The necessary conditions for an optimal variable-length binary code are as follows:

1. Condition 1: Given any two letters a_j and a_k, if P(a_j) ≥ P(a_k), then l_j ≤ l_k, where l_j is the number of bits in the codeword for a_j.

2. Condition 2: The two least probable letters have codewords with the same maximum length l_m.

3. Condition 3: In the tree corresponding to the optimum code, there must be two branches stemming³ from each intermediate node.

4. Condition 4: Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then, if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

The Huffman code satisfies all of the conditions above and is therefore optimal.

6.4 Average codeword length

For a source S with alphabet A = {a_1, a_2, ..., a_K} and probability model {P(a_1), P(a_2), ..., P(a_K)}, the average codeword length is given by

    l̄ = Σ_{i=1}^{K} P(a_i) l_i.

The difference between the entropy of the source H(S) and the average length is

    H(S) − l̄ = −Σ_{i=1}^{K} P(a_i) log_2 P(a_i) − Σ_{i=1}^{K} P(a_i) l_i
             = Σ P(a_i) [ log_2 (1/P(a_i)) − l_i ]
             = Σ P(a_i) [ log_2 (1/P(a_i)) − log_2 (2^{l_i}) ]
             = Σ P(a_i) log_2 ( 2^{-l_i} / P(a_i) )
             ≤ log_2 [ Σ 2^{-l_i} ],

where the last step follows from Jensen's inequality (the logarithm is concave).

6.5 Length of Huffman code

It has been proven that

    H(S) ≤ l̄ < H(S) + 1,    (7)

where H(S) is the entropy of the source S.

³ stemming: 生じる (to arise).

6.6 Extended Huffman code

Consider an alphabet A = {a_1, a_2, ..., a_m}. Each element of the sequence is generated independently of the other elements in the sequence. The entropy of this source is given by

    H(S) = −Σ_{i=1}^{m} P(a_i) log_2 P(a_i).

We know that we can generate a Huffman code for this source with rate R such that

    H(S) ≤ R < H(S) + 1.    (8)

We have used the looser bound here; the same argument can be made with the tighter bound. Notice that we have used "rate R" to denote the number of bits per symbol. This is a standard convention in the data compression literature. However, in the communication literature, the word "rate" often refers to the number of bits per second.

Suppose we now encode the sequence by generating one codeword for every n symbols. As there are m^n combinations of n symbols, we will need m^n codewords in our Huffman code. We could generate this code by viewing the m^n symbols as letters of an extended alphabet

    A^(n) = { a_1 a_1 ··· a_1 (n times), a_1 a_1 ··· a_2, ..., a_1 a_1 ··· a_m, a_1 a_1 ··· a_2 a_1, ..., a_m a_m ··· a_m }

from a source S^(n). Denote the rate for the new source as R^(n). We know that

    H(S^(n)) ≤ R^(n) < H(S^(n)) + 1.    (9)

Since each extended letter stands for n symbols of the original source,

    R = (1/n) R^(n)

and

    H(S^(n))/n ≤ R < H(S^(n))/n + 1/n.

It can be proven that

    H(S^(n)) = n H(S),

and therefore

    H(S) ≤ R < H(S) + 1/n.

6.6.1 Example of an extended Huffman code

Let A = {a_1, a_2, a_3} with probability model P(a_1) = 0.8, P(a_2) = 0.02, and P(a_3) = 0.18. The entropy of this source is 0.816 bits per symbol. The Huffman code for this source is shown below:

    a_1   0
    a_2   11
    a_3   10

and the extended code for blocks of two symbols is

    Letters   Probability   Codeword
    a_1 a_1   0.64          0
    a_1 a_2   0.016         10101
    a_1 a_3   0.144         11
    a_2 a_1   0.016         101000
    a_2 a_2   0.0004        10100101
    a_2 a_3   0.0036        1010011
    a_3 a_1   0.1440        100
    a_3 a_2   0.0036        10100100
    a_3 a_3   0.0324        1011
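The codes above are easy to reproduce programmatically. The following Python sketch is not part of the original notes; it builds a binary Huffman code with a heap, using the merge-the-two-least-probable rule of Section 6.2, and evaluates the average codeword length of Section 6.4 for the five-letter example.

import heapq

def huffman_code(probs):
    # probs: dict letter -> probability; returns dict letter -> codeword.
    # Each heap entry is (probability, tie_breaker, {letter: partial codeword}).
    heap = [(p, i, {a: ""}) for i, (a, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)    # least probable group
        p1, tie, group1 = heapq.heappop(heap)  # second least probable group
        merged = {a: "0" + w for a, w in group0.items()}
        merged.update({a: "1" + w for a, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, tie, merged))
    return heap[0][2]

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}  # Section 6.2 example
code = huffman_code(probs)
avg_len = sum(probs[a] * len(code[a]) for a in probs)            # Section 6.4
print(code, avg_len)

Because several letters share the same probability, the individual codeword lengths this prints may differ from the tables above, but every Huffman code for this source has the same average length, 2.2 bits/symbol, which lies between H(S) = 2.122 and H(S) + 1 as stated in Section 6.5.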
6.7 Nonbinary Huffman codes

6.8 Adaptive Huffman Coding


7 Arithmetic coding⁴
Use the mapping

    X(a_i) = i,   a_i ∈ A,    (10)

where A = {a_1, a_2, ..., a_m} is the alphabet for a discrete source and X is a random variable. This mapping means that given a probability model P for the source, we also have a probability density function for the random variable,

    P(X = i) = P(a_i),

and the cumulative density function can be defined as

    F_X(i) = Σ_{k=1}^{i} P(X = k).
Define T̄_X(a_i) as

    T̄_X(a_i) = Σ_{k=1}^{i−1} P(X = k) + (1/2) P(X = i)    (11)
             = F_X(i − 1) + (1/2) P(X = i).    (12)

For each a_i, T̄_X(a_i) will have a unique value. This value can be used as a unique tag for a_i. In general,

    T̄_X^(m)(x_i) = Σ_{y < x_i} P(y) + (1/2) P(x_i).    (13)
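As a quick numerical check (not part of the original notes), the tag of a_i is just the cdf up to i − 1 plus half the probability of a_i, so it lies strictly inside the interval [F_X(i − 1), F_X(i)) that a_i occupies on the cdf. The three-letter probability model below is hypothetical.

def tags(probs):
    # probs: list of P(a_1), ..., P(a_m); returns the tag T(a_i) for each letter.
    cdf = 0.0
    result = []
    for p in probs:
        result.append(cdf + 0.5 * p)  # F_X(i-1) + P(X=i)/2, as in Eqs. (11)-(12)
        cdf += p                      # running F_X(i)
    return result

# Hypothetical model P(a1) = 0.7, P(a2) = 0.1, P(a3) = 0.2: the tags are 0.35, 0.75,
# and 0.9 (up to floating-point rounding), one distinct value per letter.
print(tags([0.7, 0.1, 0.2]))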




⁴ Arithmetic: 算数, 算術 (Japanese for "arithmetic").
