2.
Introduction
Compression is used to:
reduce the volume of information to be
stored
reduce the communication bandwidth
required for its transmission over
networks
4.
Compression Principles
Entropy Encoding
1. Run-length encoding
Lossless and independent of the type of source
information
Used when the source information comprises
long substrings of the same character or
binary digit
Encoded as (string or bit pattern, # of occurrences)
pairs, as in FAX
e.g. 000000011111111110000011……
⇒ 0,7 1,10 0,5 1,2…… ⇒ 7,10,5,2……
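The run-length scheme above can be sketched in a few lines of Python; the function name `rle_encode` is my own, not from the slides:

```python
def rle_encode(bits):
    """Scan a string and emit (symbol, run length) pairs, as in the slide's example."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1  # extend the run while the symbol repeats
        runs.append((bits[i], j - i))
        i = j
    return runs

# The slide's bit string: 7 zeros, 10 ones, 5 zeros, 2 ones
print(rle_encode("0" * 7 + "1" * 10 + "0" * 5 + "1" * 2))
# → [('0', 7), ('1', 10), ('0', 5), ('1', 2)]
```

When both parties know the stream alternates starting with 0, only the counts 7, 10, 5, 2 need to be sent, as the slide shows.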
5.
Entropy Encoding
2. Statistical encoding
Based on the probability of occurrence of a
pattern
The more probable the pattern, the shorter its codeword
6.
Compression Principles
Huffman Encoding
Entropy, H: theoretical minimum average number of bits required
to transmit a particular stream

H = -Σ_{i=1..n} P_i log2(P_i)

where n: # of symbols, P_i: probability of symbol i
Efficiency, E = H/H'
where H' = average # of bits per codeword = Σ_{i=1..n} N_i P_i
N_i: # of bits in the codeword for symbol i
7.
E.g. symbols M(10), F(11), Y(010), N(011), 0(000),
1(001) with probabilities 0.25, 0.25, 0.125, 0.125,
0.125, 0.125

H' = Σ_{i=1..6} N_i P_i = 2×(2×0.25) + 4×(3×0.125) = 2.5 bits/codeword
H = -Σ_{i=1..6} P_i log2(P_i) = -(2×(0.25 log2 0.25) + 4×(0.125 log2 0.125)) = 2.5
E = H/H' = 100%
A fixed-length code for six symbols would need 3 bits/codeword
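The arithmetic on this slide can be checked directly; this is a small sketch using the slide's probabilities and codeword lengths:

```python
import math

def entropy(probs):
    """H = -sum(P_i * log2(P_i)), the theoretical minimum bits per symbol."""
    return -sum(p * math.log2(p) for p in probs)

def avg_codeword_len(lengths, probs):
    """H' = sum(N_i * P_i), the average codeword length of a given code."""
    return sum(n * p for n, p in zip(lengths, probs))

# Slide's example: M, F get 2-bit codes; Y, N, 0, 1 get 3-bit codes
probs = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]
lengths = [2, 2, 3, 3, 3, 3]

H = entropy(probs)                      # 2.5 bits/symbol
Hp = avg_codeword_len(lengths, probs)   # 2.5 bits/codeword
print(H, Hp, H / Hp)                    # efficiency E = 1.0, i.e. 100%
```

Since H equals H' here, this code reaches the entropy bound, whereas a fixed 3-bit code would waste 0.5 bits per symbol on average.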
8.
Huffman Algorithm (Variable-Length
Encoding)
Method: construction of the encoding tree
• Full binary tree representation
• Each edge of the tree has a value
(0 for the left child, 1 for the right child)
• Data is at the leaves, not at internal nodes
• Result: an encoding tree
9.
Huffman Algorithm
• 1. Maintain a forest of trees
• 2. Weight of a tree = sum of the frequencies of
its leaves
• 3. Repeat N−1 times (N = # of symbols)
– Select the two smallest-weight trees
– Merge them into a new tree
10.
• Huffman coding
• a variable-length code whose codeword length is inversely
related to the character’s frequency
• must satisfy the prefix property to be uniquely
decodable
• two-pass algorithm
– the first pass accumulates the character frequencies
and generates the codebook
– the second pass does the compression with the
codebook
11.
• create codes by constructing a binary tree
1. consider all characters as free nodes
2. assign the two free nodes with the lowest frequencies to
a parent node with a weight equal to the sum of
their frequencies
3. remove the two free nodes and add the newly
created parent node to the list of free nodes
4. repeat steps 2 and 3 until there is one free node
left; it becomes the root of the tree
Huffman coding
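The four tree-construction steps above can be sketched with a min-heap; this is one possible implementation, and the tie-breaking counter (which codeword a symbol gets when frequencies are equal) is an implementation choice, not something the slides specify:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} dict (sketch)."""
    tick = count()  # tie-breaker so heapq never has to compare tree tuples
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 2-3: merge the two lowest-frequency free nodes into a parent
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse, 0 left / 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: data lives only at the leaves
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 4, "B": 2, "C": 1, "D": 1})
print({s: len(c) for s, c in codes.items()})  # → {'A': 1, 'B': 2, 'C': 3, 'D': 3}
```

The exact bit patterns depend on left/right assignment, but the codeword lengths (1, 2, 3, 3) and hence the total cost are the same for any valid Huffman tree over these frequencies.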
12.
• Right branch of the binary tree: 1
• Left branch of the binary tree: 0
• Prefix violation (example)
– e: ”01”, b: “010”
– “01” is a prefix of “010”, so the bits “010” could be
read as “e” plus a dangling “0”; such a code is not
uniquely decodable
• symbols with the same frequency: need a consistent
left/right (tie-breaking) rule
13.
Static Huffman Coding
Huffman (Code) Tree
Count the symbols or characters and their relative
probabilities beforehand
Codes must hold the “prefix property”
Symbol   Occurrence   Code
A        4/8          1
B        2/8          01
C        1/8          001
D        1/8          000

4×1 + 2×2 + 1×3 + 1×3 = 14 bits are
required to transmit
“AAAABBCD”
[Figure: Huffman code tree. The root node (weight 8) has leaf A on its 1-branch and a branch node (weight 4) on its 0-branch; that node has leaf B on its 1-branch and a branch node (weight 2) on its 0-branch, whose leaves are C (1) and D (0). Prefix property!]
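The prefix property is exactly what lets a decoder walk the bit stream left to right and emit a symbol the moment a codeword matches. A minimal sketch using the slide's code table (function names `encode`/`decode` are my own):

```python
# Code table from the slide: A=1, B=01, C=001, D=000
code = {"A": "1", "B": "01", "C": "001", "D": "000"}

def encode(text, code):
    """Concatenate the codeword of each character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Greedy left-to-right decode; unambiguous because no codeword
    is a prefix of another (the prefix property)."""
    rev = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in rev:
            out.append(rev[buf])
            buf = ""
    return "".join(out)

bits = encode("AAAABBCD", code)
print(len(bits), decode(bits, code))  # → 14 AAAABBCD
```

14 bits matches the slide's count of 4×1 + 2×2 + 1×3 + 1×3, versus 24 bits for a fixed 3-bit code over 8 characters.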
14.
• Example (data with 64 characters)
• R K K K K K K K
• K K K R R K K K
• K K R R R R G G
• K K B C C C R R
• G G G M C B R R
• B B B M Y B B R
• G G G G G G G R
• G R R R R G R R
15.
• Character   Frequency   Huffman code
• ====================================
• R           19          00
• K           17          01
• G           14          10
• B           7           110
• C           4           1110
• M           2           11110
• Y           1           11111
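Multiplying each frequency by its codeword length gives the total cost of this code for the 64-character example; a quick check (the 152-bit total is computed here, not stated on the slides):

```python
# Frequencies and codeword lengths from the table above
freq     = {"R": 19, "K": 17, "G": 14, "B": 7, "C": 4, "M": 2, "Y": 1}
code_len = {"R": 2,  "K": 2,  "G": 2,  "B": 3, "C": 4, "M": 5, "Y": 5}

huffman_bits = sum(freq[s] * code_len[s] for s in freq)
fixed_bits = sum(freq.values()) * 3  # 7 symbols need 3 bits each in a fixed code

print(huffman_bits, fixed_bits)  # → 152 192
```

So the Huffman code transmits the 64 characters in 152 bits instead of the 192 bits a fixed-length 3-bit code would need.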
17.
The goal of data compression is to
represent digital data with as few
bits as possible.
Exercise:
Determine the code for each character in the following text
using a Huffman code