Huffman Codes
 Huffman coding is an effective technique for
lossless data compression: no information is
lost, so the original data can be recovered exactly.
 The algorithm builds a table of the
frequencies of each character in a file
 The table is then used to determine an optimal
way of representing each character as a binary
string
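As a rough illustration of the first step, the frequency table can be built in a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `frequency_table` and the sample string are assumptions made for the example.

```python
from collections import Counter


def frequency_table(text: str) -> dict[str, int]:
    """Return a mapping from each character in text to its number of occurrences."""
    return dict(Counter(text))


if __name__ == "__main__":
    # Hypothetical sample input; any string works.
    print(frequency_table("abracadabra"))
```
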
Huffman Codes
 Consider a file of 100,000 characters drawn from a-f,
with these frequencies:
 a = 45,000
 b = 13,000
 c = 12,000
 d = 16,000
 e = 9,000
 f = 5,000
Huffman Codes
 Typically each character in a file is stored as a
single byte (8 bits)
 If we know the file contains only six distinct characters, we can
use a 3-bit code for the characters instead:
 a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
 This is called a fixed-length code
 With this scheme, we can encode the whole file with 300,000
bits (45000*3+13000*3+12000*3+16000*3+9000*3+5000*3)
 We can do better
 Better compression
 More flexibility
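The 300,000-bit figure can be checked with a short calculation. The sketch below assumes the frequency table from the earlier slide; the helper name `fixed_length_bits` is made up for illustration.

```python
# Frequencies from the slide (100,000 characters in total).
FREQ = {"a": 45_000, "b": 13_000, "c": 12_000, "d": 16_000, "e": 9_000, "f": 5_000}


def fixed_length_bits(freq: dict[str, int]) -> tuple[int, int]:
    """Return (bits per character, total bits) for a fixed-length code."""
    # Smallest width that gives every character a distinct code word:
    # n characters need ceil(log2(n)) bits, computed here with integers only.
    width = max(1, (len(freq) - 1).bit_length())
    return width, width * sum(freq.values())


if __name__ == "__main__":
    width, total = fixed_length_bits(FREQ)
    print(width, total)  # 3 300000
```
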
Huffman Codes
 Variable length codes can perform significantly
better
 Frequent characters are given short code words, while
infrequent characters get longer code words
 Consider this scheme:
 a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
 How many bits are now required to encode our file?
 45,000*1 + 13,000*3 + 12,000*3 + 16,000*3 + 9,000*4 + 5,000*4
= 224,000 bits
 This is in fact an optimal character code for this file
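The same arithmetic can be written as a small sketch: multiply each character's frequency by the length of its code word and add the results. The names `FREQ`, `CODE` and `encoded_bits` are illustrative assumptions; the numbers come from the slides.

```python
FREQ = {"a": 45_000, "b": 13_000, "c": 12_000, "d": 16_000, "e": 9_000, "f": 5_000}
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encoded_bits(freq: dict[str, int], code: dict[str, str]) -> int:
    """Total bits needed: each character costs len(code word) bits per occurrence."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)


if __name__ == "__main__":
    total = encoded_bits(FREQ, CODE)
    saving = 1 - total / 300_000  # relative to the 3-bit fixed-length code
    print(total, f"{saving:.1%}")  # 224000 25.3%
```
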
Huffman Codes
 Prefix codes
 Huffman codes are constructed in such a way that they can
be unambiguously translated back to the original data, yet
still be an optimal character code
 Huffman codes are really considered “prefix codes”
 No code word is a prefix of any other code word
Prefix code: (9, 55, 50)
Not a prefix code: (9, 5, 55, 59), because 5 is a prefix of both 55 and 59
 This guarantees unambiguous decoding
 Once a code word is recognized, we can replace it with the decoded
character, without worrying about whether a longer code word might
also match
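The prefix property is easy to test mechanically. The following is one possible check, offered as a sketch: the function name `is_prefix_code` is an assumption, and the sort-based approach is just one convenient way to do it.

```python
def is_prefix_code(words: list[str]) -> bool:
    """Return True if no code word is a prefix of another code word."""
    ordered = sorted(words)
    # In sorted order, a word that is a prefix of some other word is
    # immediately followed by a word that starts with it, so comparing
    # neighbours is enough.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(is_prefix_code(["9", "55", "50"]))       # True
    print(is_prefix_code(["9", "5", "55", "59"]))  # False
    print(is_prefix_code(["0", "101", "100", "111", "1101", "1100"]))  # True
```
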
Huffman Codes
 Both the encoder and decoder make use of a binary
tree to recognize codes
 The leaves of the tree represent the unencoded characters
 Each left branch indicates a “0” placed in the encoded bit
string
 Each right branch indicates a “1” placed in the bit string
Huffman Codes
A Huffman Code Tree
(frequencies in thousands; 0 = left branch, 1 = right branch)
100
  0: a:45
  1: 55
    0: 25
      0: c:12
      1: b:13
    1: 30
      0: 14
        0: f:5
        1: e:9
      1: d:16
 To encode:
 Follow the path from the root to the leaf
holding the character to encode
 At each branch, add “0” (left) or “1” (right)
to the right of the code
 The code is complete when you reach the
character
 To decode a code:
 Proceed through bit string left to right
 For each bit, proceed left or right as
indicated
 When you reach a leaf, that is the
decoded character
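Both procedures can be sketched in Python. The representation here is an assumption made for the example: a leaf is a one-character string and an internal node is a `(left, right)` tuple, so `TREE` mirrors the tree on this slide. The sketch also assumes the tree has at least two leaves.

```python
# Leaf = character, internal node = (left, right); left is "0", right is "1".
TREE = ("a", (("c", "b"), (("f", "e"), "d")))


def code_table(node, prefix: str = "") -> dict[str, str]:
    """Walk the tree, adding '0'/'1' to the right of the code on the way down."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    return {**code_table(left, prefix + "0"), **code_table(right, prefix + "1")}


def encode(text: str, tree) -> str:
    table = code_table(tree)
    return "".join(table[ch] for ch in text)


def decode(bits: str, tree) -> str:
    out, node = [], tree
    for bit in bits:  # proceed through the bit string left to right
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf: emit it, restart at the root
            out.append(node)
            node = tree
    return "".join(out)


if __name__ == "__main__":
    bits = encode("face", TREE)
    print(bits)                # 110001001101
    print(decode(bits, TREE))  # face
```

Building the whole code table once, rather than searching the tree per character, is a design choice of this sketch; the result is the same.
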
Huffman Codes
 Using this representation, an optimal code will
always be represented by a full binary tree
 Every non-leaf node has two children
 If this were not true, then there would be “waste” bits, as in the
fixed-length code, leading to a non-optimal compression
 For a set of c characters, this requires c leaves, and c-1
internal nodes
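The leaf and internal-node counts can be confirmed on the example tree. This is a small sketch under the assumption that a leaf is a one-character string and an internal node is a `(left, right)` tuple; `count_nodes` is a hypothetical helper.

```python
# The example tree: leaf = character, internal node = (left, right).
TREE = ("a", (("c", "b"), (("f", "e"), "d")))


def count_nodes(node) -> tuple[int, int]:
    """Return (number of leaves, number of internal nodes) of a full binary tree."""
    if isinstance(node, str):
        return 1, 0
    left_leaves, left_internal = count_nodes(node[0])
    right_leaves, right_internal = count_nodes(node[1])
    return left_leaves + right_leaves, left_internal + right_internal + 1


if __name__ == "__main__":
    print(count_nodes(TREE))  # (6, 5)
```
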
Huffman Codes
 Given a Huffman tree, how do we compute the
number of bits required to encode a file?
 For every character c:
 Let f(c) denote the character’s frequency
 Let dT(c) denote the character’s depth in the tree
 This is also the length of the character’s code word
 The total bits required is then:

$$B(T) = \sum_{c \in C} f(c)\, d_T(c)$$

where the sum runs over every character c in the alphabet C
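As a worked check of this cost, the sketch below computes each character's depth in the example tree and sums frequency times depth. The tuple-based tree representation and the names `depths` and `cost` are assumptions for illustration.

```python
FREQ = {"a": 45_000, "b": 13_000, "c": 12_000, "d": 16_000, "e": 9_000, "f": 5_000}
# Leaf = character, internal node = (left, right).
TREE = ("a", (("c", "b"), (("f", "e"), "d")))


def depths(node, depth: int = 0) -> dict[str, int]:
    """Map each character to the depth of its leaf (its code word length)."""
    if isinstance(node, str):
        return {node: depth}
    left, right = node
    return {**depths(left, depth + 1), **depths(right, depth + 1)}


def cost(tree, freq: dict[str, int]) -> int:
    """Total bits: the sum over all characters of frequency times depth."""
    d = depths(tree)
    return sum(freq[c] * d[c] for c in freq)


if __name__ == "__main__":
    print(depths(TREE))      # {'a': 1, 'c': 3, 'b': 3, 'f': 4, 'e': 4, 'd': 3}
    print(cost(TREE, FREQ))  # 224000
```
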
Constructing a Huffman Code
 Huffman developed a greedy algorithm for
constructing an optimal prefix code
 The algorithm builds the tree in a bottom-up manner
 It begins with the leaves, then performs merging operations
to build up the tree
 At each step, it merges the two least frequent members
together
 It removes these characters from the set, and replaces them
with a “metacharacter” with frequency = sum of the removed
characters’ frequencies
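The merging steps above can be sketched with a priority queue. This is one common way to implement the greedy construction, not necessarily how the original author would code it; `build_huffman_tree`, the tuple tree representation, and the insertion-order tie-breaking counter are assumptions of this example, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count

FREQ = {"a": 45_000, "b": 13_000, "c": 12_000, "d": 16_000, "e": 9_000, "f": 5_000}


def build_huffman_tree(freq: dict[str, int]):
    """Greedy bottom-up construction; returns nested (left, right) tuples."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two least frequent members...
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # ...and replace them with a "metacharacter" whose frequency is their sum.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]


if __name__ == "__main__":
    print(build_huffman_tree(FREQ))
    # ('a', (('c', 'b'), (('f', 'e'), 'd')))
```

For the frequencies in this deck, the merges are f+e = 14, c+b = 25, 14+d = 30, 25+30 = 55, and a+55 = 100, which reproduces the tree shown earlier.
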
Example
 For example, using the fixed-length code above, we code the
3-letter file ‘abc’ as 000.001.010, where the dots only separate
the code words
