Huffman codes are a technique for lossless data compression that assigns variable-length binary codes to characters, with more frequent characters receiving shorter codes. The algorithm builds a frequency table of the characters, then constructs a binary tree that determines the optimal codes. Characters are assigned codes based on their path from the root, with left branches representing 0 and right branches representing 1. Both the encoder and decoder use this tree to translate between binary codes and characters. The tree guarantees unambiguous decoding and an optimal character code.
Huffman codes
1. Huffman Codes
Huffman codes are an effective technique for ‘lossless data compression’, which means no information is lost
The algorithm builds a table of the
frequencies of each character in a file
The table is then used to determine an optimal
way of representing each character as a binary
string
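The frequency-table step can be sketched in Python; `char_frequencies` is a hypothetical helper name, and the standard library's `collections.Counter` does the counting:

```python
from collections import Counter

def char_frequencies(text):
    # Frequency table: each character mapped to its count in the file
    return Counter(text)

freqs = char_frequencies("abracadabra")
print(freqs['a'], freqs['b'], freqs['r'], freqs['c'], freqs['d'])  # 5 2 2 1 1
```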
2. Huffman Codes
Consider a file of 100,000 characters from a–f, with these frequencies:
a = 45,000
b = 13,000
c = 12,000
d = 16,000
e = 9,000
f = 5,000
3. Huffman Codes
Typically each character in a file is stored as a
single byte (8 bits)
If we know we only have six characters, we can use a 3 bit
code for the characters instead:
a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
This is called a fixed-length code
With this scheme, we can encode the whole file with 300,000
bits (45000*3+13000*3+12000*3+16000*3+9000*3+5000*3)
We can do better
Better compression
More flexibility
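The 300,000-bit figure can be verified directly; a minimal sketch, using the frequencies from the example file:

```python
# Character frequencies from the example file of 100,000 characters
freqs = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}

# Under the fixed-length code, every character costs 3 bits
fixed_bits = sum(count * 3 for count in freqs.values())
print(fixed_bits)  # 300000
```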
4. Huffman Codes
Variable-length codes can perform significantly better
Frequent characters are given short code words, while
infrequent characters get longer code words
Consider this scheme:
a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
How many bits are now required to encode our file?
45,000*1 + 13,000*3 + 12,000*3 + 16,000*3 + 9,000*4 + 5,000*4
= 224,000 bits
This is in fact an optimal character code for this file
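The 224,000-bit total follows from weighting each code-word length by its character's frequency; a quick check of the arithmetic above:

```python
freqs = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

# Each character costs len(code word) bits, weighted by its frequency
total_bits = sum(freqs[ch] * len(code[ch]) for ch in freqs)
print(total_bits)  # 224000
```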
5. Huffman Codes
Prefix codes
Huffman codes are constructed in such a way that they can
be unambiguously translated back to the original data, yet
still be an optimal character code
Huffman codes are in fact “prefix codes”
No code word is a prefix of any other code word
Prefix code: {9, 55, 50}
Not a prefix code: {9, 5, 55, 59}, because 5 is a prefix of 55 and 59
This guarantees unambiguous decoding
Once a code word is recognized, we can replace it with the decoded
character, without worrying about whether it might also be the start of
some other code word
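The prefix property can be tested mechanically; one sketch (the function name is hypothetical) relies on the fact that after sorting, any word that is a prefix of another is a prefix of its immediate successor:

```python
def is_prefix_code(words):
    # Sort first: if any word is a prefix of another, it must also be a
    # prefix of its immediate successor in sorted order.
    words = sorted(words)
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

print(is_prefix_code(['9', '55', '50']))       # True
print(is_prefix_code(['9', '5', '55', '59']))  # False: 5 prefixes 55 and 59
```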
6. Huffman Codes
Both the encoder and decoder make use of a binary
tree to recognize codes
The leaves of the tree represent the unencoded characters
Each left branch indicates a “0” placed in the encoded bit
string
Each right branch indicates a “1” placed in the bit string
7. Huffman Codes
A Huffman Code Tree (each leaf shows character:frequency; each internal
node shows the sum of its children's frequencies; left branch = 0,
right branch = 1):

                (100)
              0/     \1
           a:45      (55)
                   0/     \1
                (25)       (30)
              0/   \1     0/    \1
           c:12   b:13  (14)    d:16
                       0/  \1
                     f:5    e:9
To encode:
Search the tree for the character to encode
As you descend, append “0” (left branch) or “1” (right branch) to the
right of the code
The code is complete when you reach the character
To decode a code:
Proceed through the bit string left to right
For each bit, move left (0) or right (1) as indicated
When you reach a leaf, that is the decoded character
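These two tree walks can be sketched in Python, assuming the example tree is stored as nested tuples (a hypothetical but common representation: a leaf is a character, an internal node is a `(left, right)` pair):

```python
# The example tree: a=0, c=100, b=101, f=1100, e=1101, d=111
tree = ('a', (('c', 'b'), (('f', 'e'), 'd')))

def build_codes(node, prefix=''):
    # Walk the tree once, collecting each leaf's root-to-leaf path
    if isinstance(node, str):          # leaf: a character
        return {node: prefix}
    left, right = node
    codes = build_codes(left, prefix + '0')
    codes.update(build_codes(right, prefix + '1'))
    return codes

def decode(bits, tree):
    # Follow bits left/right; on reaching a leaf, emit it and restart at root
    out, node = [], tree
    for b in bits:
        node = node[0] if b == '0' else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return ''.join(out)

codes = build_codes(tree)
encoded = ''.join(codes[ch] for ch in 'face')
print(encoded)                 # 110001001101
print(decode(encoded, tree))   # face
```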
8. Huffman Codes
Using this representation, an optimal code will
always be represented by a full binary tree
Every non-leaf node has two children
If this were not true, then there would be “wasted” bits, as in the
fixed-length code, leading to non-optimal compression
For a set of c characters, this requires c leaves, and c-1
internal nodes
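The c leaves / c−1 internal nodes claim can be checked on the example tree, assuming the same nested-tuple representation (leaf = character, internal node = pair):

```python
tree = ('a', (('c', 'b'), (('f', 'e'), 'd')))

def count_nodes(node):
    # Returns (leaves, internal_nodes) for a nested-tuple binary tree
    if isinstance(node, str):
        return 1, 0
    l_leaves, l_internal = count_nodes(node[0])
    r_leaves, r_internal = count_nodes(node[1])
    return l_leaves + r_leaves, l_internal + r_internal + 1

print(count_nodes(tree))  # (6, 5): c = 6 leaves, c - 1 = 5 internal nodes
```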
9. Huffman Codes
Given a Huffman tree, how do we compute the
number of bits required to encode a file?
For every character c:
Let f(c) denote the character’s frequency
Let dT(c) denote the character’s depth in the tree
This is also the length of the character’s code word
The total bits required is then:
B(T) = Σ_{c ∈ C} f(c) · d_T(c)
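Plugging the example file into this formula; the depths d_T(c) below are read off the example tree (they equal the code-word lengths):

```python
freqs = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}
# d_T(c): each character's depth in the example tree
depth = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

# B(T) = sum over characters of f(c) * d_T(c)
B = sum(freqs[c] * depth[c] for c in freqs)
print(B)  # 224000
```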
10. Constructing a Huffman Code
Huffman developed a greedy algorithm for
constructing an optimal prefix code
The algorithm builds the tree in a bottom-up manner
It begins with the leaves, then performs merging operations
to build up the tree
At each step, it merges the two least frequent members
together
It removes these characters from the set, and replaces them
with a “metacharacter” with frequency = sum of the removed
characters’ frequencies
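These merging steps can be sketched with a min-heap; this uses the standard-library `heapq` as the priority queue, with a tiebreak counter only to keep tuple comparisons well-defined when frequencies are equal:

```python
import heapq

def huffman_tree(freqs):
    # Greedy bottom-up construction: repeatedly merge the two least
    # frequent nodes into a "metacharacter" whose frequency is their sum.
    # Heap entries are (frequency, tiebreak, node); a leaf node is its character.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent
        f2, _, right = heapq.heappop(heap)   # second least frequent
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    return heap[0][2]

def codes(node, prefix=''):
    # Read each character's code word off its root-to-leaf path
    if isinstance(node, str):
        return {node: prefix}
    table = codes(node[0], prefix + '0')
    table.update(codes(node[1], prefix + '1'))
    return table

freqs = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}
table = codes(huffman_tree(freqs))
total = sum(freqs[c] * len(table[c]) for c in freqs)
print(total)  # 224000 bits, matching the optimal code from the example
```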