<- Go Back
This tool uses Huffman Coding to compress data!
- Huffman Coding is a popular technique used for Lossless Data Compression.
Lossles Compression:
- Lossless compression is a class of data compression algorithms that allows the original data to be
perfectly reconstructed from the compressed data. By contrast,lossy compression permits
reconstruction only of an approximation of the original data, though usually with greatly improved
compression rates (and therefore reduced media sizes).
- Lossles Compression is preffered for text file compression, while Lossy Compression is generally preferred
for audio, video and image files.
- File formats like ZIP use Lossless compression, while formats like MP3 and JPEG use
Lossy compression
- Data compression ratio , also known as compression power, is a measurement of the relative reduction
in size of data representation produced by a data compression algorithm. It is typically expressed as the
division of uncompressed size by compressed size. Thus, a representation that compresses a file's storage
size from 10 MB to 2 MB has a compression ratio of 10/2 = 5
-
Huffman Code:
-
A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data
compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm
developed by David A. Huffman.
-
The output from Huffman's algorithm can be viewed as a variable-length code table
for encoding a source symbol (such as a character in a file). The algorithm derives this table from the
estimated probability or frequency of occurrence (weight) for each possible value of the source symbol.
- Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to
one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes
sure that there is no ambiguity when decoding the generated bitstream.
Compression:
- There are mainly two major parts in Huffman Coding
- Build a Huffman Tree from input characters.
- Traverse the Huffman Tree and assign codes to characters.
-
Steps to build Huffman Tree
Input is an Array or Hashmap of unique characters along with their frequency of occurrences and
output is Huffman
Tree.
- Create a leaf node for each unique character and build a Min Heap (or a priority queue) of all leaf nodes
- Extract two nodes with the minimum frequency from the min heap.
- Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the
first extracted node as its left child and the other extracted node as its right child. Add this
node to the min heap.
- Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and
the tree is complete.
-
Steps to print codes from Huffman Tree:
Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left
child, write 0 to the array. While moving to the right child, write 1 to the array. Print the array when a
leaf node is encountered.
De-compression:
- Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value).
References: