Compression Algorithms - Research Paper Example

Summary
This paper 'Compression Algorithms' discusses how, in information theory and computer technology, source coding, bit-rate reduction, or data compression involves encoding information using fewer bits than the original representation.


Compression Algorithms: Introduction

In information theory and computer technology, source coding, also known as data compression or bit-rate reduction, involves encoding information using fewer bits than the original representation. Compression can be lossless or lossy. Lossless compression reduces the number of bits by identifying and eliminating statistical redundancy, so no information is lost. Lossy compression, in contrast, reduces the number of bits by identifying marginally important information and removing it. This process of reducing the size of data is popularly known as data compression, although formally it is called source coding.

Compression is important because it cuts down the use of resources such as data storage space or transmission capacity. Because compressed data must be decompressed before it can be used, the extra processing and cost of decompression mean the situation is far from a free lunch. Compression algorithms are therefore subject to a space-time complexity trade-off. For example, a video compression scheme may require costly hardware if the video is to be decompressed fast enough to be watched while it is being decompressed, while decompressing the whole video before watching it may be inconvenient or require additional storage. Designing data compression schemes involves trade-offs among several factors, including the degree of compression, the distortion introduced, and the computational resources required to compress and decompress the data. There are also newer alternatives to the traditional approach of sampling fully and then compressing: compressed sensing methods, which sample from a carefully chosen basis, circumvent the need to compress the data at all and make more efficient use of resources.

Origin

Compression algorithms have played an important role in information technology since the 1970s, when the internet was growing in popularity and the Lempel-Ziv algorithms were invented. Compression, however, has a much longer history outside computing. The earliest compression scheme is Morse code, introduced in 1838, in which the letters most common in English, such as "e" and "t", are assigned the shorter Morse codes. Later, as mainframe computers started taking hold in 1949, Claude Shannon and Robert Fano invented a scheme that came to be called Shannon-Fano coding. Their algorithm assigns codes to the symbols in a block of data based on the probability of each symbol occurring: the probability of a symbol is inversely proportional to the length of its code, which yields a shorter way of representing the data (Wolfram, 2002). Two years later, David Huffman, while studying information theory, shared a class with Robert Fano. Fano gave the class the choice of either taking the final exam or writing a research paper.
Huffman chose the research paper, on the topic of finding the most efficient method of binary coding. After months of research that proved fruitless, Huffman was about to abandon the work and start studying for the final exam instead. It was at that point that he had an epiphany and devised a technique similar to Shannon-Fano coding yet more efficient. The major difference between the two is that Shannon-Fano builds its probability tree from the top down, while Huffman builds it from the bottom up. Early implementations of Shannon-Fano and Huffman coding were done in hardware with hardcoded codes; it was not until the 1970s, with the arrival of the internet and online storage, that software compression appeared in which Huffman codes were generated dynamically from the input data. In late 1977, Abraham Lempel and Jacob Ziv published their groundbreaking LZ77 algorithm, the first to use a dictionary to compress data. In particular, LZ77 used a dynamic dictionary called a sliding window. In 1978 the pair published the LZ78 algorithm, which also makes use of a dictionary; unlike LZ77, however, LZ78 parses the input data and generates a static dictionary rather than building it dynamically (Huffman, 1991).

Detailed Description

Several different techniques are employed in the compression of data. Most of them do not stand on their own but have to be combined with others to form a complete compression algorithm, and even those that can stand alone are usually more effective when combined with other compression techniques. Most of these techniques are entropy coders, but others, such as run-length encoding and the Burrows-Wheeler transform, are also commonly used.

Run-Length Encoding

Run-length encoding (RLE) is a very simple compression technique that replaces a run of two or more occurrences of the same character with a number giving the length of the run, followed by the character itself; single characters are coded as runs of one. RLE is useful for highly redundant data, such as indexed images with rows of identically colored pixels, or in combination with other compression techniques such as the Burrows-Wheeler transform. Here is a quick example of RLE (a code sketch of this scheme follows the Burrows-Wheeler discussion below):

Input:  AAAABBBCCCCCDDEEEEEEEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Output: 4A3B5C2D7E39A

Burrows-Wheeler Transform

The Burrows-Wheeler transform (BWT) is a technique invented in 1994 that transforms an input block so that runs of identical characters are maximized. The BWT does not itself perform any compression; it simply rearranges the input so that it can be encoded efficiently by a run-length encoder or another secondary compression technique. The algorithm is simple: create a string array, generate every possible rotation of the input string and store each one in the array, sort the array alphabetically, and return the last column of the array. The BWT works best on long inputs that contain many alternating identical characters. For example, with "&" as the end-of-string marker and treated as sorting after the letters:

Input     Rotations   Sorted rotations   Output
BABABA&   BABABA&     ABABA&B            BBB&AAA
          ABABA&B     ABA&BAB
          BABA&BA     A&BABAB
          ABA&BAB     BABABA&
          BA&BABA     BABA&BA
          A&BABAB     BA&BABA
          &BABABA     &BABABA

Because the input consists of alternating identical characters, performing the BWT on it produces an optimal result, which another algorithm such as RLE could then compress to something like 3B&3A. The weakness of the technique is that it does not produce such optimal results on most real-world data (Burrows & Wheeler, 1994).
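Here is the run-length encoding sketch promised above. It is a Python illustration added in this rewrite, not part of the original paper, and it emits the same count-then-character pairs as the example output; a real implementation would also need some way to distinguish counts from characters when digits can occur in the input.

def rle_encode(text):
    """Replace each run of identical characters with <run length><character>,
    e.g. 'AAAABBB' becomes '4A3B'.  Single characters become runs of one."""
    if not text:
        return ""
    pieces = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1
        else:
            pieces.append(f"{run_len}{run_char}")
            run_char, run_len = ch, 1
    pieces.append(f"{run_len}{run_char}")
    return "".join(pieces)

print(rle_encode("AAAABBBCCCCCDDEEEEEEE"))  # prints 4A3B5C2D7E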
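The rotate, sort, and take-the-last-column construction shown in the BWT table above can likewise be sketched in a few lines of Python. This is an added illustration rather than the authors' code; the sort key ranks the "&" terminator after the letters so that the output matches the example.

def bwt(text, terminator="&"):
    """Burrows-Wheeler transform: append a terminator, form every rotation of
    the string, sort the rotations, and read off the last character of each."""
    s = text + terminator
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    # Rank the terminator after every ordinary character, as in the table above.
    rotations.sort(key=lambda rot: [(c == terminator, c) for c in rot])
    return "".join(rot[-1] for rot in rotations)

print(bwt("BABABA"))  # prints BBB&AAA, as in the table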
Entropy Encoding

In data compression, entropy is the smallest number of bits needed, on average, to represent a symbol or literal. A basic entropy coder combines a statistical model with a coder. The input file is parsed and used to build a statistical model consisting of the probability of each symbol appearing. The coder then uses this model to decide which bit or byte codes to assign to each symbol, so that the most common symbols get the shortest codes and the least common symbols get the longest.

Shannon-Fano Coding

This is among the earliest compression techniques, invented in 1949 by Claude Shannon and Robert Fano. The technique builds a binary tree representing the probability of each symbol occurring, with the symbols arranged so that the most frequent appear at the top of the tree and the least frequent at the bottom. The code for a given symbol is found by searching for it in the Shannon-Fano tree and appending a 0 or a 1 to the code for each left or right branch taken, respectively. For example, if reaching "B" takes two branches to the left and then one to the right, its code is 001. Shannon-Fano coding does not always produce optimal codes because of the way it builds its binary tree from the top down; for this reason Huffman coding is used instead when an optimal code for a given input is required. The procedure for producing a Shannon-Fano code is simple: parse the input, counting the occurrences of every symbol; determine the probability of each symbol from its count; sort the symbols by probability, most probable first; generate leaf nodes for every symbol; split the list into two halves of roughly equal total probability; prepend 0 to the codes of the left half and 1 to the codes of the right half; and recursively apply the splitting and prepending steps to each half until every node is a leaf. (A short code sketch of this procedure appears after the Huffman coding description below.)

Huffman Coding

Huffman coding is another entropy coder that works in much the same way as Shannon-Fano coding, except that the binary tree is built from the bottom up, which produces an optimal result. The algorithm shares its initial steps with Shannon-Fano: parse the input, counting the occurrences of every symbol; determine the probability of each symbol from its count; sort the symbols by probability, most probable first; and generate leaf nodes for every symbol. It then proceeds as follows: remove the two lowest-probability nodes from the queue; prepend 0 and 1 to the left and right nodes' codes, respectively; create a new node whose value is the sum of the two nodes' probabilities; assign the first node to its left branch and the second node to its right branch; and add the new node back to the queue. The last node remaining in the queue is the root of the Huffman tree.
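Here is the Shannon-Fano sketch referred to above; it is an illustrative assumption of this rewrite rather than code from the paper. The split point is chosen with a simple running-total rule, whereas careful implementations balance the two halves more exactly.

from collections import Counter

def shannon_fano(counts):
    """Shannon-Fano coding, top down: sort the symbols by frequency, split the
    list into two groups of roughly equal total count, prepend 0 to one group
    and 1 to the other, and recurse until each group holds a single symbol."""
    def assign(group, prefix, codes):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(count for _, count in group)
        running, split = 0, 1
        for i, (_, count) in enumerate(group[:-1], start=1):
            running += count
            split = i
            if running >= total / 2:
                break
        assign(group[:split], prefix + "0", codes)
        assign(group[split:], prefix + "1", codes)

    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    codes = {}
    assign(ordered, "", codes)
    return codes

print(shannon_fano(Counter("ABRACADABRA")))  # e.g. {'A': '00', 'B': '01', 'R': '10', ...}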
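For comparison, here is a similar illustrative sketch of the bottom-up Huffman procedure, using a priority queue to repeatedly merge the two lowest-weight nodes; again this is an assumption of this rewrite, not the paper's implementation.

import heapq
from collections import Counter

def huffman_codes(text):
    """Huffman coding, bottom up: repeatedly take the two lowest-weight nodes
    off the queue, prepend 0 to the codes under one and 1 to the codes under
    the other, and push back a parent node weighted by their sum.  The last
    node left in the queue is the root of the tree."""
    counts = Counter(text)
    if len(counts) == 1:  # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Each heap entry is [weight, tie-breaker, list of [symbol, code-so-far]].
    heap = [[weight, i, [[symbol, ""]]] for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        low = heapq.heappop(heap)
        high = heapq.heappop(heap)
        for pair in low[2]:
            pair[1] = "0" + pair[1]   # left branch
        for pair in high[2]:
            pair[1] = "1" + pair[1]   # right branch
        heapq.heappush(heap, [low[0] + high[0], tie, low[2] + high[2]])
        tie += 1
    return dict(heap[0][2])

print(huffman_codes("ABRACADABRA"))  # shorter codes for the more frequent symbols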
Arithmetic Coding

Arithmetic coding was developed in 1979 at IBM, which was investigating data compression techniques for use in its mainframes. If the aim is the best compression ratio, arithmetic coding is the most effective entropy coding technique, since it usually achieves better results than Huffman coding; it is, however, more complex than the other methods. Instead of splitting the symbol probabilities into a tree, arithmetic coding transforms the input data into a single rational number by changing the base and assigning a single value to each symbol; the binary-point representation of that number is the encoded result. The value is decoded back into the original output by converting it from binary back to the original base and replacing the values with the symbols they represent. The general algorithm is: calculate the number of unique symbols in the input, which becomes the base of the encoding; assign a value to each unique symbol; using those values, replace the symbols of the input with their codes, read as the digits of a fraction in that base; convert the resulting fixed-point number from that base to binary; and record the length of the input somewhere in the result, since it is needed for decoding.

An example of encoding the input "BADCBDAB":

1. There are 4 unique symbols in the input, so the base is 4 and the length is 8.
2. Assign values to the symbols: A = 0, B = 1, C = 2, D = 3.
3. Replace the input with the codes: 0.10321301 in base 4.
4. Convert 0.10321301 from base 4 to base 2: 0.0100111001110001.
5. The outcome is found; the input length of 8 is recorded alongside it.

Assuming eight-bit characters, the input is 64 bits long, while the arithmetic-coded result is 16 bits, reducing the message to about a quarter of its original size. The example shows how well arithmetic coding compresses when it is given a limited character set (Ziv & Lempel, 1977).
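The fixed-base conversion carried out by hand in the example can be sketched in Python as below. This illustrates only the simplified scheme of the example, in which every symbol gets an equal-width digit; a full arithmetic coder would instead narrow an interval in proportion to each symbol's probability. The function name and structure are assumptions of this rewrite, not the paper's code.

def encode_fixed_base(message):
    """Sketch of the worked example above: index each unique symbol, read the
    message as the digits of a fraction in base len(alphabet), and convert
    that fraction to its binary expansion.  The message length must be kept
    alongside the bits so the decoder knows how many digits to recover."""
    alphabet = sorted(set(message))          # for "BADCBDAB": A=0, B=1, C=2, D=3
    base = len(alphabet)
    digits = [alphabet.index(ch) for ch in message]

    # Interpret the digits as value/scale, i.e. the fraction 0.d1d2... in that base.
    value, scale = 0, 1
    for d in digits:
        value = value * base + d
        scale *= base

    # Expand the fraction in base 2, one bit at a time (capped, since bases
    # that are not powers of two need not give a terminating expansion).
    bits = []
    while value and len(bits) < 8 * len(message):
        value *= 2
        bits.append(str(value // scale))
        value %= scale
    return alphabet, len(message), "".join(bits)

print(encode_fixed_base("BADCBDAB"))
# (['A', 'B', 'C', 'D'], 8, '0100111001110001') -- 16 bits for a 64-bit input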
References

Burrows, M., & Wheeler, D. J. (1994). A Block-Sorting Lossless Data Compression Algorithm. SRC Research Report 124, Digital Systems Research Center.
Huffman, K. (1991). David A. Huffman. Scientific American, pp. 54-58.
Wolfram, S. (2002). A New Kind of Science. Champaign, IL: Wolfram Media.
Ziv, J., & Lempel, A. (1977). A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3), 337-343.
