Implementing Unix’s compress

Encoding

Essentially, if as we’re processing a text file, we’re able to identify recurring phrases, we can substitute those phrases with a code that would be shorter than the original text. Doing this over the entire document gives us the compression we’re after.

{
"A": "65",
"B": "66",
....
"a": "97",
"b": "98"
}
... app [cursor] appl ....
{
...
"app": 324,
....
}
Dictionary: { … “app”: 324, “appl”: 325 …}Output: … 324 …
34832 : 'Harry'
34833 : 'Potter'
34834 : 'Hogwarts'
34835 : 'magic'

Decoding

The decoding process is really just the opposite of the encoding process. The only change being that we need to re-create the dictionary with the 256 ASCII codes as a preliminary step before the rest of the decompression can begin.

Variable Length Encoding

Most implementations of this algorithm use the idea of a variable width code.

Code

The compression algorithm is fairly simple to implement.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store