Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Zip Files: History, Explanation and Implementation (hanshq.net)
147 points by Breadmaker on March 6, 2020 | hide | past | favorite | 18 comments


In college we had a project to write a compression/decompression utility.

We had to implement all the data structures from scratch and implement several compression schemes.

I got the simplest one, run length encoding, working but couldn't get the fancier LZ77 stuff to work correctly. Mostly because I had trouble implementing some of the data structures required in C++, from scratch.

The professor had a few of the better students present their programs and one of them demonstrated that not only could they make smaller files than WinZip, they did it faster too.

It absolutely blew my mind and that's probably the day I realized I needed to really step up my game and put in the work if I wanted to be successful in this field.


If you're not too worried about speed nor memory consumption, a simple brute-force LZ77 compressor is, to a first approximation, two nested loops (one to find a match, and one to extend it.) This will also compress better than the hash-table-based approach discussed in the article, since it always finds the best match, although at the expense of speed. In comparison, RLE is one nested loop (to count repeated bytes.)


Do you mean following the hash chain to the end of the window?

If you want an exhaustive LZ matchfinder, a trie, binary tree such as LZMA's bt4 or suffix array are more efficient algorithms, especially with window sizes larger than 32KB, and there are other better matchfinders past 64MB such as Rabin Karp for longer matches.

Zip is inefficient as a format on modern out of order CPUs. Modern replacements such as zstd have blocks of literals, lengths and decisions to decode an entropy coded table at once rather than conditionally branch.

Also, beating zip compression is quite easy: use a larger window size and a relatively quick (small number of tested slots) hash chain (or multiple e.g. 3, 4 and 7 bytes) can find better matches.

If decompression time and speed is not an issue, and you aren't limited to LZ based approaches, a bytewise model such as ppm or bitwise such as paq will provide much better compression without too much thinking.

Edit: forgot to mention, the parsing strategy is also important. Finding an approximately optimal lowest cost path through a block can often save 10%+ ratio with LZ + Huffman.


As a side note, one of the "tightest" ZIP compatible programs around AFAIK, Ken Silverman's KZIP:

http://advsys.net/ken/utils.htm

Worth of note is also ZIPMIX that in a few cases is very useful.


As a recovering alcoholic, I've always looked at the story of Phil Katz as a cautionary tale. He made enough money from people registering PKZip for discs and manuals that he was able to succumb to his tendency to socially isolate and drink to excess. He drank himself to death.


I remember the name Phil Katz from "PKZIP" back in the BBS days. I had no idea he had such an unhappy story!

The wikipedia page covers it briefly: https://en.wikipedia.org/wiki/Phil_Katz


I can recommend that you watch "BBS: The Documentary", they cover the arc wars as well, there's also this document: http://www.bbsdocumentary.com/library/CONTROVERSY/LAWSUITS/S...


I guess at least his initials will forever be immortalized in the first 2 bytes of all ZIP files (and as the article said, JARs, APKs, Office file formats, what else...)


Gary Kildall (creator of CP/M) went out in a similarly sad fashion unfortunately.

https://en.wikipedia.org/wiki/Gary_Kildall#Death

http://techrights.org/2020/01/05/history-canceled/


It happens to the best of us. Alcohol is no respecter of Kings or Queens. This is something judgemental people don't seem to realize.


Indeed, I know a couple of very talented, intelligent sensitive people (one being a family member) who've let their life and careers suffer because of alcoholism. I wish you all the best for your recovery.


Speaking from experience (856 days sober, had my last drink a year after moving to Milwaukee), Milwaukee is a hell of a town for an alcoholic.

One of my favorite coffee shops is right near the PKWARE building off Pittsburgh Ave.


I've been blind drunk in Milwaukee. I was attending the Naval training center just South. It was hell to wake up the next day with a raging hangover and have to march to chow in the cold wind coming off Lake Michigan in the winter. +2 years for me as well. I'm a much better programmer since I quit drinking.


I understand that Milwaukee has long been a center of brewing in the US but you're saying it also has a big binge drinking culture as well? Is that a historic relic of the big breweries were a substantial employer in the city?


I suspect it's the climate and location. pretty much any mid sized great lakes city is the same way. It's ridiculously easy to turn into an alcoholic, especially in the winter.


It really does. There are two things that fit together for me:

First, access - I can think of twelve bars within walking distance of my apartment, four of which are on the same block, and a total of eight of which are within two blocks of my apartment. Almost every grocery store has a liquor store as a part (not separate store, just separate section with a register), and some stores, like WalMart, sell booze in the normal aisles. It's everywhere.

Second, culture - in Wisconsin, if you're under 21 but a parent, guardian, or spouse then you are allowed to be served, possess, or consume alcohol, so teenagers drinking at home as part of a party is completely legal in the state. Bars are so close to the house, and every area has their own, so social drinking, especially when the winter is cold and dark (in parts of the state in parts of the winter, the sun is set by 4:30 in the afternoon), is very popular. Many of the locals, if they're religious, come from a history of Catholic, Lutheran, and similar religions that were typically not parts of the temperance movement, and many of the cultures that settled the land - Scandinavian and German most commonly - are also known for their relationship to alcohol.

Third, recent developments - the part of Milwaukee I'm in was decimated when the manufacturing base was systematically hollowed out in the late 1980s through the current day. Desperation and escape are two motivators to drink, and there are parts of the community that I'm in that have been hit hard by that.

Lewis Black has a bit about drinking in Wisconsin. It is exaggerated, but only slightly so, in a way that I didn't quite understand before I moved here: https://www.youtube.com/watch?v=7WlwumGkSec


Wow. Ok Yeah that all makes good sense. Thanks for the Lewis Black link - "how do you know when its New Years?" Genius! Cheers.


I feel very fortunate that, for whatever reason, I don't enjoy drinking during the day.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: