Hacker News new | past | comments | ask | show | jobs | submit login

In the article the compression doesn’t make sense

If you are only sending one word, and the recipient already needs to know the word, then you only need 1 bit, essentially just signaling that you are saying that specific word

If you want a richer vocabulary, you could create an index of about 300k words (from the English dictionary), shared between the parties

Then to send any word you only need to send one number, and in binary it would have between 1 and at most 19 bits, for any word in the index (2^19 is around 500k)

That’s without even sorting the index by frequency of appearance/usage

27 bits for just one word seems wasteful




> 27 bits for just one word seems wasteful

Where did you get that you need 27 bits for one word?

> Then to send any word you only need to send one number, and in binary it would have between 1 and at most 19 bits

Yep! By sorting by frequency, you are able to make it so the majority of words have shorter bit strings. By my calculations, common words such as "the", "of", and "and" will have ~4-6 bits associated with them. That means you can encode a large number of words (googling says those words make up ~1/7 of words based on frequency) with only 4-6 bits each. That's far from the 27 bits you calculated


About the 27 bits. This is from the article:

> Fano’s balancing approach starts by assigning the O and one other letter to the left branch, with the five total uses of those letters balancing out the five appearances of the remaining letters.

> The resulting message requires 27 bits.

I didn’t calculate it, the author of the article did




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: