There was a series of articles on lossless and lossy compression techniques in, I think, PC Magazine that I read as a kid, and it made a big impression on me. I didn't, like, end up ready to write code to build Huffman trees or anything, but it did change compression from a total black box into a bunch of smaller pieces, each of which a mere mortal can understand.
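For anyone curious what one of those smaller pieces looks like, here's a minimal sketch (mine, in Python, not from the articles) of building Huffman codes with a heap; frequent symbols end up with shorter bit strings:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # Start with one leaf per symbol; the integer tiebreaker keeps the
        # heap from ever having to compare the code dictionaries.
        heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            # Merge the two least-frequent subtrees, prefixing "0" to one
            # side's codes and "1" to the other's.
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (f1 + f2, next_id, merged))
            next_id += 1
        return heap[0][2]

    print(huffman_codes("abracadabra"))  # 'a' gets the shortest code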
The compression rabbit hole can be a rewarding one to go down if you haven't. Besides the tech itself making a lot of things work better (like a Web built on human-readable formats, or more specialized stuff like Parquet or other columnar formats, or video on the lossy side), it can give you some tools or perspective that apply elsewhere, e.g. to other probabilistic stuff like caching and prediction for lossless compression, or other signal-processing stuff for lossy compression.
Another interesting facet is that, according to some schools of thought, compression and AI are equivalent problems: The better you understand something, the less you need to memorize to reproduce it.
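A toy way to see that equivalence: an entropy coder (arithmetic coding, say) can spend roughly -log2(p) bits on a symbol the model predicted with probability p, so the better the predictions, the shorter the encoding. The probabilities below are made up purely for illustration:

    import math

    def ideal_bits(probs_assigned):
        # Total bits an ideal entropy coder needs, given the probability the
        # model assigned to each symbol that actually occurred.
        return sum(-math.log2(p) for p in probs_assigned)

    # A clueless model: every next character is one of 27 equally likely choices.
    clueless = [1 / 27] * 11
    # A model that has learned some structure puts higher probability on what
    # actually comes next (made-up values).
    informed = [0.6, 0.3, 0.5, 0.7, 0.4, 0.8, 0.6, 0.5, 0.9, 0.7, 0.6]

    print(f"clueless: {ideal_bits(clueless):.1f} bits for 11 symbols")  # ~52 bits
    print(f"informed: {ideal_bits(informed):.1f} bits for 11 symbols")  # ~9 bits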
Of course, large language models are (by definition) currently going the other direction, but it remains to be seen whether that leads to artificial intelligence (whatever that ends up meaning).
I interpreted that statement as saying that current practice is to make LLMs larger and larger (so they effectively memorize more and more data) to make them more powerful, whereas from the perspective of information theory, if models genuinely "understood", they could stay the same size and become more and more powerful as they got better at compressing the available information. I'm not sure if that interpretation is what was meant, though.
I believe the parent poster's point is: LLMs are more effective when they use more memory, meaning the less they are forced to compress the training data, the better they perform.
Maybe a naive question: are LLMs really going the other way? My intuition is that the model weights are much smaller than the information encoded in them.
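A back-of-envelope check on that intuition, with ballpark assumptions rather than measured numbers (roughly 70B parameters at 2 bytes each, on the order of 10 trillion training tokens at a few bytes of text per token):

    params = 70e9            # assumed parameter count
    bytes_per_param = 2      # fp16/bf16 storage
    weight_bytes = params * bytes_per_param

    tokens = 10e12           # assumed training-set size, order of magnitude
    bytes_per_token = 4      # very rough: a few characters of text per token
    data_bytes = tokens * bytes_per_token

    print(f"weights:       ~{weight_bytes / 1e9:.0f} GB")
    print(f"training text: ~{data_bytes / 1e12:.0f} TB")
    print(f"ratio:         ~{data_bytes / weight_bytes:.0f}x more data than weights")

On those assumptions the weights come out a couple of orders of magnitude smaller than the text they were trained on, i.e. something like a lossy compression of it.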
Was it maybe Dr Dobb's? Mark Nelson had an excellent series of articles about compression which opened up that world for me. I ended up working on compression for my PhD many years later.
Hi, I'm a neophyte interested in compression. It's difficult to find communities about compression online, and I'm looking for a guide. Is there any place where I can DM you?