Having excellent semantic compression is an enormous technological advancement. The fact that petabytes of textual information, effectively the entire written history of the world, can be compressed to just a few hundred gigabytes (likely the size of GPT-4), or, more lossily, to something around 8 GB that runs on a Raspberry Pi (Llama), is astounding. That holds regardless of the debates about 'reasoning' capabilities, 'memorization/plagiarism', fact recall, and so on.
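To make those ratios concrete, here is a back-of-the-envelope sketch. The corpus and model sizes are illustrative assumptions (neither GPT-4's weight count nor its training data size is public), not measured figures.

```python
# Rough, hedged arithmetic on the compression ratios claimed above.
# All sizes are assumptions for illustration only.
PB = 1024**5
GB = 1024**3

corpus_bytes = 1 * PB    # assumed: ~1 PB of training text
gpt4_bytes = 500 * GB    # assumed: "a few hundred GB" of weights
llama_bytes = 8 * GB     # an ~8 GB quantized Llama for a Raspberry Pi

print(f"corpus -> GPT-4-sized model: ~{corpus_bytes / gpt4_bytes:,.0f}x")
print(f"corpus -> 8 GB Llama:        ~{corpus_bytes / llama_bytes:,.0f}x")
```

Even with generous assumptions, that is roughly a 2,000x to 130,000x reduction, far beyond what lossless byte-level compressors achieve on text.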
This is fascinating from a compression / database perspective: semantic compression is far more efficient for semantic data. That sounds obvious, but it was hard to even get started on until about five years ago. Current models may not yet be quite the right framework for it, but that may come in time.
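A minimal sketch of why modeling the source helps: by Shannon's source coding theorem, the optimal code length for a symbol with probability p is -log2(p) bits, so a better probabilistic model of the text directly means a shorter encoding. The toy frequency model below stands in for an LLM, which supplies vastly richer next-token probabilities; everything here is a hand-rolled illustration, not any particular library's API.

```python
import math
from collections import Counter

text = "the quick brown fox jumps over the lazy dog " * 50

# Baseline: uniform coding over the observed alphabet,
# i.e. log2(|alphabet|) bits per character, no model at all.
alphabet = set(text)
uniform_bits = len(text) * math.log2(len(alphabet))

# Simple statistical model: per-character frequencies from the text itself.
# Optimal code length for each character c is -log2(p(c)) bits; an LLM
# plays the same role with far better conditional probabilities per token.
counts = Counter(text)
total = len(text)
model_bits = sum(-math.log2(counts[c] / total) for c in text)

print(f"uniform coding:   {uniform_bits:,.0f} bits")
print(f"frequency model:  {model_bits:,.0f} bits")
print(f"improvement:      {uniform_bits / model_bits:.2f}x")
```

The gap between the two numbers is exactly the information the model captures about the source; scaling the model from character frequencies up to an LLM widens that gap enormously.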