And the internal state of a JPEG decoder can be an order of magnitude larger than the JPEG file (especially progressive JPEG that can't stream its output).
You can make any lossy compression scheme into a lossless scheme by appending the diff between the original and the compressed. In many cases, this still results in a size savings over the original.
You can think of this as a more detailed form of "I before E, except after C, except for species and science and..." Or, if you prefer, as continued terms of a Taylor-series expansion. The more terms you add, the more closely you approximate the original.
It's funny that sometimes people consider LLMs as compression engines. While a lot of information gets lost in each direction (through the neural net)