Any model that is able to generalize or extrapolate from a few examples to new examples has "understood" a relationship not explicit in the raw training data.
That "understanding" is what allows the model to capture the same information, plus implications not explicit in the data, in far fewer parameters than the raw data. It is compression by understanding.
The test of how much "understanding" a model has learned is the quality of its responses to new examples and new combinations of patterns, beyond its training experiences.
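A toy illustration of "compression by understanding" (a hypothetical example, not from the text): a model that captures a relationship stores far fewer numbers than the raw data, and those few parameters generalize to inputs never seen in training.

```python
import numpy as np

# 1,000 noisy points that follow y = 3x + 1.
# Storing the raw data takes 2,000 numbers; a model that has
# "understood" the relationship needs only 2 parameters.
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, 1000)
y = 3 * x + 1 + rng.normal(0, 0.1, 1000)

slope, intercept = np.polyfit(x, y, 1)  # 2 learned parameters

# The fitted model extrapolates to a new input far outside anything
# in the training data.
x_new = 25.0
prediction = slope * x_new + intercept
print(round(slope, 2), round(intercept, 2))  # close to 3 and 1
```

The test of the fit is the same as the test of understanding above: how well the two parameters predict new, unseen inputs.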
--
Deep learning models are in a mathematical class called "universal approximators".
They don't just learn simple associations, correlations, or conditional probabilities. Deep learning models can learn arbitrarily complex functional relationships.
In principle, they provably can learn to do anything you can do - anything any information processing device can do.
To be a "universal approximator", a neural-network model needs at least two layers of learning units, with enough units and parameters in the first layer, and enough training data and computation.
The problem is that simple two-layer models may require an impractical quantity of units, parameters, data, and computation.
Hence the use of many (more than two) "deep" layers, convolutional layers, and recurrent connections, all of which can radically reduce the model size, data, and computation needed.
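A minimal, hand-constructed sketch of the two-layer claim (the weights here are chosen by hand for illustration, not learned): a single hidden layer of ReLU units is enough to represent XOR, a function no single-layer (linear) model can compute.

```python
import numpy as np

def relu(z):
    # Standard rectified-linear unit.
    return np.maximum(z, 0.0)

def two_layer_xor(x1, x2):
    # Hidden layer: two ReLU units with hand-picked weights.
    h1 = relu(x1 + x2)        # positive when at least one input is on
    h2 = relu(x1 + x2 - 1.0)  # positive only when both inputs are on
    # Output layer: a linear combination of the hidden units.
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_layer_xor(a, b))
# prints 0.0, 1.0, 1.0, 0.0 for the four input pairs
```

With only one layer there is no choice of weights that separates (0,1)/(1,0) from (0,0)/(1,1); adding the hidden layer is what makes the function representable - the same reason depth buys expressive power in larger networks.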
--
Successful language models understand most things at a level similar to how a blind person understands color - not directly, but by lots of indirect exposure to other people's experiences.
But multimodal models, given multiple data modalities, can understand color, images, spatial relationships, dynamic movement, etc., directly.
That is where things are quickly heading.