This doesn't make sense thermodynamically because models are far smaller than the training data they purport to hold and recall, so there must be some level of "understanding" going on. Whether that's the same as human understanding is a different matter.
It’s a lossy text compression technique. It’s clever applied statistics. Basically an advanced association rules algorithm which has been around for decades but modified to consider order and relative positions.
There is no understanding, regardless of the wants of all the capital investors in this domain.