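It's inherent to the training process in machine learning that you define an objective function: an equation the model is statistically optimised against (usually a loss it minimises, which is the same as maximising its negative). For transformers it's a bit more abstract, but the objective is still there, iirc in the "correctness" of the output, e.g. for language models, how much probability the model puts on the token that actually comes next.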
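To make that concrete, here's a rough sketch of what "an objective the training process optimises" can look like for next-token prediction. Everything in it is an assumption for illustration: a toy vocabulary of 4 tokens, the hypothetical helper names `softmax`, `cross_entropy` and `train`, and plain gradient descent directly on the logits instead of on real transformer weights.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Objective: negative log-probability assigned to the correct token."""
    return -math.log(softmax(logits)[target])

def train(logits, target, lr=0.5, steps=50):
    """Plain gradient descent on the logits (toy stand-in for model weights)."""
    logits = list(logits)
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        history.append(-math.log(probs[target]))  # loss before this update
        for i in range(len(logits)):
            # Gradient of cross-entropy w.r.t. logit i: probs[i] - one_hot[i]
            grad = probs[i] - (1.0 if i == target else 0.0)
            logits[i] -= lr * grad
    return logits, history

if __name__ == "__main__":
    # Toy setup: 4 possible tokens, token 2 is the "correct" next token.
    final_logits, history = train([0.0, 0.0, 0.0, 0.0], target=2)
    print("first loss:", history[0])
    print("last loss:", history[-1])
    print("final probabilities:", softmax(final_logits))
```

The loss going down is the same thing as the probability of the correct token going up, which is about as close as the objective gets to "correctness" here. A real transformer computes the logits from billions of weights, but the thing being optimised has the same shape.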