
That statement will age like beautiful, fine wine when your LLMs keep training on LLM-generated data and get influenced by the very same creators putting in clickbait images, leading the LLM to believe the clickbait is the most appropriate part, etc.


The notion of "garbage in, garbage out" breaks down once we introduce an additional signal: responses generated by humans, validation through tests (such as code verification), or a game where maximizing the score is the aim.
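A minimal sketch of the test-validation idea. The task (absolute value) and the candidate lambdas are hypothetical stand-ins for model samples; the point is that an external ground-truth check filters outputs regardless of how noisy the generator is.

```python
# External verification signal breaking "garbage in, garbage out":
# unit tests filter generated code, so only verified samples would
# feed back into a training set.

def passes_tests(fn):
    """Ground-truth check: does fn compute the absolute value?"""
    cases = [(-3, 3), (0, 0), (7, 7)]
    return all(fn(x) == expected for x, expected in cases)

# Pretend these were sampled from a model; two are wrong, one is right.
candidates = [
    lambda x: x,                     # wrong for negatives
    lambda x: -x,                    # wrong for positives
    lambda x: x if x >= 0 else -x,   # correct
]

verified = [fn for fn in candidates if passes_tests(fn)]
print(len(verified))  # 1 -- only the correct sample survives the filter
```

The quality of `verified` depends on the tests, not on the generator, which is what makes this a genuinely additional signal.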

Consider the extraordinary case of AlphaGo Zero. Despite beginning with a random initialization and without any human game data to train on, it mastered Go solely through game feedback, reaching superhuman level; its successor AlphaZero did the same for chess and shogi. The potency of feedback is nothing short of magical.
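The self-play principle can be shown in miniature. This is a hypothetical toy, not AlphaGo Zero's actual algorithm (which combines Monte Carlo tree search with deep networks): a tabular learner starts with an empty value table and no expert data, and recovers the optimal strategy for a tiny Nim game from win/loss feedback alone.

```python
import random

random.seed(0)

# Toy Nim: players alternately remove 1 or 2 stones; whoever takes the
# last stone wins. The known optimal policy is to leave the opponent a
# multiple of 3 stones. We learn it purely from self-play outcomes.

Q = {}  # Q[(stones, action)] -> estimated win value for the player to move

def legal(stones):
    return [a for a in (1, 2) if a <= stones]

def choose(stones, eps):
    acts = legal(stones)
    if random.random() < eps:
        return random.choice(acts)          # explore
    return max(acts, key=lambda a: Q.get((stones, a), 0.0))  # exploit

def play_episode(eps=0.2, lr=0.5):
    stones, history = 7, []
    while stones > 0:
        a = choose(stones, eps)
        history.append((stones, a))
        stones -= a
    # The player who made the last move won; walking backwards through
    # the game, the reward alternates sign between the two players.
    reward = 1.0
    for s, a in reversed(history):
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + lr * (reward - q)
        reward = -reward

for _ in range(20000):
    play_episode()

# Greedy policy after training: from 7 stones, take 1 (leaving 6 = 3*2).
best = max(legal(7), key=lambda a: Q.get((7, a), 0.0))
print(best)
```

No human games appear anywhere: the only signal is who won, yet the table converges to the textbook strategy, which is the comment's point in the smallest possible setting.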

Shifting our focus to humans, what's the nature of our major breakthroughs? More often than not, we serendipitously encounter them. They don't typically arise from deduction but from meticulous observation of how our existing theories align with reality. Essentially, we observe and integrate feedback.

Involvement in a larger system - be it the world, society, the internet, or even a dialogue session with a human - is how AIs can transcend the mere regurgitation of the training set. With every interaction, they receive a nugget of new data in the prompt and feedback following the response.




