> The biggest time sink for me is validating answers so not sure I agree on that take.
But you're assuming that it'll always be validated by humans. I'd imagine that most validation (and subsequent processing, especially going forward) will be done by machines.
By comparison with reality. The initial LLMs had "reality" be "a training set of text"; when ChatGPT came out, everyone rapidly expanded into RLHF (reinforcement learning from human feedback), and now that there are vision models as well as text models, training and feedback are grounded in a much broader slice of reality than text alone.
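To make "reinforcement learning from human feedback" concrete: at its core is a reward model fitted to human preference pairs. A minimal toy sketch of that preference-fitting step, using a Bradley-Terry model over made-up numeric features (the data, features, and learning rate are all invented for illustration, not any real system):

```python
import numpy as np

# Toy preference data: rows of A and B are numeric "answer features";
# label 1 means the (synthetic) human preferred answer A over answer B.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])          # hidden "human taste" (invented)
A = rng.normal(size=(200, 3))
B = rng.normal(size=(200, 3))
prefs = (A @ true_w > B @ true_w).astype(float)

# Bradley-Terry reward model: P(A preferred) = sigmoid(r(A) - r(B)), r(x) = w @ x.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-((A - B) @ w)))
    grad = (A - B).T @ (p - prefs) / len(prefs)  # gradient of the log-loss
    w -= lr * grad

print(w)  # should recover roughly the direction of true_w
```

In real RLHF the reward model is a neural network over text and is then used to steer the language model with RL, but the preference-learning idea is the same.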
That's one way to do it, but overkill for this specific thing: data from self-driving cars or robotics, or from natural use of smart-[phone|watch|glass|doorbell|fridge] devices, is likely sufficient.
Total surveillance may be necessary for other reasons, like making sure organised crime can't blackmail anyone because the state already knows it all, but it's overkill for AI.
Not if you source your training data from reality.
Are you treating "the internet" as "reality" with this line of questioning?
The internet is the map; don't mistake the map for the territory. It's fine as a bootstrap but not as the final result, just like it's OK for a human to research a topic by reading Wikipedia but not to use it as the only source.
Sooner or later someone is going to figure out how to do active training on AI models. It's the holy grail of AI before AGI. This would allow you to do base training on a small set of very high-quality data, and then let the model actively decide what it wants to train on going forward, or let it "forget" what it wants to unlearn.
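The closest existing idea is what the literature calls active learning: the model itself picks which examples to get labels for next. A toy sketch under that assumption, with a simple logistic model and uncertainty sampling (all names and numbers here are illustrative):

```python
import numpy as np

# Toy "active training" loop: start from a tiny seed set, then repeatedly let
# the model pick the unlabeled example it is least certain about.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # invented ground truth

labeled = list(range(10))                    # small, high-quality seed set
pool = list(range(10, 1000))                 # everything else is unlabeled
w = np.zeros(2)

def fit(w, X, y, steps=200, lr=0.5):
    """Plain gradient descent on a logistic model."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * (X.T @ (p - y)) / len(y)
    return w

for _ in range(20):
    w = fit(w, X[labeled], y[labeled])
    p = 1.0 / (1.0 + np.exp(-(X[pool] @ w)))
    pick = pool[int(np.argmin(np.abs(p - 0.5)))]  # most uncertain example
    labeled.append(pick)                          # "ask reality" for its label
    pool.remove(pick)
```

The selective-forgetting half of the idea (machine unlearning) is much less settled; there's no standard recipe to sketch.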
1. AI can do what we can do, in much the same way we can do it, because it's biologically inspired. Not a perfect copy, but close enough for the general case of this argument.
2. AI can't ever be perfect because of the same reasons we can't ever be perfect: it's impossible to become certain of anything in finite time and with finite examples.
3. AI can still reach higher performance than us in specific things (not everything, not yet), because the information-processing speedup going from synapses to transistors is of the same order of magnitude as walking is to continental drift; so when there exists sufficient training data to overcome the inefficiency of the model, we can make models absorb approximately all of that information (rough numbers below).
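For point 3, rough public figures make the analogy checkable, assuming ~5 cm/year of continental drift, millisecond-scale synaptic delays, and picosecond-scale transistor switching (order-of-magnitude values only):

```python
# All values are rough, order-of-magnitude figures.
year = 365.25 * 24 * 3600
walking = 1.4                 # m/s, typical human walking speed
drift = 0.05 / year           # m/s, ~5 cm/year of continental drift
synapse = 1e-3                # s, order of a synaptic delay
transistor = 1e-12            # s, order of a transistor switching time

print(f"walking / drift      ~ {walking / drift:.0e}")       # ~1e9
print(f"synapse / transistor ~ {synapse / transistor:.0e}")  # ~1e9
```

Both ratios come out around a billion, which is the comparison the point is leaning on.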
Does the AI need to know, or the curator of the dataset? If the curator took a camera and walked outside (or let a drone wander around for a while), do you believe this problem would still arise?