
It's difficult to fully describe, so let's just give up and use a deeply flawed benchmark? Why not try to develop benchmarks that actually work and tell us something useful instead?

The key issue is that there is no result an AI can achieve on a standard IQ test which guarantees that same AI can do any task at a superhuman level, apart from taking IQ tests. Can an LLM that scores 250 replace a human driver? Who knows? Can it replace a senior software engineer? Who knows? Can it replace a manual laborer? Again, who can say? We know a human with a 250 IQ can do all those things, but with an AI we have no idea, because those tasks have many more inputs than IQ.

Rather than IQ, which tells us almost nothing concrete, I think we should focus on what tasks it can actually achieve. What's a Waymo's IQ? Who cares?! I don't care about its IQ. I care about its ability to drive me safely across the city. Similarly, I don't care whether a coding AI can drive a car or write great novels.

Of course it's interesting to measure and speculate about IQ as it relates to AGI, but I think it gives people the very mistaken impression that we are on some kind of linear path where all we need to do is keep pushing up a single all-important number.



> It's difficult to fully describe, so let's just give up and use a deeply flawed benchmark? Why not try to develop benchmarks that actually work and tell us something useful instead?

Two reasons. First: in this sub-thread I'm focusing on employment issues due to AI, so consider the quote from above:

> At the point that we have an AI that's capable of every task that say a 110 IQ human is, including manipulating objects in the physical world, then basically everyone is unemployed unless they're cheaper than the AI.

IQ doesn't capture what machines do, but it does seem to capture a rough approximation of what humans do, so when the question is "can this thing cause humans to become economically irrelevant?", that's still a close approximation of the target to beat.

You just have to remember that AI doesn't match human thinking: an AI which is wildly superhuman at arithmetic or chess isn't necessarily able to tie its own shoelaces. For this result, the AI has to beat humans at everything (at least, everything economically relevant) at that IQ level.

Second: Lots of people are in fact trying to develop new benchmarks.

This is a major research topic all by itself (as in "I could do a PhD in this"), and also a fast-moving topic (as in "…but if I tried to do a PhD, I'd be out of date before I've finished"). I'm not going to go down that rabbit hole in the space of a comment about the exact performance thresholds an AI has to reach to be economically disruptive.

For a concrete example of quite how fast-moving the topic is, here's a graph of how fast AI is now beating new benchmarks: https://ourworldindata.org/grapher/test-scores-ai-capabiliti...


More importantly, basically all of the IQ tests are in the training sets, so it's hard to know how the models would perform on similar tests not in the training set.


Indeed. People do try to overcome this, for example see the difference in results between "Show Offline Test" and "Show Mensa Norway" on https://trackingai.org/IQ

Even the lower value is still only an upper bound on human-equivalent IQ: we can't really be sure of the extent to which the training data (despite efforts to exclude it) amounts to "training for the test", nor can we be sure these tests measure what we actually mean by intelligence rather than merely a proxy for it (a problem which is why IQ tests themselves have been revised over the years).

My focus in this sub-thread is more on the economic issues than the technical ones, which complicates things further. If you have an AI architecture where spending a few tens of millions on compute (either from scratch or as fine-tuning) gets you superhuman performance (performance specifically, regardless of whether you count it as "intelligent" or not), then any sector which employs even a couple of hundred workers (in rich economies) has an economic incentive to train an AI on those workers and replace them within a year.
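To make the "within a year" claim concrete, here's a minimal back-of-the-envelope sketch. All the numbers (training cost, headcount, fully-loaded salary) are illustrative assumptions, not figures from any real deployment:

```python
# Hypothetical payback calculation for replacing a small sector's
# workforce with a one-off AI training run. Every number below is
# an assumption chosen for illustration.

training_cost = 30_000_000        # assumed one-off compute/fine-tuning cost, USD
workers = 200                     # assumed headcount in the sector
annual_cost_per_worker = 150_000  # assumed fully-loaded cost per worker, USD/year

annual_savings = workers * annual_cost_per_worker
payback_years = training_cost / annual_savings

print(f"Annual savings:  ${annual_savings:,}")   # $30,000,000
print(f"Payback period:  {payback_years:.1f} years")  # 1.0 years
```

At these assumed figures the training run pays for itself in a year; with a larger sector or cheaper training, the incentive only gets stronger.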

This still leaves humans with jobs, and still economically relevant, but it makes basically everyone a contractor who has to be ready and able to change jobs suddenly, which is also economically disruptive because we're not set up for that.



