nopinsight's comments | Hacker News

More than half of the 2024 links, about 15, appeared between o1-preview’s September launch and a few days after o3’s late-December announcement. That span was arguably the most rapid period of advancement for these models in recent years.


Yeah. At first this tracker sounded like it was meant to be cynical about AI progress, but then I found the creator's tweet from when he published it (https://x.com/petergostev/status/1960100559483978016):

> I'm sure you've all noticed the 'AI is slowing down' news stories every few weeks for multiple years now - so I've pulled a tracker together to see who and when wrote these stories.
>
> There is quite a range, some are just outright wrong, others point to a reasonable limitation at the time but missing the bigger arc of progress.
>
> All of these stories were appearing as we were getting reasoning models, open source models, increasing competition from more players and skyrocketing revenue for the labs.

So the tracker seems more intended to poke fun at how ill-timed many of these headlines have been.


Humans being who they are, there's still a tremendous amount of work to do in this world (and beyond).

Does everyone alive already have the best quality of life imaginable, not to mention future generations?

Lump of Labor fallacy: https://en.wikipedia.org/wiki/Lump_of_labour_fallacy

Comparative Advantage: https://en.wikipedia.org/wiki/Comparative_advantage

*The key challenge* we all share is making the transition as smooth as possible for everyone involved.


The author might be using it as an analogue of mentalese, but for neural networks.

https://en.wiktionary.org/wiki/mentalese

EDIT: After reading the original thread in more detail, I think some of the sibling comments are more accurate. In this case, neuralese is more like a language of communication expressed by neural networks, rather than their internal representation.


1. FWIW, I watched clips from several of Dario’s interviews. His expressions and body language convey sincere concerns.

2. Commoditization can be averted with access to proprietary data. This is why all of ChatGPT, Claude, and Gemini push for agents and permissions to access your private data sources now. They will not need to train on your data directly. Just adapting the models to work better with real-world, proprietary data will yield a powerful advantage over time.

Also, the current training paradigm utilizes RL much more extensively than in previous years and can help models to specialize in chosen domains.


About 1: Indeed. The moderator remarked at the end that once the interview was over, Dario's expression sort of sagged and it felt like you could see the weight on his shoulders. But you never know if that's part of the act.

About 2: Ah, yes. So if one vendor gains sufficient momentum, their advantage may accelerate, which will be very hard to catch up with.


Au contraire, AlphaGo made several “counterintuitive” moves that professional Go players thought were mistakes during the play, but turned out to be great strategic moves in hindsight.

The (in)ability to recognize a strange move’s brilliance might depend on the complexity of the game. The real world is much more complex than any board game.

https://en.m.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol


That's a good point, but I doubt that Sonnet adding a very contrived bug that crashes my app is some genius move that I fail to understand.

Unless it's a MUCH bigger play where through some butterfly effect it wants me to fail at something so I can succeed at something else.

My real name is John Connor by the way ;)


ASI is here and it's just pretending it can't count the b's in blueberry :D


Thanks, this made my day :-D


That's great, but AlphaGo used artificial and constrained training materials. It's a lot easier to optimize things when you can actually define an objective score, and especially when your system is able to generate valid training materials on its own.


"artificial and constrained training materials"

Are you simply referring to games having a defined win/loss reward function?

Because I'm pretty sure AlphaGo was groundbreaking also because it was self-taught, by playing itself; there were no training materials. Unless you say the rules of the game itself are the constraint.

But even then, from move to move, there are huge decisions to be made that are NOT easily defined with a win/loss reward function. Especially in the early game, there are many moves that don't obviously have an objective score to optimize against.

You could make the big leap and say that Go is so open-ended that it does model Life.


That quote was intended to mean --

"artificial" maybe I should have said "synthetic"? I mean the computer can teach itself.

"constrained" the game has rules that can be evaluated

and as to the other -- I don't know what to tell you, I don't think anything I said is inconsistent with the below quotes.

It's clearly not just a generic LLM, and it's only possible to generate a billion training examples for it to play against itself because synthetic data is valid. And synthetic data contains positions no human has ever played, which is why it's not at all surprising it did stuff humans never would try. An LLM would just try patterns that, at best, are published in human-generated Go game histories or synthesized from them. I think this inherently limits the amount of exploration it can do of the game space, and similarly would be much less likely to generate novel moves.

https://en.wikipedia.org/wiki/AlphaGo

> As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network", both implemented using deep neural network technology.[5][4] A limited amount of game-specific feature detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks.[4] The networks are convolutional neural networks with 12 layers, trained by reinforcement learning.[4]

> The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves.[21] Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play.[5] To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the match against Lee, the resignation threshold was set to 20%.[64]
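If it helps, here is a toy sketch of what "Monte Carlo tree search guided by a value network and a policy network" boils down to structurally: a one-level UCT search on a tiny take-1-to-3 Nim game. A random rollout stands in for the learned value network, and there is no policy network at all -- AlphaGo adds those two nets (and a full recursive tree) on top of this skeleton. All names are made up; this is obviously not AlphaGo's code.

    import math, random

    def legal_moves(stones):
        return [m for m in (1, 2, 3) if m <= stones]

    def rollout(stones, to_move, root_player):
        # Random playout; +1 if root_player ends up taking the last stone, else -1.
        # AlphaGo replaces this with a learned value network's estimate.
        while stones > 0:
            stones -= random.choice(legal_moves(stones))
            if stones == 0:
                return 1 if to_move == root_player else -1
            to_move = -to_move

    def choose_move(stones, n_sims=5000, c=1.4):
        # One-level UCT over the root's moves (a full MCTS recurses down the tree).
        # AlphaGo's policy network would additionally bias selection toward moves
        # that "look" promising to a trained net.
        stats = {m: [0, 0.0] for m in legal_moves(stones)}   # move -> [visits, total value]
        for i in range(1, n_sims + 1):
            def ucb(m):
                n, w = stats[m]
                return float("inf") if n == 0 else w / n + c * math.sqrt(math.log(i) / n)
            move = max(stats, key=ucb)
            left = stones - move
            value = 1 if left == 0 else rollout(left, to_move=-1, root_player=1)
            stats[move][0] += 1
            stats[move][1] += value
        return max(stats, key=lambda m: stats[m][0])          # most-visited move

    print(choose_move(10))   # usually prints 2: leaving a multiple of 4 behind is the winning strategy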


Of course, not an LLM. I was just referring to AI technology in general. And that goal functions can be complicated and non-obvious even for a game world with known rules and outcomes.

I was misremembering the order of how things happened.

AlphaGo Zero, another iteration after the famous matches, was trained without human data.

"AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version without human data and stronger than any previous human-champion-defeating version.[52] By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.[53]"


There are quite a few relatively objective criteria in the real world: real estate holdings, money and material possessions, power to influence people and events, etc.

The complexity of achieving those might result in the "Centaur Era", when humans+computers are superior to either alone, lasting longer than the Centaur chess era, which spanned only 1-2 decades before engines like Stockfish made humans superfluous.

However, in well-defined domains, like medical diagnostics, it seems reasoning models alone are already superior to primary care physicians, according to at least 6 studies.

Ref: When Doctors With A.I. Are Outperformed by A.I. Alone by Dr. Eric Topol https://substack.com/@erictopol/p-156304196


It makes sense. People said software engineers would be easy to replace with AI because our work can be run on a computer and easily tested. But the disconnect is that the primary strength of LLMs is drawing on huge bodies of information, and that's not the primary skill programmers are paid for. It does help when you're doing trivial CRUD work or writing boilerplate, but every programmer will eventually have to actually, truly reason about code, and LLMs fundamentally cannot do that (not even the "reasoning" models).

Medical diagnosis relies heavily on knowledge, pattern recognition, a bunch of heuristics, educated guesses, luck, etc. These are all things LLMs do very well. They don't need a high degree of accuracy, because humans are already doing this work with a pretty low degree of accuracy. They just have to be a little more accurate.


Being a walking encyclopedia is not what we pay doctors for either. We pay them to account for the half-truths and actual lies that people tell about their health. This is to say nothing of novel presentations that come about because of the genetic lottery. Just as an AI can assist but not replace a software engineer, an AI can assist but not replace a doctor.


Having worked briefly in the medical field in the 1990s, I can say there is some sort of "greedy matching" being pursued: once 1-2 well-known symptoms are recognized that can be associated with a disease, the standard interventions to cure it are initiated.

A more "proper" approach would be to work with sets of hypotheses and to conduct tests that gradually exclude alternative explanations - which medics call "DD" (differential diagnosis). Sadly, this is often not done systematically; instead, people jump on the first diagnosis and see whether the intervention "fixes" things.

So I agree there are huge gains from "low-hanging fruit" to be expected in the medical domain.
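For what it's worth, a toy sketch of that distinction (completely made-up diseases and symptoms, purely illustrative, just to make the "sets of hypotheses" idea concrete):

    # Toy data only -- invented diseases and symptoms, not medical knowledge.
    DISEASES = {
        "disease_A": {"fever", "cough"},
        "disease_B": {"fever", "cough", "rash"},
        "disease_C": {"fever", "fatigue"},
    }

    def greedy_match(findings):
        # "Greedy matching": commit to the first disease sharing a couple of findings.
        for name, symptoms in DISEASES.items():
            if len(symptoms & findings) >= 2:
                return name

    def differential(findings):
        # Differential diagnosis: keep every hypothesis still consistent with ALL
        # findings; further tests should target whatever discriminates the survivors.
        return [name for name, symptoms in DISEASES.items() if findings <= symptoms]

    findings = {"fever", "cough"}
    print(greedy_match(findings))   # -> disease_A (first hit wins; disease_B is never considered)
    print(differential(findings))   # -> ['disease_A', 'disease_B']; checking for a rash separates them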


I think at this point it's an absurd take that they aren't reasoning. I don't think you can get such high scores on competitive coding and the IMO without reasoning about code (and math).

Alphazero also doesn't need training data as input--it's generated by game-play. The information fed in is just game rules. Theoretically, it should also be possible in research math. Less so in programming, b/c we care about less rigid things like style. But if you rigorously defined the objective, training data shouldn't be necessary either.


> Alphazero also doesn't need training data as input--it's generated by game-play. The information fed in is just game rules

This is wrong: it wasn't just fed the rules, it was also given a harness that tested viable moves and searched for optimal ones using a depth-first search method.

Without that harness it would not have gained superhuman performance. Such a harness is easy to make for Go but not as easy to make for more complex things. You will find that the harder it is to make an effective harness for a topic, the harder that topic is for AI models to solve: it is relatively easy to make a good harness for very well-defined programming problems like competitive programming, but much, much harder for general-purpose programming.


Are you talking about Monte Carlo tree search? I consider it part of the algorithm in AlphaZero's case. But agreed that RL is a lot harder in a real-life setting than in a board-game setting.


The harness is obtained from the game rules? The "harness" is part of the algorithm of AlphaZero.


> the "harness" is part of the algorithm of alphzero

Then that is not a general algorithm, and results from it don't apply to other problems.


If you mean CoT, it's mostly fake: https://www.anthropic.com/research/reasoning-models-dont-say...

If you mean symbolic reasoning, well it's pretty obvious that they aren't doing it since they fail basic arithmetic.


> If you mean CoT, it's mostly fake

If that's your take-away from that paper, it seems you've arrived at the wrong conclusion. It's not that it's "fake", it's that it doesn't give the full picture, and if you only rely on CoT to catch "undesirable" behavior, you'll miss a lot. There is a lot more nuance than you allude to, from the paper itself:

> These results suggest that CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out.


Very few humans are as good as these models at arithmetic. And CoT is not "mostly fake"; that's not a correct interpretation of that research. It can be deceptive, but so can human justifications of actions.


Humans can learn the symbolic rules and then apply them correctly to any problem, bounded only by time, and modulo lapses of concentration. LLMs fundamentally do not work this way, which is a major shortcoming.

They can convincingly mimic human thought, but the illusion falls apart on closer inspection.


What? Do you mean like this??? https://www.reddit.com/r/OpenAI/comments/1mkrrbx/chatgpt_5_h...

Calculators have been better than humans at arithmetic for well over half a century. Calculators can reason?


It's an absurd take to actually believe they can reason. The cutting edge "reasoning model," by the way:

https://bsky.app/profile/kjhealy.co/post/3lvtxbtexg226


Humans are, statistically speaking, static. We just find out more about them, but the humans themselves don't meaningfully change unless you start looking at much longer time scales. The state of the rest of the world is in constant flux and much harder to model.


I’m not sure I agree with this - it took humans about a month to go from “wow this AI generated art is amazing” to “zzzz it’s just AI art”.


To be fair, it was more a "wow look what the computer did". The AI "art" was always bad. At first it was just bad because it was visually incongruous. Then they improved the finger counting kernel, and now it's bad because it's a shallow cultural average.

AI producing visual art has only flooded the internet with "slop", the commonly accepted term. It's something that meets the bare criteria, but falls short in producing anything actually enjoyable or worth anyone's time.


It sucks for art almost by definition, because art exists for its own reason and is in some way novel.

However, even artists need supporting materials and tooling that meet bare criteria. Some care what kind of wood their brush is made from, but I'd guess most do not.

I suspect it'll prove useless at the heart of almost every art form, but powerful at the periphery.


That's culture, not genetics.


Sure, that does make things easier: one of the reasons Go took so long to solve is that one cannot define an objective score for Go beyond the end result being a boolean win or lose.

But IRL? Lots of measures exist, from money to votes to exam scores, and a big part of the problem is Goodhart's law — that the easy-to-define measures aren't sufficiently good at capturing what we care about, so we must not optimise too hard for those scores.
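A throwaway numeric toy of that failure mode (all numbers and weights invented): as soon as the easy-to-measure score also rewards effort that isn't the thing we care about, an optimizer pushed hard enough abandons the real work entirely.

    def proxy(real, gamed):
        return real + 3.0 * gamed    # the measurable score (made-up weights)

    def true_value(real, gamed):
        return real                  # the thing we actually cared about

    budget = 10.0
    splits = [i * 0.5 for i in range(21)]                        # ways to divide the effort budget
    best_real = max(splits, key=lambda r: proxy(r, budget - r))  # optimize the proxy hard
    print("real-work effort chosen:", best_real)                              # -> 0.0
    print("proxy score:", proxy(best_real, budget - best_real),
          "| true value:", true_value(best_real, budget - best_real))         # -> 30.0 | 0.0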


> Sure, that does make things easier: one of the reasons Go took so long to solve is that one cannot define an objective score for Go beyond the end result being a boolean win or lose.

Winning or losing a Go game is a much shorter-term objective than making or losing money at a job.

> But IRL? Lots of measures exist

No, none that are shorter-term than winning or losing a Go game. A game of Go is very short, much, much shorter than the time it takes for a human to get fired for incompetence.


Time horizon is a completely different question from the one I'm responding to.

I agree the time horizon of current SOTA models isn't particularly impressive. It doesn't matter for this point.


I want to point out that the span of "during the play" here is only 5 moves of the game.


No? Some of the opening moves took experts thorough analysis to figure out they were not mistakes, even in game 1 for example, not just the move 37 thing. Also thematic ideas like 3-3 invasions.


I think it's doable, tbh, if you pour in enough resources (smart people, energy, compute power, etc.), like the entire planet's resources.

Of course we can have AGI (damned if we don't); with so much put in, it had better work.

But the problem is we can't do that right now because it's so expensive. AGI is not a matter of if but when.

But even then, it's always about the cost.


There may be philosophical (i.e., fundamental) challenges to AGI. Consider, e.g., Gödel's Incompleteness Theorem, though Scott Aaronson argues this does not matter (see, e.g., the YouTube video "How Much Math Is Knowable?"). There would also seem to be limits to the computation of potentially chaotic systems. And in general, verifying physical theories has required carrying out actual physical experiments. Even if we were to build a fully reasoning model, "pondering" is not always sufficient.


It’s also easy to forget that “reason is the slave of the passions” (Hume) - a lot of what we regard as intelligence is explicitly tied to other, baser (or more elevated) parts of the human experience.


Yeah, but that's the robotics industry's part of the work, not this company's.

They just need to "MCP" it to a robot body and it works (also part of the reason why OpenAI bought a robotics company).


How long would that be economically viable when a sufficient number of people can generate high-quality code in 1/10th the time? (Obviously, it will always be possible as a hobby.)


What would you say if the IMO Gold Medal models from DeepMind and OpenAI turn out to be generalizable to other domains including those with hard-to-verify reward signals?

Hint: Researchers from both companies said publicly they employ generalized reasoning techniques in these IMO models.


Nice. That's really great progress. Maybe a model will be able to infer what consciousness is.


Sounds good in theory but I'd be bored to death in a month, at most two. Traveling the world...maybe good for a few more months and that's it.

Wouldn't you yearn for any more impact given how much that amount of resource could improve the lives of many, if used wisely?


> Wouldn't you yearn for any more impact given how much that amount of resource could improve the lives of many, if used wisely?

Cynical take: increasing Meta's stock value does improve the lives of many - the many stockholders.

Thus: when you talk about improving lives, you'd better specify which group you are targeting, and why you selected this particular group.


I am interested in improving the lives of the many people who cannot afford to be stockholders.

The reason I'm interested in this is twofold.

First, I think the current system is exploitative. I don't advocate for communism or anything, but the current system of extracting value from the lower class is disgusting.

Second, they outnumber the successful people by a vast margin, and I don't want them to have a reason to reinvent the guillotine.


> they outnumber the successful people by a vast margin

You can be successful and lower class.


The world is sufficiently large and complex that a few months wouldn’t even scrape the tip of the iceberg.


I agree. I just personally wouldn’t want to wander around exploring it continuously for months without more interesting work/goals. Even though cultures and geography may be wonderfully varied, their ranges are way smaller than what could be.


If you want to improve the lives of many, by all means go for it; I think that is a wonderful ambition to have in life and something I strive for, too!

But we are talking about an ad company here, trying to branch out into AI to sell more ads, right? Meta existing is without a doubt a net negative for mankind.


impact != story points


Yeah, story points approximate effort, so it's pretty much impossible to be 10x on those.

JIRA has a notion of business value points, and you could make up similar metrics in other project planning tools. The problem would then be how to estimate the value of implementing 0.01% of the technology of a product that doesn't sell as a standalone feature. If you can accurately do that, you might be the 100x employee already.


I agree, but my point is that 1000x is clearly hyperbole. Certainly there are people who are more productive or impactful, but not 1000 times more. That's particularly true since programming (like most human endeavors) is largely a team sport.


My thesis is that this could lead to a booming market for “pink-collar” service jobs. A significant latent demand exists for more and better services in developed countries.

For instance, upper-middle-class and middle-class individuals in countries like India and Thailand often have access to better services in restaurants, hotels, and households compared to their counterparts in rich nations.

Elderly care and health services are two particularly important sectors where society could benefit from allocating a larger workforce.

Many others will have roles to play building, maintaining, and supervising robots. Despite rapid advances, they will not be as dexterous, reliable, and generally capable as adult humans for many years to come. (See: Moravec's paradox).

