
You know LLMs have been used to solve very hard, previously unsolved math problems, like some of the Erdős problems?


That Erdős problem solution is believed by quite a few to be a previous result from the literature, just used in a slightly different way. It also seems it wasn't a lack of progress but simply that no one had cared to give it a go.

That’s a really fantastic capability, but not super surprising.


You're thinking of a previous report from a month ago, #897 or #481, or the one from two weeks ago, #728. There's a new one from a week ago, #205, which is genuinely novel, although it is still a relatively "shallow" result.

Terence Tao maintains a list [1] of AI attempts (successful and otherwise). #205 is currently the only success in section 1, the "full solution for which subsequent literature review did not find new relevant prior partial or full solutions" section - but it is in that section.

As to speed, as far as I know the recent results are all due to GPT-5.2, which is barely a month old, or Aristotle, which is a system built on top of some frontier LLMs and which has only been accessible to the public for a month or two. I have seen multiple mathematicians report that GPT-5.2 is a major improvement in proof-writing, e.g. [2].

[1] https://github.com/teorth/erdosproblems/wiki/AI-contribution...

[2] https://x.com/AcerFur/status/1999314476320063546


Thanks for the wiki link, very interesting, in particular

- the long-tail aspect of the problem space: 'a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools'

- the expertise requirement: literature review, but also 'Do I understand what the key ideas of the solution are, and how the hypotheses are utilized to reach the conclusion?' So basically one must already be an expert (or able to become one) to actually use this kind of tooling

and finally the outcomes, which, taking the previous two points into account, are very different from what most people would assume "AI contributions" to be.


I do, and I've read Tao's comments on his usage too, but that still doesn't address what I wrote.


How does it not address what you wrote?


If I understood correctly, you are giving an example of a "success" of using the technology. That addresses whether the technology is useful or powerful, but it does not address what the technology actually does (maybe somewhere inside ChatGPT there's a gnome that solved it; I'm just being provocative to make the point), or, more importantly, whether it does something it couldn't do a year ago or five years ago, and how it is doing something new.

For example, if somebody had run GPT-2 with the input dataset of GPT-5.2 (assuming that's the model used for the Erdős problems) rather than the dataset it had at the time, could it have solved those same problems? Without such tests it's hard to say whether it moved fast, or at all. The fact that something new has been solved by it doesn't mean the capability is new. Yes, that's a reasonable assumption, but it's just that. So going from there to asserting that "it" is "moving fast" is just a belief, IMHO.


Also, something that makes the whole process very hard to verify is what I tried to address in a much older comment: whenever LLMs are used (regardless of the input dataset) by someone who is an expert in the domain (rather than a novice), how can one evaluate what's been done by whom or by what? Sure, again, there can be a positive result, e.g. a solution to a previously unsolved problem, but what does that say about the tool itself versus a user who is, by definition if they are an expert, up to date on the state of the art?


Also, the very fact that https://github.com/teorth/erdosproblems/wiki/AI-contribution... exists totally changes the landscape. Because it's public, it's safe to assume it's part of the input dataset, so from now on how does one evaluate the pace of progress, in particular for non-open-source models?



