
> brilliantly realised

Can you say more about this? Nothing about this approach seems very amazing to me. Construct an approximate solution by some numerical method (in this case neural networks), prove that a solution which is close enough to satisfying the equation can be perturbed to an exact solution. Does the second half use some nonstandard method?


This is one of those things that seems easy in retrospect, but wasn't particularly obvious at the time.

1. Proving existence of a solution to a differential equation from a numerical approximation is quite a bit more difficult than it seems at first. For Euler or NS, it seems almost absurd. Not only do you need a rather substantial amount of control over the linearisation, you need a way to rigorously control the a posteriori error. This is easy for polynomials, but doing it for other models requires serious techniques that have only been created recently (rigorous quadrature via interval arithmetic; a toy sketch of the interval idea appears at the end of this comment).

2. Further to that, neural networks are far from an obvious choice. They are not exactly integrable and it is certainly not clear a priori that their biases would help to search for a blowup solution. In fact, I would have initially said it was a poor choice for the task. You also need to get it to a certain degree of precision and that's not easy.

3. The parameterisation of the self-similar solution turns out to be completely appropriate for a neural network. To be fair, solutions of this type have been considered before, so I'm willing to chalk this up to luck.

It's difficult to explain how challenging it is to fully derive a plan for a computationally-assisted proof of this magnitude unless you've tried it yourself on a new problem. At the end it seems completely obvious, but only after exhausting countless dead ends first.
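For readers wondering what "rigorous quadrature via interval arithmetic" even looks like in practice, here is a deliberately crude toy sketch in Python using mpmath's interval type. It only encloses a one-dimensional integral, and it is an illustration of the flavour only, not of the machinery used in the actual papers.

    # Toy example: a guaranteed enclosure of the integral of exp(-x^2) on [0, 1],
    # obtained by summing interval evaluations over subintervals. Every rounding
    # error is absorbed into the interval, so the final bounds are rigorous.
    from mpmath import iv

    iv.dps = 30
    f = lambda x: iv.exp(-(x * x))

    n = 200
    total = iv.mpf(0)
    width = iv.mpf(1) / n                # enclosure of each subinterval's width
    for k in range(n):
        xk = iv.mpf([k, k + 1]) / n      # enclosure of the k-th subinterval
        total += f(xk) * width           # contains the true area of that slice

    print(total)                         # an interval containing ~0.746824

The point is that the output is not an estimate but a pair of bounds that provably bracket the true value, which is the kind of certificate a computer-assisted proof needs.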


> Folks knew the problem was near a solution once the monotonicity proof of the W functional came out.

This isn't true, it was a major accomplishment but by far the easiest part of Perelman's papers and not actually even part of the proof of the Poincaré conjecture.


Interesting, this was the big 'a-ha' moment in the grad course I took on the subject. Surgery was clearly important too, but this seemed to be more apparent from Hamilton's work. The W functional was what provided the control needed. Also, calling it 'easy' feels like it dismisses the insights necessary to have developed the functional in the first place.

Happy to be corrected on this; I wasn't an active mathematician at the time, so everything I know comes from other accounts.


I called it the easiest part of his papers, not easy. Either way, it's actually not relevant to the proof. For example, I believe that Morgan and Tian's 500 page exposition of the proof doesn't mention it even once.

Moreover I'd strongly dispute that there was any particular point where it was clear that the problem was "near a solution." The W functional is an example where experts could very quickly verify that Perelman had made at least one major new discovery about the Ricci flow. But the proof of the Poincaré conjecture was on another order of complexity and required different (more complicated) new results about Ricci flow.


> I know they are so close to a computationally-assisted proof of counterexample that it is virtually inevitable at this point.

That's a strong claim. Is it based on more than the linked work on some model problems from fluid mechanics?

I will say that I dread the discourse if it works out, since I don't believe enough people will understand that using a PINN to get new solutions of differential equations has substantially no similarity to asking ChatGPT (or AlphaProof etc) for a proof of a conjecture. And there'll be a lot of people trying to hide the difference.
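For anyone who hasn't met the term: a PINN ("physics-informed neural network") is just a network trained so that its output approximately satisfies a differential equation. A minimal toy sketch in PyTorch, for the ODE u' = u with u(0) = 1 on [0, 1], looks like the following. This is only meant to show the concept and has nothing to do with the scale or sophistication of the Navier-Stokes work.

    # Toy PINN: train a small network u(x) so that u'(x) - u(x) ≈ 0 on [0, 1]
    # and u(0) ≈ 1. The residual of the ODE itself is the training loss.
    import torch

    net = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(5000):
        x = torch.rand(128, 1, requires_grad=True)                # collocation points
        u = net(x)
        du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        residual = (du - u).pow(2).mean()                         # enforce u' = u
        boundary = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # enforce u(0) = 1
        loss = residual + boundary
        opt.zero_grad(); loss.backward(); opt.step()

    print(net(torch.ones(1, 1)).item())  # should be close to e ≈ 2.718

Nothing in that loop involves language modelling or sampling from a chatbot, which is exactly the distinction I worry will get lost.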


It's based on knowledge of the related estimates, applying similar techniques to geometric problems, knowledge of all the prior works that led to the current work, and speaking with members of the team themselves. They are much further along than it appears at first glance. All of the major bottlenecks have fallen; the only concern was whether double precision accuracy is good enough. The team seems to have estimates that are strong enough for this, but obviously keeps them close to the chest.

PINNs are different in concept, yes, but clearly no less important, so the additional attention will be appreciated. Asking LLMs for proofs is a different vein of research, often involving Lean. It is much further behind, but still gaining ground.


> PINNs are different in concept, yes, but clearly no less important

If anything I think they're more important! Whether or not it works out for Navier-Stokes, this kind of thing is an extremely plausible avenue of approach and could yield interesting singularities for other major equations. I am however extremely concerned about public understanding. I know you are well aware that this is worlds away from the speculative technologies like 'mathematical superintelligence' but, if it works out, it'll be like a nuclear bomb of misinformation about AI and math.


Any mathematicians who have actually called it "new interesting mathematics", or just an OpenAI employee?

The paper in question is an arXiv preprint whose first author seems to be an undergraduate. The theorem in it which GPT improves upon is perfectly nice; there are thousands of mathematicians who could have proved it had they been inclined to. AI has already solved much harder math problems than this.


The OpenAI employee posting this is a well known theoretical computer scientist: https://en.wikipedia.org/wiki/S%C3%A9bastien_Bubeck


Yes, he published a paper claiming GPT-4 has "sparks" of AGI. What else is he known for in the field of computer science?

https://arxiv.org/abs/2303.12712


Hello, TCS assistant professor here: he is legitimately respected among his peers.

Of course, because I am a selfish person, I'd say I appreciate most his work on convex body chasing (see "Competitively chasing convex bodies" on the Wikipedia link), because it follows up on some of my work.

Objectively, you should check his conference publication record: it will be a huge number of A*/A CORE-rank conferences, which means the best possible in TCS. Or the prizes section on Wikipedia.


I don't deny that his output is highly valued among AI researchers.

Provocative as my question may be, the point I wanted to make is that his most highly cited paper, the one I already mentioned, is suspiciously in line with the OpenAI narrative. I doubt that any of his GPT research is really independent. With great salary comes great responsibility.


Not sure if you're trying to be provocative, but you could just click his name in the link you provided to find a lengthy list of arXiv preprints: https://arxiv.org/search/cs?searchtype=author&query=Bubeck,+...


Not sure what this has to do with my post.


> Any mathematicians who have actually called it "new interesting mathematics", or just an OpenAI employee?

He is a mathematician. Unless you wanted to say "any other mathematicians..."


> Checking the correctness of proofs is a much easier problem than coming up with the proof in the first place.

Just so this isn't misunderstood: not much cutting-edge math is presently possible to code in Lean. The famous exceptions (such as the results by Clausen-Scholze and Gowers-Green-Manners-Tao) have special characteristics which make them much more ground-level and easier to code in Lean.

What's true is that it's very easy to check if a Lean-coded proof is correct. But it's hard and time-consuming to formulate most math as Lean code. It's something many AI research groups are working on.


> The famous exceptions (such as the results by Clausen-Scholze and Gowers-Green-Manners-Tao) have special characteristics which make them much more ground-level and easier to code in Lean.

"Special characteristics" is really overstating it. It's just a matter of getting all the prereqs formalized in Lean first. That's a bit of a grind to be sure, but the Mathlib effort for Lean has the bulk of the undergrad curriculum and some grad subjects formalized.

I don't think AI will be all that helpful wrt. this kind of effort, but it might help in some limited ways.


Yes I definitely concur, I have spent significant time with it.

The main bottleneck is having the libraries that define the theorems and objects you need to operate at those levels. Everything is founded on axiomatic foundations and you need to build all of maths on top of that. Projects like mathlib are getting us there but it is a massive undertaking.

It’s not just that there is a lot of maths to go through; it’s also that most maths has not really been proven to this degree of exactitude, and there is much gap-filling to do when trying to translate existing proofs, or the reasoning style might be quite distant from how things are expressed in Lean. Some maths fields are also self-consistent islands that haven’t yet been connected to the common axiomatic foundations, and linking them is a serious research endeavor.

That said, Lean does allow you to declare theorems as axioms. It is not common practice, but you can skip high up the abstraction ladder and set up a foundation up there if you are confident those theorems are correct. But defining those mathematical objects can still be quite hard on its own, even if you skip the proving.
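For concreteness, here is roughly what that escape hatch looks like in Lean 4. A minimal sketch; the statement is a trivial placeholder, not something anyone would actually axiomatize.

    -- Declare a result as an axiom instead of proving it from first principles.
    axiom deep_result : ∀ n : Nat, n ≤ n + 1

    -- It can then be used exactly like a proved lemma.
    theorem uses_deep_result (n : Nat) : n ≤ n + 1 :=
      deep_result n

The catch, as above, is that even stating the objects involved in a real theorem can require a large amount of groundwork before the axiom declaration is meaningful.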

Anyways, the complexity of the Lean language itself doesn’t help either. The mode of thinking you need to have to operate it is much closer to programming than maths, and for those that think that the Rust borrow-checker is a pain, this is an order of magnitude more complex.

Lean was a significant improvement in ergonomics compared to the previous generation (Coq, Isabelle, Agda…), but still I think there is a lot of work to be done to make it mathematician-friendly.

Most reinforcement-learning AI for maths right now is focused on olympiad problems, hard but quite low in the maths abstraction ladder. Often they don’t even create a proof, they just solve problems that end with an exact result and you just check that. Perhaps the reasoning was incorrect, but if you do it for enough problems you can be confident that it is not just guessing.

On the other side of the spectrum you have mathematicians like Tao just using ChatGPT for brainstorming. It might not be great at complex reasoning, but it has a much wider memory than you do and it can remind you of mathematical tools and techniques that could be useful.


> Anyways, the complexity of the Lean language itself doesn’t help either. The mode of thinking you need to have to operate it is much closer to programming than maths, and for those that think that the Rust borrow-checker is a pain, this is an order of magnitude more complex.

Could you elaborate on this? I'm interested to learn what the complexities are (beyond the mathematical concepts themselves).


Found something I wrote last year, see below, but off the top of my head:

Something like five different DSLs in the same language, most of them in a purist functional style that is familiar neither to most mathematicians nor to most programmers, with type-checking an order of magnitude more strict and complex than in any ordinary programming language (that's the point of it), and rather obscure errors most of the time.

It's really tedious to translate any non-trivial proof to this model, so usually you end up proving it again almost from scratch within Lean, and then it's really hard to understand as written. Much of the information needed to understand a proof is hidden away as runtime data that is usually displayed via a complex VSCode extension. It's quite difficult to understand from the proof code itself, and usually it doesn't look anything like a traditional mathematical proof (even if they kind of try, by naming keywords with terminology similar to normal proofs and sprinkling in some unicode symbols).

I never-ever feel like I'm doing maths when I'm using Lean. I'm fighting with the syntax to figure out how to express mathematical concepts in the style that it likes, always having several different ways of achieving similar things (anti Zen of Python). And I'm fighting with the type-checker to transform this abstract expression into this other abstract expression (that's really what a proof is when it boils down to it), completely forgetting about the mathematical meaning, just moving puzzle pieces around with obscure tools.

And even after all of this, it is so much more ergonomic than the previous generation of proof-assistants :)

---

I think that the main reasons for Lean's complexity are:

- That it has a very purist functional style and syntax, literally reflecting the Curry-Howard Correspondence (function = proof), rather than trying to bridge the gap to more familiar maths notation.

- That it aims to be a proof assistant, it is fundamentally semi-automatic and interactive, this makes it a hard design challenge.

- A lot of the complexity is aimed at giving mathematicians the tools to express real research maths in it, on this it has been more successful than any alternative.

- Because of this it has at least 5 different languages embedded in it: functional expressions of theorems, forward proofs with expression transformations, backward proofs with tactics, the tactics metaprogramming macro language, and the language to define data-types (and at least 4 kinds of data-types with different syntax).
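To make two of those embedded styles concrete, here is the same trivial statement written as a term-mode proof (a functional expression) and as a tactic proof in Lean 4, a minimal sketch using a standard library lemma:

    -- Term mode: the proof is a functional expression.
    theorem comm_term (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

    -- Tactic mode: the proof is a sequence of goal transformations.
    theorem comm_tactic (a b : Nat) : a + b = b + a := by
      rw [Nat.add_comm]

Even on something this small you can see how different the two modes feel; on real proofs the styles diverge much further, which is part of what makes the language feel like several languages at once.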


> Many AI researchers are mathematicians. Any theoretical AI research paper will typically be filled with eye-wateringly dense math. AI dissolves into math the closer you inspect it. It's math all the way down.

There is a major caveat here. Most 'serious math' in AI papers is wrong and/or irrelevant!

It's even the case for famous papers. Each lemma in Kingma and Ba's ADAM optimization paper is wrong, the geometry in McInnes and Healy's UMAP paper is mostly gibberish, etc...

I think it's pretty clear that AI researchers (albeit surely with some exceptions) just don't know how to construct or evaluate a mathematical argument. Moreover the AI community (at large, again surely with individual exceptions) seems to just have pretty much no interest in promoting high intellectual standards.


I'd be interested to read about the gibberish in UMAP. I know of the paper "An improvement of the convergence proof of the ADAM-Optimizer" for the lemma problem in the original ADAM, but hadn't heard about the UMAP issue. Do you have any further info on it?


> Each lemma in Kingma and Ba's ADAM optimization paper is wrong

Wrong in the strict formal sense or do you mean even wrong in “spirit”?

Physicists are well-known for using “physicist math” that isn’t formally correct but can easily be made rigorous with the help of a mathematician. Are you saying the papers of the AI community aren’t even correct “in spirit”?


Much physicist math can't be made rigorous so easily! Which isn't to say that much of it doesn't still have great value.

However the math in AI papers is indeed different. For example, Kingma and Ba's paper self-presents as having a theorem with a rigorous proof via a couple of lemmas proved by a chain of inequalities. The key thing is that the mathematical details are purportedly all present, but are just wrong.

This isn't at all like what you see in physics papers, which might just openly lack detail, or might use mathematical objects whose existence or definition remain conjectural. There can be some legitimate problems with that, but at least in the best cases it can be very visionary. (Mirror symmetry is a standard example.) By contrast I'm not sure what 'spirit' is even possible in a detailed couple-page 'proof' that its authors probably don't even fully understand. In most cases, the 'theorem' isn't remotely interesting enough as pure mathematics and is also not of any serious relevance to the empirical problem at hand. It just adds an impressive-looking section to the paper.

I do think it's possible that in the future there will be very interesting pure mathematics inspired by AI. But it hasn't been found yet, and I'm very certain it won't come from reconsidering these kinds of badly-written theorems and proofs.


Amazing! I looked into your ADAM claim, and it checks out. Thanks! Now I'm curious. If you have the time, could you please follow up with the 'etc...'?


There's a related section about 'mathiness' in section 3.3 of the article "Troubling Trends in Machine Learning Scholarship" https://arxiv.org/abs/1807.03341. I would say the situation has only gotten worse since that paper was written (2018).

However the discussion there is more about math which is unnecessary to a paper, not so much about the problem of math which is unintelligible or, if intelligible, then incorrect. I don't have other papers off the top of my head, although by now it's my default expectation when I see a math-centric AI paper. If you have any such papers in mind, I could tell you my thoughts on it.


If this were the case, I don't see why we'd need to wait for an AI company to make a breakthrough in math research. The key issue instead is how to encode 'real-life' statements in a formal language - which to me seems like a ludicrously hard problem, just complete magical thinking.

For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?

Or have I totally misunderstood what people mean when they say that developments in automatic theorem proving will solve LLM's hallucination problem?


You can't talk about formally verifiable truthiness until you solve epistemology. This can be achieved formally in mathematics, with known principal limitations. Here strict theorem-proving, Lean-style, is viable.

It can also be achieved informally and in a fragmented way in barely-mathematical disciplines, like biology, linguistics, and even history. We have chains of logical conclusions that do not follow strictly, but with various probabilistic limitations, and under modal logic of sorts. Several contradictory chains follow under different (modal) assumptions / hypotheses, and often all of them should be considered. This is where probabilistic models like LLMs could work together with formal logic tools and huge databases of facts and observations, playing the proverbial astute reader.

In some more relaxed semi-disciplines, like sociology, psychology, or philosophy, we have a hodgepodge of contradictory, poorly defined notions and hand-wavy reasoning (I don't mean Wittgenstein here, but rather Freud, Foucault, Derrida, etc.). Here, I think, the current crop of LLMs is applicable most directly, with a few augmentations. Still, a much, much wider context window might be required to make it actually productive by the standards of the field.


Hmmm, I think even in something nominally nearby like theoretical physics, there's very little that's similar to theorem proving. I don't see how AlphaProof could be a stepping stone to anything like what you're describing.

Generally, I think many people who haven't studied mathematics don't realize how huge the gulf is between "being logical/reasonable" and applying mathematical logic as in a complicated proof. Neither is really of any help for the other. I think this is actually the orthodox position among mathematicians; it's mostly people who might have taken an undergraduate math class or two who might think of one as a gateway to the other. (However there are certainly some basic commonalities between the two. For example, the converse error is important to understand in both.)


I don't think that mathematicians are actually in the best position to judge how math/logic might help in practical applications, because they are usually not interested in practical applications at all (at least for the last 70 years or so). Especially, they are also not interested in logic at all.

But logic is very relevant to "being logical/reasonable", and seeing how mathematicians apply logic in their proofs is very relevant, and a starting point for more complex applications. I see mathematics as the simplest kind of application of logic you can have if you use only your brain for thinking, and not also a computer.

"Being logical/reasonable" also contains a big chunk of intuition/experience, and that is where machine learning will make a big difference.


Probabilistic reasoning is possible in a formal setting; it produces a probability distribution over answers. As for grounding probabilistic logic itself, I'm not aware of much progress beyond the initial idea of logical induction [0].

[0] https://arxiv.org/abs/1609.03543


This takes for granted a formal setting, which is what I'm questioning in any of these 'real world' contexts.


> For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?

> This takes for granted a formal setting, which is what I'm questioning in any of these 'real world' contexts.

A formal model of semantics would likely be a low-level physical representation of possible states augmented with sound definitions of higher-level concepts and objects. I don't think humans are capable of developing a formal semantics that would work for your sentences (it's taken us hundreds of years to approach formalization of particle physics), but I think that an automated prover with access to physical experiments and an LLM could probably start building a more comprehensive semantics.


People talk about how LLMs need "more data", but data is really a blunt instrument for trying to impart a lot of experience to the LLM. Unfortunately, data isn't actually experience; it's a record of an experience.

Now, what we've seen with e.g. chess and Go is that when you can give a tensor model "real experience" at the speed of however many GPUs you have, the model is quickly capable of superhuman performance. Automatic theorem proving means you can give the model "real experience" without armies of humans writing down proofs: you can generate and validate proofs as fast as a computer will let you, and the LLM gets "real experience" with every new proof it generates, along with an objective cost function - was the proof correct?
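As a caricature of that loop, the whole thing fits in a few lines. Everything here (the model class, the verifier) is a stand-in invented for illustration, not a real API; the point is only the shape of the feedback signal.

    # Hypothetical sketch of "proof search as reinforcement learning".
    # ToyModel and verify are placeholders for an LLM and a proof checker
    # (e.g. Lean); only the structure of the loop matters.
    import random

    def verify(statement, proof):
        # Stand-in for a formal checker: here "proofs" are just strings
        # and exactly one of them happens to be accepted.
        return proof == "correct proof of " + statement

    class ToyModel:
        def sample_proof(self, statement):
            return random.choice(["nonsense", "correct proof of " + statement])
        def update(self, statement, proof, reward):
            pass  # a real system would reinforce proofs that verified

    model = ToyModel()
    for step in range(100):
        statement = "theorem_%d" % step
        proof = model.sample_proof(statement)               # model proposes a proof
        reward = 1.0 if verify(statement, proof) else 0.0   # objective signal
        model.update(statement, proof, reward)              # learn from verified proofs

The verifier replaces the army of human labelers, which is what makes the chess/Go analogy plausible for formal math and for software with good test oracles.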

Now, this might not let us give the LLMs real experience with being a teacher or dealing with Marxist revolutionaries, but it would be a sea change in the ability of LLMs to do math, and it probably would let us give LLMs real experience in writing software.


This is a thing I'm working on, so I have some potentially useful thoughts. tl;dr, it doesn't have to be about encoding arbitrary real life statements to be super duper useful today.

> how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language?

Totally out of scope for the near future, for me at least. But that doesn't prevent it from being super useful for a narrower scope. For example:

- How might we take a statement like "(x + 1) (x - 5) = 0" and encode it formally?

- Or "(X X^T)^-1 X Y = B"?

- Or "6 Fe_2 + 3 H_2 O -> ?"?

We can't really do this for a huge swath of pretty narrow applied problems. In the first, what kind of thing is x? Is it an element of an algebraically closed field? In the second, are those matrices of real numbers? In the third, is that 6 times F times e_2, or six two-atom iron molecules?

We can't formally interpret those as written, but you and I can easily tell what's meant. Meanwhile, current ways of formally writing those things up are a massive pain in the ass. Anything with a decent amount of common sense can tell you what is probably meant formally. We know that we can't have an algorithm that's right 100% of the time for a lot of relevant things, but 99.99% is pretty useful. If Clippy says 'these look like matrices, right?' and is almost always right, then it's almost always saving you lots of time and letting lots more people benefit from formal methods with a much lower barrier to entry.
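As a tiny illustration of the point (using sympy purely as an example tool): the same string only becomes formal once you commit to what the symbols are.

    # The informal string "(x + 1)(x - 5) = 0" needs a domain choice before a
    # tool can do anything with it. Here we commit to x being a complex scalar.
    import sympy as sp

    x = sp.symbols('x')
    print(sp.solve(sp.Eq((x + 1) * (x - 5), 0), x))   # [-1, 5]

    # If x were instead a matrix, a residue class, or a formal indeterminate,
    # the "same" equation would mean something different and the answer above
    # would not even be well-posed. That disambiguation is the hard part.

The "clippy" idea is essentially automating that commitment step with high (but not perfect) accuracy.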

From there, it's easy to imagine coverage and accuracy of formalizable statements going up and up and up until so much is automated that it looks kinda like 'chatting about real-life statements' again. I doubt that's the path, but from a 'make existing useful formal methods super accessible' lens it doesn't have to be.


The quality of AI algorithms is not based on formal mathematics at all. (For example, I'm unaware of even one theorem relevant to going from GPT-1 to GPT-4.) Possibly in the future it'll be otherwise though.


There are some AI guys like Christian Szegedy who predict that AI will be a "superhuman mathematician," solving problems like the Riemann hypothesis, by the end of 2026. I don't take it very seriously, but that kind of prognostication is definitely out there.


link to this prediction? The famous old prediction of Szegedy was IMO gold by 2026 and that one is basically confirmed right? I think 2027/2028 personally is a breakeven bet for superhuman mathematician.


I consider it unconfirmed until it happens! No idea where I saw it but it was probably on twitter.


I’m sure AI can “solve” the Riemann hypothesis already, since a human proved it and the proof is probably in its training data.


No, nobody has proved it.

Side point, there is no existing AI which can prove - for example - the Poincaré conjecture, even though that has already been proved. The details of the proof are far too dense for any present chatbot like ChatGPT to handle, and nothing like AlphaProof is able either since the scope of the proof is well out of the reach of Lean or any other formal theorem proving environment.


What does this even mean? Surely an existing AI could regurgitate all of Perelman's arXiv papers if we trained it to do that. Are you trying to make a case that the AI doesn't understand the proof it's giving? Because then I think there's no clear goal-line.


You don't even need AI to regurgitate Perelman's papers, you can do that in three lines of python.

What I meant is that there's no AI you can ask to explain the details of Perelman's proof. For example, if there's a lemma or a delicate point in a proof that you don't understand, you can't ask an AI to clarify it.


This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.

(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)


I think more is made of this asterisk than necessary. It's quite possible that adding 10x more GPUs would have allowed it to solve the problem within the time limit.


Very plausible, but that would also be noteworthy. As I've mentioned in some other comments here, (as far as I know) those of us outside of DeepMind don't know anything about the computing power required to run AlphaProof, and the tradeoff between the computing power required and the complexity of problems it can address is really key to understanding how useful it might be.


It is certainly interesting information that AlphaProof needed three days. But does it matter for evaluating the importance of this result? No.


I agree that the result is important regardless. But the tradeoff of computing time/cost with problem complexity is hugely important to think about. Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.

Three days per problem is, by many standards, a 'reasonable' amount of time. However there are still unanswered questions, notably that 'three days' is not really meaningful in and of itself. How parallelized was the computation; what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.
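To make "trivially solvable in theory" concrete: enumerating derivations breadth-first is a few lines of code, and the entire difficulty is that the search space explodes. A toy sketch over a made-up string-rewriting system (nothing to do with any real prover):

    # Enumerate everything derivable from a start string under toy rewrite rules,
    # breadth-first, until the goal is found or a step budget runs out.
    from collections import deque

    rules = [("a", "ab"), ("b", "ba"), ("aa", "b")]   # made-up inference rules

    def derivable(start, goal, budget=100000):
        seen, queue = {start}, deque([start])
        while queue and budget > 0:
            s = queue.popleft()
            budget -= 1
            if s == goal:
                return True
            for lhs, rhs in rules:
                i = s.find(lhs)
                while i != -1:
                    t = s[:i] + rhs + s[i + len(lhs):]
                    if t not in seen:
                        seen.add(t)
                        queue.append(t)
                    i = s.find(lhs, i + 1)
        return False

    print(derivable("a", "abba"))   # True: a -> ab -> aba -> abba

Real proof search has the same structure with an astronomically larger branching factor, which is why hardware, parallelism, and search heuristics are the questions that matter.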


Though, if you start solving problems that humans can't or haven't solved, then questions of capacity won't matter much. A speedup in the movement of the maths frontier would be worth many power stations.


For some time a 'superhuman math AI' could be useful for company advertising and getting the attention of VCs. But eventually it would be pretty clear that innovative math research, with vanishingly few exceptions, isn't very useful for making revenue. (I am a mathematician and this is meant with nothing but respect for math research.)


The big exception being predicting market movements, and I can't imagine how much money the hedge funds are spending on this right now.


"Making revenue" is far from being the only metric by which we deem something worthy.


As a mathematician, of course I agree. But in a sentence like:

> A speedup in the movement of the maths frontier would be worth many power stations

who is it 'worth' it to? And to what end? I can say with some confidence that many (likely most, albeit certainly not all) mathematicians do not want data centers and power stations to guzzle energy and do their math for them. It's largely a vision imposed from without by Silicon Valley and Google research teams. What do they want it for and why is it (at least for now) "worth" it to them?

Personally, I don't believe for a second that they want it for the good of the mathematical community. Of course, a few of their individual researchers might have their own personal and altruistic motivations; however I don't think this is so relevant.


There is something called applied math, and there is a big gulf between applied math and pure math. This new technology has the potential to put much more "pure math" to use in applied math, unifying much of pure math, applied math, and programming. I don't really care about the RH; I care about that.


> Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.

No, in my experience the whole practical question is: can it be found automatically, or can it not? There is an exponential search space that conventional automated methods will not be able to chew through, so it either works or it doesn't. AlphaProof shows that for some difficult IMO problems, it works.


More or less. Modern theorem provers, even fully automatic ones, can prove incredibly difficult problems if given enough time. With 3 days and terabytes of memory, perhaps they could have? It would be interesting to compare AlphaProof with a standard theorem prover that is given similarly astronomical computing resources.


That is an interesting thought. But I doubt that standard automated theorem proving techniques would have solved this in a hundred years, even with immense resources, as all they do is (basically) brute-force search, and it remains an exponential search space. Therefore 3 days does not really matter; indeed, 3 months would still be a result that 10 years ago I would have thought impossible.

