
This is a conjecture: modern chips are optimized to make the output code style of GCC/Clang go fast. So, the compilers optimize for the chip, and the chip optimizes for the popular compilers.


It’s in Rust…


It's an LLM; surely it could read the gcc source code and translate it to Rust if it really tried hard enough.


It wasn't given the gcc source code, and it was not given internet access. To the extent it could translate gcc source code, it'd need to be able to recall all of the gcc source from its weights.


Right. And the arguably simpler problem, where the model gets the C code directly, is active research: https://www.darpa.mil/research/programs/translating-all-c-to...

All of this work is extraordinarily impressive. It is hard to predict the impact of any single research project the week it is released. I doubt we'll ever throw away GCC/LLVM. But I'd be surprised if the Claude C Compiler didn't have a long-term impact on computing down the road.


I occasionally - when I have tokens to spare; a MAX subscription only goes so far - have Claude working on my Ruby compiler. It's a far harder language to AOT compile (or even parse correctly). And even 6 months ago it was astounding how well it'd work, even without what I now know about good harnesses...

I think that is the biggest outcome of this: The notes on the orchestration and validation setup they used were far more interesting than the compiler itself. That orchestration setup is already somewhat quaint, but it's still far more advanced than what most AI users use.


Nice article. I believe the Claude C Compiler is an extraordinary research result.

The article is clear about its limitations. The code README opens by saying "don't use this," which no research paper I know of is honest enough to say.

As for hype, it’s less hyped than most university press releases. Of course since it’s Anthropic, it gets more attention than university press.

I think the people most excited are getting ahead of themselves. People who aren't impressed should remember that there is no C compiler written in Rust for it to have memorized. But this is going to open up a bunch of new and weird research directions, as this blog post is beginning to do.


This isn't true, right? You really can bring in zero dollars in grants and phone it in in the classroom. (Now, literally on Zoom!) I don't think it helps to pretend that everyone keeps pushing hard post-tenure.

But, I think most people do. The system is deliberately designed to push an assistant professor so hard that, by the time they get a permanent contract, they're conditioned to keep pushing. It typically succeeds.


Yes; you can phone it in post-tenure. But just because it is possible doesn't mean (in my experience) it is common; and I don't think it's helpful (as TFA claims) to equate this possibility with "a total scam." To get tenure anywhere doesn't just require a huge amount of work as an Assistant Professor; it also requires a huge amount of work as a PhD student and potentially multiple rounds of post-doc'ing or other non-tenure-line work. In my experience, tenured professors have spent nearly two decades distorting their work-life balance beyond all recognition to the point that grinding insanely hard in pursuit of publications just feels normal.


> For tenure-track professors at top-twenty schools, step five is hard. Their tenured professors jealously guard their status, so rejection is the default. However, as school ranking goes down, runaway nepotism swiftly supplants professorial pride. At schools ranked worse than fifty, acceptance is the default.

Like everyone else, I have always had the pleasure of being at a top-20 school (in some list or other!). Fortunately, I think this article is only attacking tenure at schools ranked lower. (Let me know if I misinterpreted the article.)

We could eliminate tenure at lower-ranked schools. I'm not sure who would teach there if we did. The 90th-percentile salary for a new tenure-track professor is $145K (https://cra.org/wp-content/uploads/2024/05/2023-CRA-Taulbee-... page 49). Nobody competent is going to take that salary without the possibility of tenure.


I just have an unprivileged secondary local account and do ssh dummy@localhost.

Is this wrong?


This was vibe coded in a few hours with Codex 5.2 Medium. We now have examples of agent-written web browsers and Unix-y OSes (https://github.com/viralcode/vib-OS). I thought it would be interesting to try something a little more out of distribution. All done in a single Codex session -- transcript is part of the release.


The post says this in other words: in Lean, Rocq, or any other theorem prover, you get a formally verified proof, but you do NOT get a formally verified theorem statement.

So, even if the proof is correct, you need to determine if the theorem is what you want. Making that determination requires expertise. Since you cannot "run the theorem", you cannot vibe-code your way through it. E.g., there is no equivalent of "web app seems to be working!" You have to actually understand what the theorems are saying in a deep way.
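
To make that concrete, here's a toy Lean 4 sketch (myMax and the theorem are made up for illustration). The proof is machine-checked, but only a human reading the statement notices that it pins down far less than what "max" should mean:

    def myMax (a b : Nat) : Nat := if a ≤ b then b else a

    -- Lean certifies this proof, but the statement is weaker than it
    -- looks: nothing here says the result is ≥ b, or that it is one
    -- of the two inputs. Checking that is on the human reader.
    theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
      unfold myMax; split <;> omega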


Even then, this seems much more promising to me compared to other areas. Writing theorem statements is much, much easier than coming up with proofs, so it's not a big deal if a human has to do that. And once that's done, getting a correct proof out of an LLM/AI model can be done fully automatically (assuming you do get a proof out of it, though!)


> Even then, this seems much more promising to me compared to other areas. Writing theorem statements is much, much easier than coming up with proofs, so it's not a big deal if a human has to do that.

Not at all. Theorem crafting is hard. What's easy is standing up and proving a mathematical strawman which may or may not have any relation to the original problem.


I'm not sure this is true. Encoding theorems in dependent types takes a lot of expertise.

Even without the Lean technical details, a lot of math theorems just don't mean anything to most people. For example, I have no idea what the Navier-Stokes theorem is saying. So, I would not be able to tell you if a Lean encoding of the theorem is correct. (Unless, of course, it is trivially broken, such as assuming False.)


There is no "Navier-Stokes theorem". There is a famous class of open problems about whether the Navier-Stokes equations are well-posed (have solutions that don't explode in finite time) for various initial data, but that type of question is completely irrelevant for any practical purpose. I do share the feeling, though. As a non-expert, I have no idea what the existing, allegedly practically relevant, formalizations of distributed algorithms actually guarantee.


Thanks. As I said, I have no idea. :)


Yeah, if you have a formal proof of something but you don't know exactly what, then you might as well vibe-code the system itself. I've been wondering if it would be possible to formally define a property (like deadlock freedom or linearizability) and then vibe-code the spec (and associated implementation).

I don't know if that's possible but it seems like that's where the value would be.


It is, but we're not there yet. You can use a refinement-style system like Verus, in which your spec is a very high-level statement of system behavior, like "this system provides the properties that Paxos is supposed to guarantee"(1), and then be guaranteed that the things that implement it preserve that spec.

But this is still double black diamond expert territory - it remains a challenging thing to identify ways to state and refine those properties to enable the proof. But it's getting easier by the year and the LLMs are getting more useful as part of it.

https://github.com/verus-lang/verus

(1) You would have to state the consistency and conditional liveness properties directly, of course, as Verus doesn't understand "what paxos does". That would look like TLA+-style statements, "at some time in the future, a command submitted now will be executed, etc."
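
To give a flavor of the ensures-style part, here's a toy Verus sketch in the tutorial style (nothing like a real Paxos spec; is_max and max are just illustrative names):

    use vstd::prelude::*;

    verus! {

    // Spec-level statement of what the implementation must guarantee.
    spec fn is_max(r: u64, a: u64, b: u64) -> bool {
        r >= a && r >= b && (r == a || r == b)
    }

    // Verus statically checks that every return value satisfies the spec.
    fn max(a: u64, b: u64) -> (r: u64)
        ensures is_max(r, a, b)
    {
        if a >= b { a } else { b }
    }

    fn main() {}

    } // verus!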


In the case of Lean, propositions are encoded as dependent types; the user typically has to encode the statement themselves, then make use of e.g. tactics to derive a term of said type.

Writing a statement you don't understand then later getting a proof from an LLM doesn't seem all that useless to me; in my mind, it could still be useful as exploration. Worst case scenario: you encoded a tautology and the LLM gave you a trivial proof. Best case scenario: the proposition ends up being a lemma for something you want to prove.

I do think there is a kernel of truth in what you've stated: if the user does not actually understand the statement of a proposition, then the proof is not very useful to them, since they don't know what the statement's truthiness implies. As someone who used to do mathematics, I still find immense value in vibe-coding away mundane computations, similar to what Lean's `simp` tactic already does, but much more broad.
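
A tiny Lean 4 illustration of both points (my example, not from the parent):

    -- The proposition is a type; proving it means producing a term of
    -- that type, which `simp` can often synthesize for routine facts.
    example (l : List Nat) : (l ++ []).length = l.length := by
      simp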


The worst case is that you vibe code a theorem that reads:

False => P

Then you vibe code a proof of this theorem. Then you get excited that you’ve proven P.
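
In Lean 4, that worst case is literally a couple of lines that compile and check (sketch):

    -- This "theorem" type-checks and the proof is accepted, yet it
    -- establishes nothing at all about P.
    theorem trivially_true (P : Prop) : False → P :=
      fun h => h.elim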

Some of the X discussion that prompted the OP was quite close to this. There are screenshots on X of Lean code that doesn’t compile, with Lean being blamed.


Right, the risk is encoding a vacuously true statement. I consider "false implies Q" and tautologies to be vacuously true statements, since both are truths that fail to convey any meaningful information.

In any case, the worst-case scenario is ending up with a vacuously true statement. I picked tautologies as an example since they, along with statements about the empty set, are the first vacuous truths that come to mind.
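
For instance, Lean happily checks this vacuous statement about the empty list (my example):

    -- Every element of the empty list satisfies n ≠ n: "true", but
    -- only because there are no elements to check.
    example : ∀ n ∈ ([] : List Nat), n ≠ n := by
      intro n h
      cases h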


I teach at a university, and spend plenty of time programming for research and for fun. Like many others, I spent some time on the holidays trying to push the current generation of Cursor, Claude Code, and Codex as far as I could. (They're all very good.)

I had an idea for something that I wanted, and in five scattered hours, I got it good enough to use. I'm thinking about it in a few different ways:

1. I estimate I could have done it without AI with 2 weeks full-time effort. (Full-time defined as >> 40 hours / week.)

2. I have too many other things to do that are purportedly more important than programming. I really can't dedicate two weeks full-time to a "nice to have" project. So, without AI, I wouldn't have done it at all.

3. I could hire someone to do it for me. At the university, those are students. From experience with lots of advising, a top-tier undergraduate student could have achieved the same thing, had they worked full tilt for a semester (before LLMs). This of course assumes that I'm meeting them every week.


This is where LLM coding shines, in my opinion. There's a list of things they are doing very well:

- single scripts: anything that can be reduced to a single script

- starting greenfield projects from scratch

- code maintenance (package upgrades, old code...)

- tasks which have a very clear, single definition. This isn't linked to complexity; some tasks can be very complex but still have a single definition.

If your work falls into this list, they will do some amazing work (and yours clearly fits), but if it doesn't, prepare yourself, because it will be painful.


I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code. I've done enough of that with LLMs, though not in large codebases.

I should say I'm hardly ever vibe-coding, unlike the original article. If I think I want code that will last, I'll steer the models in ways that lean on years of non-LLM experience. E.g., I'll reject results that might work if they violate my taste in code.

It also helps that I can read code very fast. I estimate I can read code 100x faster than most students. I'm not sure there is any way to teach that other than the old-fashioned way, which involves reading (and writing) a lot of code.


> I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code

Yes indeed. These, on the other hand, are the things which aren't working well, in my opinion:

- large codebases

- complex domain knowledge

- creating any feature where you need product insights

- tasks requiring choices (again, complexity doesn't matter here; the task may be simple but still require some choices)

- anything unclear, where you don't know where you're going at first

While you don't experience any of these when teaching or doing side projects, they are very common in any enterprise context.


How do you compare Claude Code to Cursor? I'm a Cursor user quietly watching the CC parade with curiosity. Personally, I haven't been able to give up the IDE experience.


I'm so sold on the CLI tools that I think IDEs are basically dead to me. I only have an IDE open so I can read the code, but most often I'm just changing configs (like switching a bool, or bumping up a limit, or something like that).

Seriously, I have 3+ claude code windows open at a time. Most days I don't even look at the IDE. It's still there running in the background, but I don't need to touch it.


When I'm using Claude Code, I usually have a text editor open as well. The CC plugin works well enough to achieve most of what Cursor was doing for me in showing real-time diffs, but in my experience, the output is better and faster. YMMV


I use CC for so much more than just writing code that I cannot imagine being constrained within an IDE. Why would I want to launch an IDE to have CC update the *arr stack on my NAS to the latest versions for example? Last week I pointed CC at some media files that weren't playing correctly on my Apple TV. It detected what the problem formats were and updated my *arr download rules to prefer other releases and then configured tdarr to re-encode problem files in my existing library.


I was there a few weeks ago, but I'm now on the CC train. The challenge is that the terminal is quite counterintuitive. But if you put on the Linux-terminal lens from a few years ago and start using it, it starts to make sense. The form factor of the terminal isn't intuitive for programming, but it's the ultimate one.

FYI, I still use cursor for small edits and reviews.


I don't think I can scientifically compare the agents. As it is, you can use Opus / Codex in Cursor. The speed of Cursor's composer-1 is phenomenal -- you can use it interactively for many tasks. There are also tasks that are not easy to describe in English, but you can tab through them.


Just FYI, these days CC has 'IDE integration' too; it's not just a CLI. Grab the VS Code extension.


What did you build? I think people talk past each other when they don't share what exactly they were trying to do and whether they succeeded or failed.


Referring to this: https://github.com/arjunguha/slopcoder

I then proceeded to use it to hack on its own codebase, and close a bunch of issues in a repository that I maintain (https://github.com/nuprl/MultiPL-E/commits/main/).


- https://publish.obsidian.md/aixplore/Practical+Applications/...

  Does it work if you change to torch.bfloat16?

- https://publish.obsidian.md/aixplore/Practical+Applications/...

  The PyTorch 2.9 wheels do work. You can pip install torch --index-url <whatever-it-is> and it just works. You do need to build flash attention from source, which takes an hour or so.

