
>Seems like it could easily be training data set size as well.

I'm convinced that's the case. With any major LLM I can carpet-bomb Java/Python boilerplate without issue. For Rust, at least last time I checked, it comes up with non-existent traits, hallucinates more often, and generally struggles to use the context effectively. In agent mode it turns into a fist fight with the compiler, often ending in credit-destroying loops.
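
To give a made-up illustration of that failure mode (a sketch, not output from any particular model): the model reaches for a convenience method that doesn't exist on `Iterator`, and the working fix is just the ordinary map/collect machinery.

    fn main() {
        let words = ["alpha", "beta", "gamma"];
        // A model will happily invent something like `words.iter().to_upper_vec()`,
        // which is not a real method. The actual API is map + collect:
        let upper: Vec<String> = words.iter().map(|w| w.to_uppercase()).collect();
        println!("{:?}", upper);
    }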

And don't get me started when using it for Nix...

So I'm not surprised about a language with an orders-of-magnitude smaller public corpus.



I realized this too, and it led me to the conclusion that LLMs really can't program. I did some experiments to find out what a programming language would look like if it were designed to be written and edited by an LLM, instead of e.g. Python. It turns out that it's extremely verbose, especially in variable names, function names, class names, etc. Classes, in fact, turned out to be largely redundant. But the real insight was that LLMs are great at naming things, and at performing small operations on the little things they named. They're really not good at any logic they can't copy-paste from something they found on the web.


Is this really a surprise? I'd hazard a guess that the ability to program, and beyond that to create new programming languages, requires more than just probabilistic text prediction. LLMs work for programming languages where they have a large enough existing corpus to basically ape a programmer who has seen similar enough text. A real programmer can take the concepts of one programming language and express them in another without having to have digested gigabytes of raw text.

There may be emergent abilities that arise in these models purely due to how much information they contain, but I'm unconvinced that their architecture allows them to crystallize actual understanding. E.g. I'm sceptical that there'd be an area in the LLM weights that encodes the logic behind arithmetic and gives rise to the model actually modelling arithmetic, as opposed to just probabilistically saying that the text `1+1=` tended to be followed by the character `2`.


> I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM.

Did your experiment consist of asking an LLM to design a programming language for itself?


Yes. ChatGPT 4 and Claude 3.7. Both led me to similar conclusions, but they produced very different syntax, which made me believe they were not just regurgitating from a common source.


Great, so your experiment just consisted of having an LLM hallucinate

That's not really an experiment, is it? You basically just used them to create a hypothesis, but you never actually proved anything

They're great at writing text and code, so the fact that the other LLM was able to use that syntax to presumably write code that worked (which you had no way of proving, since you can't actually run that code) doesn't really mean anything

It would be similar to having it respond in a certain JSON format; they are great at that too. It doesn't really translate to a real-world codebase


> That's not really an experiment, is it? You basically just used them to create a hypothesis, but you never actually proved anything

The experiment was checking how well another, unrelated LLM could write code using the syntax, and then doing the same in the reverse direction in new sessions.

> They're great at writing text and code, so the fact that the other LLM was able to use that syntax to presumably write code that worked (which you had no way of proving, since you can't actually run that code) doesn't really mean anything

Of course I could check the code. I had no compiler for it, but "running" code in one's head without a compiler is something first-year students get very good at in their Introduction to C course. I also checked how they edit and modify the code.

This isn't a published study; it was an experiment. And it influenced how I use LLMs for work, for the better. I'd even call that a successful experiment, now that I better understand the strengths and limitations of LLMs in this field.


> And it influenced how I use LLMs for work, for the better

How so?


I let the LLM come up with all the boilerplate classes, functions, modules, etc. that it wants. I let it name things. I let it design the API. But what I don't let it do is design the flow of operations. I come up with a flow chart as a flow of operations and explain that to the LLM. Almost every if statement is the result of something I specifically mentioned.
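
As a rough sketch of that division of labor, in Rust with hypothetical names (not my actual codebase): the flow of operations is spelled out by the human as comments, and the LLM is left to name and fill in the pieces around it.

    // Human-specified flow of operations (the part I don't delegate):
    // 1. load the raw records
    // 2. drop records that fail validation
    // 3. aggregate the rest into a summary
    // The naming and the boilerplate below are the LLM's job.

    #[derive(Debug)]
    struct RecordSummary {
        accepted_record_count: usize,
        rejected_record_count: usize,
    }

    fn record_passes_validation(raw_record: &str) -> bool {
        !raw_record.trim().is_empty()
    }

    fn summarize_raw_records(raw_records: &[&str]) -> RecordSummary {
        let mut summary = RecordSummary { accepted_record_count: 0, rejected_record_count: 0 };
        for raw_record in raw_records {
            // This branch exists because step 2 of the flow explicitly calls for it.
            if record_passes_validation(raw_record) {
                summary.accepted_record_count += 1;
            } else {
                summary.rejected_record_count += 1;
            }
        }
        summary
    }

    fn main() {
        let raw_records = ["a", "", "c"];
        println!("{:?}", summarize_raw_records(&raw_records));
    }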


Is there a reason you believe the models can accurately predict this sort of thing?


There wasn't, but after taking the syntax that I developed with one model to another model, and having it write some code in that syntax, it did very well. Same in the other direction.

LLMs need all their context within easy reach. An LLM-first (for editing) language still has code comments and docstrings. Identifier names are long, and functions don't really need optional parameters. Strict typing is a must.
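
A sketch of that style, written in Rust rather than a new language (the domain and names are invented for illustration): long descriptive identifiers, doc comments on everything, explicit types, and no optional or defaulted parameters.

    /// Calculates the total price of an order in cents, including sales tax.
    /// Both inputs are explicit; nothing is optional or defaulted.
    fn calculate_order_total_price_in_cents(
        order_subtotal_in_cents: u64,
        sales_tax_rate_in_basis_points: u64,
    ) -> u64 {
        let sales_tax_in_cents: u64 =
            order_subtotal_in_cents * sales_tax_rate_in_basis_points / 10_000;
        order_subtotal_in_cents + sales_tax_in_cents
    }

    fn main() {
        let order_subtotal_in_cents: u64 = 12_500;
        let sales_tax_rate_in_basis_points: u64 = 875; // 8.75%
        println!(
            "total: {} cents",
            calculate_order_total_price_in_cents(
                order_subtotal_in_cents,
                sales_tax_rate_in_basis_points,
            )
        );
    }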


In my experience, Claude works well at writing Rust, and Gemini is terrible. Gemini writes Rust as if it were a C++ programmer who has spent one day learning the basics of Rust.
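
A made-up contrast (not actual model output) of what that tends to look like: manual index loops and needlessly concrete reference types versus the idiomatic iterator version.

    // The "C++ programmer on day one of Rust" style: manual indexing,
    // takes &Vec<i64> instead of a slice.
    fn sum_of_squares_cpp_flavored(values: &Vec<i64>) -> i64 {
        let mut total: i64 = 0;
        let mut i: usize = 0;
        while i < values.len() {
            total += values[i] * values[i];
            i += 1;
        }
        total
    }

    // The idiomatic version: borrow a slice, use iterators.
    fn sum_of_squares_idiomatic(values: &[i64]) -> i64 {
        values.iter().map(|v| v * v).sum()
    }

    fn main() {
        let values = vec![1, 2, 3];
        assert_eq!(sum_of_squares_cpp_flavored(&values), sum_of_squares_idiomatic(&values));
    }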


I tried Gemini, OpenAI, Copilot, and Claude on a reasonably big Rust project. Claude worked well for fixing `use` statements, Clippy lints, renames, refactorings, and CI. I used the highest-cost Claude with custom context per crate, but I was never able to get it to write new code well.

For Nix, it is a nice template engine to get started with or to search. I did not try big Nix changes.


Yep. I had similar issues asking Gemini for help with F#; I assume lack of training data is the cause.



