danmaz74's comments | Hacker News

But the seed is the user input.

Maybe until the model outputs some affirming preamble, it's still somewhat probable that it will disagree with the user's request? So the agreement fluff is kind of like it making the decision to heed the request. Especially if we consider the tokens as the medium by which the model "thinks". Not to anthropomorphize the damn things too much.

Also I wonder if it could be a side effect of all the supposed alignment efforts that go into training. If you train on a bunch of negative-reinforcement samples where the model says something like "sorry, I can't do that", maybe it pushes the model to say things like "sure, I'll do that" in positive cases too?

Disclaimer that I am just yapping


The reason is that, for an array (or vector), you find the memory position of the i-th element as base_address + i*word_length. And the first element is at the base address, so it has index 0.

It has memory offset 0, which we use as the array index for convenience, so that there's no distinction between a memory offset and the corresponding array index. That's what happens when your arrays are barely different from pointers, as in C. If your arrays aren't just a stand-in for raw pointers, then there's little reason to require 0-based indexing. You can use more natural indices based on your particular application, and many languages do allow arbitrary indices.
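A minimal C sketch of that offset arithmetic:

    #include <stdio.h>

    int main(void) {
        int a[4] = {10, 20, 30, 40};

        /* a[i] is defined as *(a + i): element i lives at
           base_address + i * sizeof(int), so the first element
           sits at offset 0 and therefore gets index 0. */
        for (size_t i = 0; i < 4; i++) {
            printf("a[%zu] = %d at %p (base + %zu bytes)\n",
                   i, a[i], (void *)&a[i], i * sizeof(int));
        }
        return 0;
    }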

You can have redundancy with a monolithic architecture. Just run two instances of the web server behind a proxy, and use Postgres with a hot standby (or use a managed Postgres instance, which already has that).
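For the proxy half, a minimal nginx sketch (hostnames and ports are made up):

    upstream app {
        # two identical instances of the monolith
        server app1.internal:3000;
        server app2.internal:3000;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app;
        }
    }

nginx round-robins between the two by default and retries the other instance if a connection fails, so losing one costs you capacity, not availability.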

I gave that a try, then I decided to use devcontainers instead, and I find that better, for the reasons you mentioned.


It looks like a lot of these issues are due to the fact that Rails has been around for a long time, has lots of versions, and you wanted to support many versions (which is commendable, by the way). If you only had to support the latest Rails version, how much harder would it have been than doing the same for Phoenix?


In the latest Rails versions, it’s probably just as easy as in Phoenix. The question is whether, after years of churn in the Rails frontend ecosystem, the core team hasn’t already driven away most developers who might have cared. At this point, few people would use a library that targets the newest Rails versions when most teams treat Rails purely as a backend and handle the frontend with something else.


While Rails indeed tries to support old versions for a while, I've found that many devs are eager to stay on top of upgrades (which has also been less painful in the last few years, especially when done incrementally).

Fair enough, there will be some old, never-updated, backend-only services, but it seems like a stretch that those would suddenly need an FE library.


When the task is bigger than what I trust the agent to handle on its own, or bigger than what I can review as a single result, I ask it to create a plan with steps, then create an md file for each step. I review the steps and ask the agent to implement the first one. I review that one, fix it, then ask the agent to update the remaining steps and implement the next one. And so on, until it's finished.


Have you tried scoped context packages? Basically, for each task I create a .md file that includes relevant file paths, the purpose of the task, key dependencies, a clear plan of action, and a test strategy. It's like a mini local design doc. I've found that it helps ground the implementation and stabilizes the agents' output.
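For illustration, one of these .md files might look something like this (the task, file paths and gem are made up):

    # Task: add rate limiting to the public API

    ## Relevant files
    - app/controllers/api/base_controller.rb
    - config/initializers/rack_attack.rb

    ## Purpose
    Throttle unauthenticated requests to ~100/min per IP.

    ## Key dependencies
    - rack-attack (already in the Gemfile)

    ## Plan
    1. Add a Rack::Attack throttle rule keyed on the client IP.
    2. Return 429 with a Retry-After header when the limit is hit.
    3. Update the API docs.

    ## Test strategy
    Request specs that simulate bursts just below and above the limit.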


I read this suggestion a lot: "Make clear steps, a clear plan of action." Which I get. But then, instead of having an LLM flail away at it, couldn't we give it to an actual developer? It seems like we've finally realized that clear specs make dev work much easier for LLMs. But the same is true for a human. The human will ask more clarifying questions and not hallucinate. The LLM will roll the dice and pick a path. Maybe we as devs would just rather talk with machines.


Yes, but the difference is that an LLM produces the result instantly, whereas a human might take hours or days.

So if you can get the spec right, and the LLM+agent harness is good enough, you can move much, much faster. It's not always true to the same degree, obviously.

Getting the spec right, and knowing what tasks to use it on -- that's the hard part that people are grappling with, in most contexts.


I'm using it to help me build what I want and learn how. It being incorrect and needing questioning isn't that bad, so long as you ARE questioning it. It has brought up so many concepts, parameters, etc. that would be difficult to find and learn alone. Documentation can often be very difficult to parse. LLMs make it easier.


> Maybe we as devs would just rather talk with machines.

This is kind of how I feel. Chat as an interaction is mentally taxing for me.


Separately, you have to consider that "wasting tokens spinning" might be acceptable if you're able to run hundreds of thousands of these things in parallel. If even a small subset of them translates to value, then you're still far ahead, net, of a strictly manual/human process.


> hundreds of thousands of these things in parallel

At what cost, monetary and environmental?


If the system provides value that is greater than its cost, then paying the cost to gain the value is always worthwhile - regardless of the magnitude of the cost.

As costs drop exponentially (a reasonable expectation for LLMs, etc.), increasing agent parallelism becomes more and more economically viable over time.


>As costs drop exponentially

Not a reasonable expectation anymore. Moore's Law has been dead for more than a decade and we're getting close to physical limits.


I do the same thing with my engineers but I keep the tasks in Jira and I label them "stories".

But in all seriousness +1 can recommend this method.


This is built into Cursor now with plan mode https://cursor.com/docs/agent/planning


How does Cursor's plan mode differ from Claude Code's plan mode? I've used the latter a lot (it's been there a long time), and the description seems very similar. The big difference from the workflow I described is that with plan mode you don't get to review and correct what happened between steps.


I've not used Claude Code, so my answer might not be that useful. But I would think that because both are chat-based interfaces you would be able to instruct the model to either continue without approval or wait for your approval at each step. I certainly do that with Cursor. Cursor has also recently started automatically generating TODO lists in the background (with a tool call I'm assuming), and displaying them as part of the thinking process without explicit instruction. I find that useful.


This plus a reset in between steps usually helps focus the context, in my experience.


I had the same issues, and half-baked a similar solution. But then I looked into dev containers, with which I get higher isolation, including a DB for each instance (which is important for testing in parallel; I'm mostly using Ruby on Rails). What I'm doing now is:

* create a dev container for the project

* install the agent (Claude Code in my case) in the container as part of the dev container definition

* launch the container through DevPod (no affiliation), which automatically connects VS Code or a JetBrains IDE

So now I can run these in parallel, on a remote server if I want, and in "YOLO" mode. Personally, I'm finding this superior to the git worktree alternative.
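The kind of dev container definition I mean looks roughly like this (image, compose details and the install command are illustrative placeholders, not my exact config):

    // .devcontainer/devcontainer.json (rough sketch)
    {
      "name": "my-rails-app",
      "dockerComposeFile": "docker-compose.yml",
      "service": "app",
      "workspaceFolder": "/workspaces/my-rails-app",
      "features": {
        "ghcr.io/devcontainers/features/node:1": {}
      },
      // install the agent once the container is created
      "postCreateCommand": "npm install -g @anthropic-ai/claude-code"
    }

The compose file is where each instance gets its own Postgres service, and "devpod up" builds the container and attaches the IDE.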


Having worked on quite a few legacy applications in my career, I would say that, as with so many other problems in programming, the most important solution to this one is good modularization of your code. That allows a new team to understand the application at a high level, in terms of modules interacting with each other, and when you need to make changes you only need to understand the modules involved, ideally one at a time. So you don't need to form a detailed theory of the whole application all at once.

What I'm finding with LLMs is that, if you follow good modularization principles and practices, they actually make it easier to start working on a codebase you don't know very well yet, because they can help you a lot with navigating, understanding "as much as you need", and making specific changes. But that's not something LLMs do on their own, at least in my experience - you still need a human to enforce good, consistent modularization.


You can use the Codex CLI on a measly Plus plan


LLM coding agents can't learn from experience on our code, but we can learn from using them on our code, in the context of our team and processes. I started creating some harnesses to help get more of what we want from these tools and less of what we have to spend too much time fixing - eg, specialized agents that refactor and test the code after it's been generated, bring it more in line with our standards, remove bogus tests, etc. The learning is embedded in the prompts for these agents.

I think that this approach can already get us pretty far. One thing I'm missing is tooling to make it easier to build automation on top of, eg, Claude Code, but I'm sure it's going to come (and I'm tempted to try vibe coding it; if only I had the time).

