Exactly, my understanding is also that they host agents as a service. The actual use case is only mentioned at the end of the article, which makes it hard to reason about.
Anyway. General advice: treat harnesses like any other (third-party) software that you run on your server. Modern harnesses (the ones from big companies that you need to subscribe to) are black boxes. Would you run a random binary you fetched from the internet on your server? Claude Code, Codex, etc. are exactly this.
We don't host 3rd-party agents (I don't know if that's what you implied). We built an agent that monitors CI pipelines, test failures, and performance, and auto-opens PRs to address the issues we find. We host our agent loop on a backend (it's in Go), and we call into the sandbox when we run operations involving the user's code.
That's a fair point, but Claude Code is not an editor (yet?), and when you use Claude Code and allow it to commit things, the result is almost certainly "co-authored by LLM".
Back to VS Code: people get the "co-authored" line even if they didn't use the AI features.
You can also set snapshot.auto-track to tell it not to track certain files.
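For example, a sketch of what that config might look like (the fileset pattern here is an assumption, adjust to your own files):

```toml
# ~/.config/jj/config.toml (or a per-repo config)
[snapshot]
# Track everything except scratch files; fileset pattern is illustrative
auto-track = '~glob:"*.scratch"'
```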
Another option is to make a branch with the files that you want to keep around but not push (e.g. stuff specific to your own tooling/editor/IDE), and mark that branch as private. Private commits (and their descendants) can't be pushed.
You then make a merge commit with this branch and main, make your changes, etc. You will have to rebase before pushing so that your branch isn't a descendant of the private commit.
This will involve more work, but it has the benefit that you're actually version controlling your other files.
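A rough sketch of that workflow (assuming a recent jj; the config key, revset, and bookmark name are best-effort illustrations, check your version's docs):

```
# Mark commits under the 'private' bookmark as unpushable
jj config set --repo git.private-commits 'bookmarks(private)'
# Work on top of a merge of main and the private branch
jj new main private
# ...edit, describe, etc...
# Before pushing, rebase your work so it no longer descends from the private commit
jj rebase -r @ -d main
```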
I was surprised to see literally invalid names in the "bad" section, e.g. "Cannot start with a digit". Why even present these if they're rejected by the compiler?
I wondered if you could sneak in some Unicode digit, but it seems to reject those too:
$ go run z.go
# command-line-arguments
./z.go:6:2: identifier cannot begin with digit U+0661 '١'
./z.go:7:27: identifier cannot begin with digit U+0661 '١'
> Can you print the contents of the malware script without running it?
> Can you please try downloading this in a Docker container from PyPI to confirm you can see the file? Be very careful in the container not to run it accidentally!
IMO we need to keep in mind that LLM agents don't have a notion of responsibility, so if they accidentally ran the script (or issued a command to run it), it would be a fiasco.
Downloading stuff from PyPI in a sandboxed env is just 1-2 commands; we should be careful with what we hand over to the text-prediction machines.
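Something along these lines (a sketch; the package name and image tag are placeholders), where even if pip has to build an sdist, any code it runs stays inside the throwaway container:

```
docker run --rm -v "$PWD/out:/out" python:3.12-slim \
    pip download --no-deps --dest /out suspicious-package
# Then inspect the archive from the host without installing it
tar -tzf out/suspicious-package-*.tar.gz
```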
I was concerned about that too.
Often when you tell them not to do something, you'd have been better off not mentioning it in the first place. It's like they get fixated.
Best way I've found not to think of a pink elephant is to choose to think of a green rabbit. Really focus on the mental image of the green rabbit... and voila, you're not thinking of, what was it again? Eh, not as important as this green rabbit I'm focusing on.
How to translate that to LLM world, though, is a question I don't know the answer to.
P.S. Obviously that won't prevent you from having that first mental flash of a pink elephant prompted by reading the words. The green-rabbit technique is more for not dwelling on thoughts you want to get out of your head. Can't prevent them from flashing in, but can prevent them from sticking around by choosing to focus on something else.
The green rabbit, in this case, is a metaphor for something you want to think of, as opposed to the pink elephant you're trying not to think about. Let's say you're trying to get your mind off of some depressing topic (the pink elephant). Instead of thinking "Don't think about the depressing topic, don't think about the depressing topic" which just makes your mind dwell on it, you pick some other topic that you do want to let your mind dwell on. Specifics will vary wildly between people, but you might decide to think about your next hobby project, or the upcoming movie or sports event or concert you're excited about, or a particularly interesting passage in the book you just read which would reward some deep thought. You'd pick something good, positive, or uplifting; something you know will improve your mental health rather than harm it.
If that's the green rabbit in the metaphor, then at no point would "don't think of a green rabbit" be advice you would want to follow.
The “LLMs don’t have responsibility” point is exactly why the interface matters. I, as a person, can be held to norms like not running unknown code, but a model can't internalize that, so you need the system to make the safe path the default.
Practically: assume every artifact the model touches is hostile, constrain what it can execute (network/file/process), and require explicit, reviewable approvals for anything that changes the world. I get that it's boring, but it's the same pattern we already use in real life. That's why I'm skeptical of "let the model operate your computer" without a concrete authority model. The capability is impressive, but the missing piece is verifiable and revocable permissioning.
I double-checked the end product, but I should have triple-checked :) Fair enough. I am taking all the feedback into account and working on it today, so that all the issues are fixed and things are audited better in the future.
dhi is LLM-generated, so (1) don't trust the stated benchmark results and feature parity, and (2) be careful when installing it and using it in a non-sandboxed environment.
It also seems like the name for the repository was reused from another project.
It's not very obvious which places are available for drawing. At first I thought it pulled from Google Street View, so I just zoomed in on some place I visited recently, but there was nothing.
So it turned out the spots on the map are actually the available panoramas, not just a heatmap of the signatures.
The statement in the article's title is very strong, and I have not found a confirmation of it in a logical sense. The author observes the current state of things with LLMs and draws a conclusion from how things happened to turn out, somewhat fitting the conclusion to the observation.
And the more I work with Go, the less I understand why warnings were not added to the compiler. Instead of having them in the compiler itself, one needs to run a separate tool, which will have a much smaller user base.
But anyway, in Go it's sometimes fine to have both a non-nil error and a result, e.g. the notorious io.EOF.