When I've wanted it to not do things like this, I've had good luck directing it to... not look at those sources.
For example when I've wanted to understand an unfolding story better than the news, I've told it to ignore the media and go only to original sources (e.g. speech transcripts, material written by the people involved, etc.)
Deep Search is pretty good for current news stories. I've had it analyze some legal developments in a European nation recently and it gave me a great overview.
So... I know that people frame these sorts of things as if it's some kind of quantization conspiracy, but as someone who started using Claude Code the _moment_ that it came out, it felt particularly strong. Then, it feels like they... tweaked something, whether in CC or Sonnet 3.7 and it went a little downhill. It's still very impressive, but something was lost.
I've found Gemini 2.5 Pro to be extremely impressive and much more able to run in an extended fashion by itself, although I've found very high variability in how well 'agent mode' works between different editors. Cursor has been very very weak in this regard for me, with Windsurf working a little better. Claude Code is excellent, but at the moment does feel let down by the model.
I've been using Aider with Gemini 2.5 Pro and found that it's very much able to 'just go' by itself. I shipped a mode for Aider that lets it do so (sibling comment here) and I've had it do some huge things that run for an hour or more, but assuredly it does get stuck and act stupidly on other tasks as well.
My point, more than anything, is that... I'd try different editors and different (stronger) models and see - and that small tweaks to prompt and tooling are making a big difference to these tools' effectiveness right now. Also, different models seem to excel at different problems, so switching models is often a good choice.
Eh, I am happy waiting many years before any of that. If it only works right with the right model for the right job, and it’s very fuzzy which models work for which tasks, and the models change all the time (oftentimes silently)… at some point it’s just easier to do the easy task I’m trying to offload than juggle all of this.
If and when I go about trying these tools in the future, I’ll probably look for an open source TUI, so keep up the great work on aider!
Edit: In case anyone wants to try it, I uploaded it to PyPI as `navigator-mode`, until (and if!) the PR is accepted. By "I", I mean that it uploaded itself. You can see the session where it did that here: https://asciinema.org/a/9JtT7DKIRrtpylhUts0lr3EfY
and, because Aider's already an amazing platform without the autonomy, it's very easy to use the rest of Aider's options, like using `/ask` first, using `/code` or `/architect` for specific tasks [1], but if you start in `/navigator` mode (which I built, here), you can just... ask for a particular task to be done and... wait and it'll often 'just get done'.
It's... decidedly expensive to run an LLM this way right now (Gemini 2.5 Pro is your best bet), but if it's $N today, I don't doubt that it'll be $0.N by next year.
I don't mean to speak in meaningless hype, but I think that a lot of folks who are speaking to LLMs' 'inability' to do things are also spending relatively cautiously on them, when tomorrow's capabilities are often here, just pricey.
I'm definitely still intervening as it goes (as in the Devin demos, say), but I'm also having LLMs relatively autonomously build out large swathes of functionality, the kind that I would put off or avoid without them. I wouldn't call it a programmer-replacement any time soon (it feels far from that), but I'm solo finishing architectures now that I know how to build, but where delegating them to a team of senior devs would've resulted in chaos.
[1]: also for anyone who hasn't tried it and doesn't like TUI, do note that Aider has a web mode and a 'watch mode', where you can use your normal editor and if you leave a comment like '# make this darker ai!', Aider will step in and apply the change. This is even fancier with navigator/autonomy.
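For example, with the watcher enabled (I believe the flag is `--watch-files`, but check Aider's docs), a file sitting open in your normal editor might look like this just before Aider picks it up:

```python
# styles.py -- edited in your usual editor while Aider watches the repo.
# The trailing "AI!" comment is the trigger: Aider applies the requested
# change and removes the comment.
def background_color():
    return "#fafafa"  # make this darker, AI!
```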
It does for me, yes -- models seem to be pretty capable of adhering to the tool call format, which is really all that they 'need' in order to do a good job.
I'm still tweaking the prompts (and I've introduced a new, tool-call based edit format as a primary replacement to Aider's usual SEARCH/REPLACE, which is both easier and harder for LLMs to use - but it allows them to better express e.g. 'change the name of this function').
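To make the contrast concrete, here's a purely illustrative sketch (not the actual schema in my PR) of what a tool-call edit might look like next to the SEARCH/REPLACE block it would replace; the tool name and fields are made up for the example:

```python
# Illustrative only: a structured edit call the model could emit...
rename_call = {
    "tool": "rename_symbol",            # hypothetical tool name
    "file": "aider/coders/base_coder.py",
    "old_name": "get_user_info",
    "new_name": "fetch_user_profile",
}

# ...versus a traditional SEARCH/REPLACE text block, which has to quote the
# surrounding code exactly for every call site it touches.
search_replace_block = """
aider/coders/base_coder.py
<<<<<<< SEARCH
def get_user_info(user_id):
=======
def fetch_user_profile(user_id):
>>>>>>> REPLACE
"""
```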
So... if you have any trouble with it, I would adjust the prompts (in `navigator_prompts.py` and `navigator_legacy_prompts.py` for non-tool-based editing). In particular when I adopted more 'terseness and proactively stop' prompting, weaker LLMs started stopping prematurely more often. It's helpful for powerful thinking models (like Sonnet and Gemini 2.5 Pro), but for smaller models I might need to provide an extra set of prompts that let them roam more.
So I understand how these prompts work for tooling, etc, but they tend to be specific to specific models. Is it possible you could actually supply say 10 prompts for the same tool and determine which one gets the correct output? It wouldn't be much harder than having some test cases and running each prompt through the user selected model to see which worked.
Otherwise you're at the mercy of whatever model the user has selected or downloaded or whatever. And whenever you need to tweak it to improve something.
This would be akin to how we used to calibrate stylus or touch screens.
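Something like this is what I have in mind, as a rough sketch (the `run_model` and `is_valid` hooks are hypothetical stand-ins for whatever model the user has actually selected):

```python
from collections import namedtuple

TestCase = namedtuple("TestCase", ["user_message", "expected"])

def calibrate(candidate_prompts, test_cases, run_model, is_valid):
    """Return the candidate system prompt that passes the most test cases
    on the user's currently selected model. run_model(prompt, message) and
    is_valid(output, expected) are hooks you'd supply."""
    best_prompt, best_score = None, -1
    for prompt in candidate_prompts:
        score = sum(
            1 for case in test_cases
            if is_valid(run_model(prompt, case.user_message), case.expected)
        )
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt
```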
One thing I've had in the back of my brain for a few days is the idea of LLM-as-a-judge over a multi-armed bandit, testing out local models. Locally, if you aren't too fussy about how long things take, you can spend all the tokens you want. Running head-to-head comparisons is slow, but with a MAB you're not doing so for every request. Nine times out of ten it's the normal request cycle. You could imagine having new models get mixed in as and when they become available, able to take over if they're genuinely better, entirely behind the scenes. You don't need to manually evaluate them at that point.
I don't know how well that gels with aider's modes; it feels like you want to be able to specify a judge model but then have it control the other models itself. I don't know if that's better within aider itself (so it's got access to the added files to judge a candidate solution against, and can directly see the evaluation) or as an API layer between aider and the vllm/ollama/llama-server/whatever service, with the complication of needing to feed scores out of aider to stoke the MAB.
You could extend the idea to generating and comparing system prompts. That might be worthwhile but it feels more like tinkering at the edges.
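As a very rough sketch of what I mean (epsilon-greedy rather than anything fancier, and the `generate`/`judge` hooks are hypothetical stand-ins for your local serving stack):

```python
import random
from collections import defaultdict

class ModelBandit:
    """Epsilon-greedy multi-armed bandit over local models.

    Most requests (1 - epsilon) just use the current best arm; the rest run
    two models head-to-head and let an LLM judge pick the winner.
    generate(model, prompt) and judge(prompt, a, b) are hypothetical hooks
    onto ollama/vllm/llama-server/whatever you run locally.
    """

    def __init__(self, models, generate, judge, epsilon=0.1):
        self.models = list(models)      # new models can be appended later
        self.generate = generate
        self.judge = judge
        self.epsilon = epsilon
        self.wins = defaultdict(int)
        self.trials = defaultdict(int)

    def best(self):
        return max(self.models,
                   key=lambda m: self.wins[m] / max(self.trials[m], 1))

    def complete(self, prompt):
        if random.random() > self.epsilon:
            return self.generate(self.best(), prompt)  # normal request cycle
        # Exploration: compare the incumbent against a random challenger.
        a, b = self.best(), random.choice(self.models)
        out_a, out_b = self.generate(a, prompt), self.generate(b, prompt)
        winner = a if self.judge(prompt, out_a, out_b) == "a" else b
        self.wins[winner] += 1
        self.trials[a] += 1
        self.trials[b] += 1
        return out_a if winner == a else out_b
```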
It's funny you say this! I was adding a tool just earlier (that I haven't yet pushed) that allows the model to... switch model.
Aider can also have multiple models active at any time (the architect, editor and weak model is the standard set) and use them for different aspects. I could definitely imagine switching one model whilst leaving another active.
I think it did a fairly good job! It took just a couple of minutes and it effectively just switches the main model based on recent input, but I don’t doubt that this could become really robust if I had poked or prompted it further with preferences, ideas, beliefs and pushback! I imagine that you could very quickly get it there if you wished.
It's definitely not showing off the most here, because it's almost all direct-coding, very similar to ordinary Aider. :)
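If it helps to picture it, here's a minimal sketch of the shape of that tool; the names and the way it hangs off the coder object are illustrative rather than the actual (still unpushed) code:

```python
# Illustrative sketch of a "switch model" tool: the active model asks to hand
# off to another model, and the coder swaps its main model while leaving the
# editor/weak models in place. Model IDs here are hypothetical examples.
KNOWN_MODELS = {
    "gemini-pro": "gemini/gemini-2.5-pro",
    "sonnet": "anthropic/claude-sonnet-example",
}

def switch_model_tool(coder, requested: str) -> str:
    """Tool the LLM can call to change the main model mid-session."""
    model_name = KNOWN_MODELS.get(requested)
    if model_name is None:
        return f"Unknown model '{requested}'; options: {', '.join(KNOWN_MODELS)}"
    coder.main_model = model_name   # architect/editor/weak models untouched
    return f"Switched main model to {model_name}."
```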
Google and Bing's Cache, Archive.org, Archive.is, CommonCrawl... many services have previously or currently presented the full document.
Google and Bing removed their cache features when LLMs started taking off – as I said in a sibling comment, I wonder if they felt that that regime was finally going to be challenged in court as people tried to protect their data.
That being said, "can't present the full document due to copyright" seems at odds with all of the above examples existing for years.
I’ve been wondering about this and searching for solutions too.
For now we’ve just managed to optimize how quickly we download pages, but haven’t found an API that actually caches them. Perhaps companies are concerned that they’ll be sued for it in the age of LLMs?
The Brave API provides ‘additional snippets’, meaning that you at least get multiple slices of the page, but it’s not quite a substitute.
> A: Yes. Zed will be free to use as a standalone editor. We will instead charge a subscription for optional features targeting teams and collaboration. See "how will you make money?".
> Q: How will you make money?
> A: We envision Zed as a free-to-use editor, supplemented by subscription-based, optional network features, such as:
> - Channels and calls
> - Chat
> - Channel notes
> We plan to offer our collaboration features to open source teams, free of charge.
It seems to me that they're just going to charge for Zeta if they do, because it... costs them money to run.
Unlike others (e.g. Cursor), they've opened it (and its fine-tuning dataset!), so you can just run it yourself if you want to bear the costs...
They did something similar with LLM use: for simplicity they gave you hosted LLM access through Zed, but you could also use the models directly yourself.