When I've wanted it to not do things like this, I've had good luck directing it to... not look at those sources.
For example when I've wanted to understand an unfolding story better than the news, I've told it to ignore the media and go only to original sources (e.g. speech transcripts, material written by the people involved, etc.)
Deep Search is pretty good for current news stories. I've had it analyze some legal developments in a European nation recently and it gave me a great overview.
So... I know that people frame these sorts of things as if it's some kind of quantization conspiracy, but as someone who started using Claude Code the _moment_ that it came out, it felt particularly strong. Then, it feels like they... tweaked something, whether in CC or Sonnet 3.7 and it went a little downhill. It's still very impressive, but something was lost.
I've found Gemini 2.5 Pro to be extremely impressive and much more able to run in an extended fashion by itself, although I've found very high variability in how well 'agent mode' works between different editors. Cursor has been very very weak in this regard for me, with Windsurf working a little better. Claude Code is excellent, but at the moment does feel let down by the model.
I've been using Aider with Gemini 2.5 Pro and found that it's very much able to 'just go' by itself. I shipped a mode for Aider that lets it do so (sibling comment here) and I've had it do some huge things that run for an hour or more, but assuredly it does get stuck and act stupidly on other tasks as well.
My point, more than anything, is that... I'd try different editors and different (stronger) models and see - and that small tweaks to prompt and tooling are making a big difference to these tools' effectiveness right now. Also, different models seem to excel at different problems, so switching models is often a good choice.
Eh, I am happy waiting many years before any of that. If it only works right with the right model for the right job, and it’s very fuzzy which models work for which tasks, and the models change all the time (oftentimes silently)… at some point it’s just easier to do the easy task I’m trying to offload than juggle all of this.
If and when I go about trying these tools in the future, I’ll probably look for an open source TUI, so keep up the great work on aider!
Edit: In case anyone wants to try it, I uploaded it to PyPI as `navigator-mode`, until (and if!) the PR is accepted. By "I", I mean that it uploaded itself. You can see the session where it did that here: https://asciinema.org/a/9JtT7DKIRrtpylhUts0lr3EfY
and, because Aider's already an amazing platform without the autonomy, it's very easy to use the rest of Aider's options, like using `/ask` first, using `/code` or `/architect` for specific tasks [1], but if you start in `/navigator` mode (which I built, here), you can just... ask for a particular task to be done and... wait and it'll often 'just get done'.
It's... decidedly expensive to run an LLM this way right now (Gemini 2.5 Pro is your best bet), but if it's $N today, I don't doubt that it'll be $0.N by next year.
I don't mean to speak in meaningless hype, but I think that a lot of folks who are speaking to LLMs' 'inability' to do things are also spending relatively cautiously on them, when tomorrow's capabilities are often here, just pricey.
I'm definitely still intervening as it goes (as in the Devin demos, say), but I'm also having LLMs relatively autonomously build out large swathes of functionality, the kind that I would put off or avoid without them. I wouldn't call it a programmer-replacement any time soon (it feels far from that), but I'm solo finishing architectures now that I know how to build, but where delegating them to a team of senior devs would've resulted in chaos.
[1]: also for anyone who hasn't tried it and doesn't like TUI, do note that Aider has a web mode and a 'watch mode', where you can use your normal editor and if you leave a comment like '# make this darker ai!', Aider will step in and apply the change. This is even fancier with navigator/autonomy.
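For example, with the watcher enabled (I believe the flag is `--watch-files`, but check Aider's docs), a file sitting open in your normal editor might look like this just before Aider picks it up:

```python
# styles.py -- edited in your usual editor while Aider watches the repo.
# The trailing "AI!" comment is the trigger: Aider applies the requested
# change and removes the comment.
def background_color():
    return "#fafafa"  # make this darker, AI!
```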
It does for me, yes -- models seem to be pretty capable of adhering to the tool call format, which is really all that they 'need' in order to do a good job.
I'm still tweaking the prompts (and I've introduced a new, tool-call based edit format as a primary replacement to Aider's usual SEARCH/REPLACE, which is both easier and harder for LLMs to use - but it allows them to better express e.g. 'change the name of this function').
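To make the contrast concrete, here's a purely illustrative sketch (not the actual schema in my PR) of what a tool-call edit might look like next to the SEARCH/REPLACE block it would replace; the tool name and fields are made up for the example:

```python
# Illustrative only: a structured edit call the model could emit...
rename_call = {
    "tool": "rename_symbol",            # hypothetical tool name
    "file": "aider/coders/base_coder.py",
    "old_name": "get_user_info",
    "new_name": "fetch_user_profile",
}

# ...versus a traditional SEARCH/REPLACE text block, which has to quote the
# surrounding code exactly for every call site it touches.
search_replace_block = """
aider/coders/base_coder.py
<<<<<<< SEARCH
def get_user_info(user_id):
=======
def fetch_user_profile(user_id):
>>>>>>> REPLACE
"""
```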
So... if you have any trouble with it, I would adjust the prompts (in `navigator_prompts.py` and `navigator_legacy_prompts.py` for non-tool-based editing). In particular when I adopted more 'terseness and proactively stop' prompting, weaker LLMs started stopping prematurely more often. It's helpful for powerful thinking models (like Sonnet and Gemini 2.5 Pro), but for smaller models I might need to provide an extra set of prompts that let them roam more.
So I understand how these prompts work for tooling, etc, but they tend to be specific to specific models. Is it possible you could actually supply say 10 prompts for the same tool and determine which one gets the correct output? It wouldn't be much harder than having some test cases and running each prompt through the user selected model to see which worked.
Otherwise you're at the mercy of whatever model the user has selected or downloaded or whatever. And whenever you need to tweak it to improve something.
This would be akin to how we used to calibrate stylus or touch screens.
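Something like this is what I have in mind, as a rough sketch (the `run_model` and `is_valid` hooks are hypothetical stand-ins for whatever model the user has actually selected):

```python
from collections import namedtuple

TestCase = namedtuple("TestCase", ["user_message", "expected"])

def calibrate(candidate_prompts, test_cases, run_model, is_valid):
    """Return the candidate system prompt that passes the most test cases
    on the user's currently selected model. run_model(prompt, message) and
    is_valid(output, expected) are hooks you'd supply."""
    best_prompt, best_score = None, -1
    for prompt in candidate_prompts:
        score = sum(
            1 for case in test_cases
            if is_valid(run_model(prompt, case.user_message), case.expected)
        )
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt
```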
One thing I've had in the back of my brain for a few days is the idea of LLM-as-a-judge over a multi-armed bandit, testing out local models. Locally, if you aren't too fussy about how long things take, you can spend all the tokens you want. Running head-to-head comparisons is slow, but with a MAB you're not doing so for every request. Nine times out of ten it's the normal request cycle. You could imagine having new models get mixed in as and when they become available, able to take over if they're genuinely better, entirely behind the scenes. You don't need to manually evaluate them at that point.
I don't know how well that gels with aider's modes; it feels like you want to be able to specify a judge model but then have it control the other models itself. I don't know if that's better within aider itself (so it's got access to the added files to judge a candidate solution against, and can directly see the evaluation) or as an API layer between aider and the vllm/ollama/llama-server/whatever service, with the complication of needing to feed scores out of aider to stoke the MAB.
You could extend the idea to generating and comparing system prompts. That might be worthwhile but it feels more like tinkering at the edges.
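As a very rough sketch of what I mean (epsilon-greedy rather than anything fancier, and the `generate`/`judge` hooks are hypothetical stand-ins for your local serving stack):

```python
import random
from collections import defaultdict

class ModelBandit:
    """Epsilon-greedy multi-armed bandit over local models.

    Most requests (1 - epsilon) just use the current best arm; the rest run
    two models head-to-head and let an LLM judge pick the winner.
    generate(model, prompt) and judge(prompt, a, b) are hypothetical hooks
    onto ollama/vllm/llama-server/whatever you run locally.
    """

    def __init__(self, models, generate, judge, epsilon=0.1):
        self.models = list(models)      # new models can be appended later
        self.generate = generate
        self.judge = judge
        self.epsilon = epsilon
        self.wins = defaultdict(int)
        self.trials = defaultdict(int)

    def best(self):
        return max(self.models,
                   key=lambda m: self.wins[m] / max(self.trials[m], 1))

    def complete(self, prompt):
        if random.random() > self.epsilon:
            return self.generate(self.best(), prompt)  # normal request cycle
        # Exploration: compare the incumbent against a random challenger.
        a, b = self.best(), random.choice(self.models)
        out_a, out_b = self.generate(a, prompt), self.generate(b, prompt)
        winner = a if self.judge(prompt, out_a, out_b) == "a" else b
        self.wins[winner] += 1
        self.trials[a] += 1
        self.trials[b] += 1
        return out_a if winner == a else out_b
```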
It's funny you say this! I was adding a tool just earlier (that I haven't yet pushed) that allows the model to... switch model.
Aider can also have multiple models active at any time (the architect, editor and weak model is the standard set) and use them for different aspects. I could definitely imagine switching one model whilst leaving another active.
I think it did a fairly good job! It took just a couple of minutes and it effectively just switches the main model based on recent input, but I don’t doubt that this could become really robust if I had poked or prompted it further with preferences, ideas, beliefs and pushback! I imagine that you could very quickly get it there if you wished.
It's definitely not showing off the most here, because it's almost all direct-coding, very similar to ordinary Aider. :)
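If it helps to picture it, here's a minimal sketch of the shape of that tool; the names and the way it hangs off the coder object are illustrative rather than the actual (still unpushed) code:

```python
# Illustrative sketch of a "switch model" tool: the active model asks to hand
# off to another model, and the coder swaps its main model while leaving the
# editor/weak models in place. Model IDs here are hypothetical examples.
KNOWN_MODELS = {
    "gemini-pro": "gemini/gemini-2.5-pro",
    "sonnet": "anthropic/claude-sonnet-example",
}

def switch_model_tool(coder, requested: str) -> str:
    """Tool the LLM can call to change the main model mid-session."""
    model_name = KNOWN_MODELS.get(requested)
    if model_name is None:
        return f"Unknown model '{requested}'; options: {', '.join(KNOWN_MODELS)}"
    coder.main_model = model_name   # architect/editor/weak models untouched
    return f"Switched main model to {model_name}."
```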
Google and Bing's Cache, Archive.org, Archive.is, CommonCrawl... many services have previously or currently presented the full document.
Google and Bing removed their cache features when LLMs started taking off – as I said in a sibling comment, I wonder if they felt that that regime was finally going to be challenged in court as people tried to protect their data.
That being said, "can't present the full document due to copyright" seems at odds with all of the above examples existing for years.
I’ve been wondering about this and searching for solutions too.
For now we’ve just managed to optimize how quickly we download pages, but haven’t found an API that actually caches them. Perhaps companies are concerned that they’ll be sued for it in the age of LLMs?
The Brave API provides ‘additional snippets’, meaning that you at least get multiple slices of the page, but it’s not quite a substitute.
> A: Yes. Zed will be free to use as a standalone editor. We will instead charge a subscription for optional features targeting teams and collaboration. See "how will you make money?".
> Q: How will you make money?
> A: We envision Zed as a free-to-use editor, supplemented by subscription-based, optional network features, such as:
> - Channels and calls
> - Chat
> - Channel notes
> We plan to offer our collaboration features to open source teams, free of charge.
It seems to me that they're just going to charge for Zeta if they do, because it... costs them money to run.
Unlike others (e.g. Cursor), they've opened it (and its fine-tuning dataset!), so you can just run it yourself if you want to bear the costs...
They did something similar with LLM use: for simplicity they gave you hosted LLM access through Zed, but you could also use the models directly yourself.