ezyang's comments | Hacker News

But only Claude Desktop gets flat $20 pricing from Claude Pro lol


And with Claude Code, it's typically not $0.08... it's more like $0.50, even $5.00, just for a roll LOL. Variable-reward gambling addiction? Definitely...


I was thinking of Cursor pricing. It becomes a whole different ballgame when you plug these tools into the provider's API and pay by the token. Suddenly you really start evaluating how much value you are actually getting out of the tool!
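
For a sense of scale, here's a back-of-envelope sketch in Python (the per-token prices are illustrative assumptions, not anyone's actual rates):

    # Rough cost of one agentic session when paying per token.
    # Prices are assumed for illustration (USD per million tokens).
    INPUT_PRICE_PER_M = 3.00
    OUTPUT_PRICE_PER_M = 15.00

    def session_cost(input_tokens: int, output_tokens: int) -> float:
        return ((input_tokens / 1e6) * INPUT_PRICE_PER_M
                + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M)

    # An agent re-sends the growing context on every turn, so input
    # tokens dominate: e.g. 20 turns at ~40k context tokens each.
    print(f"${session_cost(20 * 40_000, 20 * 1_000):.2f}")  # $2.70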


If you'd done it with MCP it would only have cost you $20 and you would still have had the rest of the month to use your Claude Pro sub :P


This is true! :)


You can try (self promo) https://github.com/ezyang/codemcp . https://github.com/rusiaaman/wcgw is also quite popular, although it allows unrestricted shell access (that's why it's named wcgw lol).


Thanks! Does codemcp support having the server on another machine? Maybe communicate over ssh?


Not built in; you'll have to use something like https://github.com/sparfenyuk/mcp-proxy


There are also some fundamental limitations to the Desktop MCP experience that are probably never getting fixed: Claude Code can spin off subagents and play around with the context, while I assume Claude Desktop's form factor is basically going to stay the way it is until the end of time lol.


IMO, the big problem with Aider is that it's not agentic. That keeps costs down, but most of the edit-test-fix magic in coding agents comes from the agent loop.
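
To make that concrete, here's a rough sketch of such a loop (llm and apply_edit are hypothetical stand-ins for a model client and a file editor, not any particular tool's API):

    import subprocess

    def run_tests() -> tuple[bool, str]:
        # Run the test suite; any runner works, pytest assumed here.
        p = subprocess.run(["pytest", "-x"], capture_output=True, text=True)
        return p.returncode == 0, p.stdout + p.stderr

    def agent_loop(task, llm, apply_edit, max_turns=10):
        history = [task]
        for _ in range(max_turns):
            apply_edit(llm(history))   # model proposes an edit
            ok, output = run_tests()
            if ok:
                return True            # tests pass: stop
            history.append(output)     # feed the failure back in
        return False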


There are a few coding MCPs out there. I have also written one (codemcp), and the pitch for mine is that it DOESN'T provide a bash tool by default and checkpoints your filesystem edits in Git after every change, so it's all about feeling comfortable letting the agent run to completion and then only inspecting the final result. The oldest one in the space, I think, is wcgw.
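
The checkpointing idea is roughly this (an illustrative sketch, not codemcp's actual implementation):

    import subprocess

    def checkpoint(repo: str, message: str) -> None:
        # Commit all working-tree changes after each agent edit, so
        # every intermediate state can be inspected or reverted later.
        subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
        subprocess.run(
            ["git", "-C", repo, "commit", "--allow-empty", "-m", message],
            check=True,
        )

    # e.g. after each tool call: checkpoint(".", "agent edit: fix parser")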


The LLaMA source code in the original repo has been updated for Llama 2: https://github.com/facebookresearch/llama


Do you know if llama.cpp will work out of the box, or do we need to wait for the code to be updated?


https://github.com/ggerganov/llama.cpp/issues/2262

It likely needs to be updated.

Edit: That's only the case for the 34B and 70B models; the 7B and 13B run as-is.

You can already download the GGML models:

https://huggingface.co/TheBloke/Llama-2-7B-GGML

https://huggingface.co/TheBloke/Llama-2-13B-GGML


If you want to try this in Python, you can use https://github.com/ezyang/expecttest which I wrote to do expect tests in PyTorch.
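
Usage looks roughly like the README example: inherit from expecttest.TestCase, assert against an inline string literal, and rerun with EXPECTTEST_ACCEPT=1 to have any mismatched literal rewritten in place:

    import unittest
    import expecttest

    class TestStringMethods(expecttest.TestCase):
        def test_split(self):
            s = "hello world"
            # Wrong expected string? Rerun with EXPECTTEST_ACCEPT=1
            # and the literal below is updated to the actual output.
            self.assertExpectedInline(str(s.split()), """['hello', 'world']""")

    if __name__ == "__main__":
        unittest.main()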


My go-to library in Python for this is:

https://pypi.org/project/pytest-regressions/

It's a bit different in that it saves the expected output to a separate file... IMHO that's usually nicer, because test results tend to be big and keeping them out of the test source makes more sense.

When rerunning, you can invoke pytest with '--force-regen' and then check the git diff to see whether all the changes were expected.
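
A minimal example with the data_regression fixture (the test data here is made up):

    # test_summary.py
    def test_summary(data_regression):
        result = {"rows": 3, "status": "ok", "names": ["a", "b", "c"]}
        # First run generates a .yml file with this data alongside
        # the test; subsequent runs compare against it and fail on
        # any difference. Regenerate with: pytest --force-regen
        data_regression.check(result)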


It's especially bad for rr, since it doesn't otherwise have any reason to talk to the Internet (I see people mentioning Firefox telemetry, but you know, Firefox is a browser, you expect it to talk to the net).

The best I can think of is to incentivize it in other ways; e.g., telemetry only for bug reporting, or a "you ping us, we give you a nice hat" or something.

