Hacker News: ezyang's comments

I have used Cursor and my own MCP, codemcp. Cursor has a lot of nice QoL features that you can't get from an MCP package; the TAB completion is really good for traditional coding. I haven't used Copilot, so I don't have a comparison there. Definitely use agent mode.


When I posted this to Reddit there was a pretty lively discussion there: https://www.reddit.com/r/ChatGPTCoding/s/wRmnREUWzn


Cline famously has huge API token costs, as it is profligate with context. Because codemcp plugs into Claude Desktop, you only pay for your Claude Pro subscription, similar to Cursor's pricing model.


This is true and is essentially a form of arbitrage. Anthropic is eating the cost of your elevated queries with their $20 flat fee subscription.

The "famously huge API token costs" you are referring to is Cline passing the Anthropic API cost through to you with no markup. You even input your own API token.


In my experience, the model is king, and codemcp will operate fairly similarly to Cursor with the same failure modes as Cursor due to Sonnet 3.7. One thing that I like about codemcp is I can customize aspects of the interaction as I discover new things I want to do :)


Yeah, Sonnet 3.5/3.7 are doing heavy lifting. Maybe the SOTA Gemini models would do better, I haven't tried them. Generating correct patches is a funny minigame that isn't really solved, despite how easy it is to RL on.


> Maybe the SOTA Gemini models would do better, I haven't tried them

Since I had to upgrade my Google Drive storage about a month ago, I gave them all a try. Short version: if you already have a paid plan with OpenAI/Claude, none of them come even close, at least for coding. I thought I was trying the wrong models at first, but after confirming, it seems Google is just really far behind.


Strange to read this and the parent comment, since Cursor has never made a single error applying patches for me. The closest it's come is when the coding model adds unnecessary changes, which of course is a completely different thing.


Which model are you using with Cursor?


Usually Claude 3.5, but I believe they have a separate application model that puts the code the bigger model suggests into the file.


o3-mini works well enough for me; it makes mistakes, but it can generally fix them eventually. Interestingly, I found that even when I include line numbers as comments in the code it sees, it still often gets the line numbers wrong for edits (most often off-by-one errors, likely from mixing up whether the line numbers are inclusive or exclusive). What works a bit better is asking it to provide a regex matching the first and last line of what it wants to replace, along with nearby line numbers (so if there are multiple matches for the regex in the file, it picks the right one).
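The regex-plus-line-hint approach described above can be sketched in a few lines of Python. The function name and interface here are hypothetical, not any tool's actual API; the idea is just to disambiguate multiple regex matches using the model's (possibly slightly wrong) line-number hint.

```python
import re

def find_edit_span(lines, first_re, last_re, hint_line):
    """Find a (start, end) line span whose first line matches first_re and
    whose last line matches last_re, preferring the candidate whose start
    is closest to the model's line-number hint."""
    starts = [i for i, line in enumerate(lines) if re.search(first_re, line)]
    if not starts:
        return None
    # Use the hint only to break ties between multiple matches, so
    # off-by-one errors in the hint don't matter.
    start = min(starts, key=lambda i: abs(i - hint_line))
    for end in range(start, len(lines)):
        if re.search(last_re, lines[end]):
            return (start, end)
    return None
```

A span returned this way survives small line-number errors, since the regexes, not the numbers, define the boundaries.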


In fact, the test framework I was using at the time (Jest) did support this. But the person who had originally written the tests hadn't had the foresight to use snapshot tests for this failing test!
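The snapshot-testing pattern Jest popularized can be approximated in a few lines; this is a hypothetical helper, not Jest's actual mechanism, but it shows the idea of recording expected output to a file and comparing on later runs.

```python
import json
import os

def assert_matches_snapshot(value, snapshot_path, update=False):
    """Compare value against a stored snapshot, recording it on the first
    run or when update=True (roughly Jest's --updateSnapshot flag)."""
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if update or not os.path.exists(snapshot_path):
        # First run (or explicit update): record the snapshot.
        with open(snapshot_path, "w") as f:
            f.write(serialized)
        return
    with open(snapshot_path) as f:
        expected = f.read()
    assert serialized == expected, f"Snapshot mismatch in {snapshot_path}"
```

The parent's joke applies here too: if you blindly pass update=True whenever output changes, you're recording rather than testing.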


I don't know if your message is a continuation of the sarcasm (I feel like maybe not?), but I'm pretty sure the parent's joke is that if you just change the expected values whenever the code changes, you aren't really "testing" anything so much as "recording" outputs.


I'm surprised you managed to get Desktop running on Linux lol. You don't need the filesystem/git MCPs alongside codemcp; in fact, it's better not to have them, so that Claude consistently uses codemcp's equivalents to do edits. I'm not sure why codemcp's built-in git support didn't work; you can probably find out more by looking in .codemcp/codemcp.log.

If you need to start a new chat, that works just fine: tell Claude what's happened so far and what you want it to do. You can also ask Claude to summarize the old conversation; that's how /compact in Claude Code works too.
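The compaction idea above amounts to replacing old messages with a summary while keeping the recent ones verbatim. A minimal sketch, with the summarizer passed in as a stand-in for what would really be an LLM call (the function and message shape here are illustrative, not Claude Code's internals):

```python
def compact(messages, summarize, keep_last=4):
    """Replace all but the most recent messages with a single summary
    message, mimicking context compaction."""
    if len(messages) <= keep_last:
        return list(messages)
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(old)  # in practice, an LLM call
    header = {"role": "system",
              "content": f"Summary of earlier conversation: {summary}"}
    return [header] + recent
```

Doing this manually ("here's what happened so far...") is the same trick, with you as the summarizer.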


Hi Hacker News! One of the things about this blog that has gotten a bit unwieldy as I've added more entries is that it's a sort of undifferentiated pile of posts. I want some sort of organization system but I haven't found one that's good. Very open to suggestions!


What about adding a bit more structure and investing in a pattern-language approach, like what you might find in a book by Fowler or a site like https://refactoring.guru/? You're much of the way there with the naming and content, but you could refactor the content into headings (Problem, Symptoms, Examples, Mitigation, Related, etc.).

You could even pretty easily use an LLM to do most of the work for you in fixing it up.

Add a short 1-2 sentence summary[1] to each item and render that on the index page.

[1]: https://gohugo.io/content-management/summaries/
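Hugo aside, rendering such an index is simple enough to sketch: take the first paragraph of each post as its summary and truncate. The post tuple format here is an assumption for illustration.

```python
def build_index(posts):
    """Render a markdown index where each entry shows the post title
    and a short summary taken from its first paragraph."""
    lines = []
    for title, body in posts:
        first_para = body.strip().split("\n\n")[0]
        # Truncate long first paragraphs to keep the index scannable.
        summary = first_para if len(first_para) <= 160 else first_para[:157] + "..."
        lines.append(f"- **{title}**: {summary}")
    return "\n".join(lines)
```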


Maybe organize them with a clearer split between observed pitfalls/blindspots and prescriptions. Some of the articles ("Use automatic formatting") are practice-forward, while others are pitfall-forward. I like how many of the articles have examples!


How about listing all of these on a single page? That would make them easy to navigate and find.


They are listed on one page right now! Haha


They're indexed on one page, but you can't scan/scroll through these short posts without clicking because the content itself isn't all on a single page, at least not that I can find.

(I also like the other idea of separating out pitfalls vs. prescriptions.)


WordPress's approach to this is giving each post a short excerpt in addition to the main content. The excerpt gets displayed on the main list, which helps both to grok the post and to keep the list from becoming unwieldy.


As in, all the content on one page, where each link just takes you to the appropriate heading on the same page. These days you can do a lot in a single HTML file.


My suggestion: Change the color of visited links! Adding a "visited" color for links will make it easier for visitors to see which posts they have already read.


Some sort of navigation would be nice: prev/next links, or some other way to avoid having to go back to the index page all the time.

All of the pages I visited were small enough that you could probably wrap them in <details> tags[1] and avoid navigation altogether.

[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/de...
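Generating such a single page is a few lines of templating; here's a sketch (the post tuple structure is assumed for illustration) that wraps each post in a collapsible <details> block so titles can be scanned and expanded in place.

```python
import html

def render_single_page(posts):
    """Emit one HTML page where each post is a <details> block with the
    title as its <summary>, so no per-post navigation is needed."""
    parts = []
    for title, body in posts:
        anchor = title.lower().replace(" ", "-")
        parts.append(
            f'<details id="{html.escape(anchor)}">'
            f"<summary>{html.escape(title)}</summary>"
            f"<div>{html.escape(body)}</div>"
            f"</details>"
        )
    return "\n".join(parts)
```

The id attributes also give you the in-page anchor links the sibling comment suggests.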


There was a blog posted here that had sliders for scoring different features (popularity, personal preference, etc.); the rankings updated live as the sliders moved.

Also, take a look at https://news.ycombinator.com/item?id=40774277
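The live re-ranking that blog used boils down to a weighted sum over per-post feature scores, with the sliders controlling the weights. A minimal sketch (data shapes assumed):

```python
def rank(items, weights):
    """Sort (name, features) pairs by the weighted sum of their feature
    scores; the weights dict is what the page's sliders would control."""
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sorted(items, key=lambda item: score(item[1]), reverse=True)
```

On a real page, the slider's oninput handler would just call this and re-render the list.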


To be honest, the current format worked perfectly for me: I ended up reading all the entries without feeling anything was off in how they were organized. I really, really liked that each section had a concrete example; please don't drop that for future entries.

Thank you for sharing your insights! Very generous.


In "Keep Files Small", there seems to be a lacuna: "for example, on Cursor 0.45.17, applying 55 edits on a 64KB file takes)."


When I saw the title, I knew what this was going to be. It made me want to immediately write a corresponding "Human Blindspots" blog post to counteract it, because I knew it was going to be the usual drivel about how the LLMs understand <X> but sometimes they don't quite manage to get the reasoning right, but not to worry because you can nudge them and their logical brains will then figure it out and do the right thing. They'll stop hallucinating and start functioning properly, and if they don't, just wait for the next generation and everything will be fine.

I was wrong. This is great! I really appreciate how you not only describe the problems, but also describe why they happen using terminology that shows you understand how these things work (rather than the usual crap that is based on how people imagine them to work or want them to work). Also, the examples are excellent.

It would be a bunch of work, but the organization I would like to see (alongside the current one, not replacing it, because the one-page list works for me already) would require sketching out some kind of taxonomy of topics: categories of ways that Sonnet gets things wrong, and perhaps categories of things that humans would like it to do (e.g. types of tasks, skill/sophistication levels of users, or starting vs. fixing vs. summarizing/reviewing vs. teaching). But I haven't read through all of the posts yet, so I don't have a good sense of how applicable these categorizations might be.

I personally don't have nearly enough experience using LLMs to be able to write it up myself. So far, I haven't found LLMs very useful for the type of code I write (except when I'm playing with learning Rust; they're pretty good for that). I know I need to try them out more to really get a feel for their capabilities, but your writeups are the first I've found that I feel I can learn from without having to experience it all for myself first.

(Sorry if this sounds like spam. Too gushing with the praise? Are you bracing yourself for some sketchy URL to a gambling site?)


Maybe you should ask Claude.


Most of the problem is installing the MCP servers, which is more annoying on Windows. https://github.com/ezyang/codemcp#getting-started has instructions that I've personally tested for installation on Windows, which might help you out some.


I don't really recommend using the filesystem MCP directly; it won't checkpoint changes, so it's easy to end up in a state where you can't recover an older working version of the code. Use an actual coding-oriented MCP.


I'm going to try out yours (https://github.com/ezyang/codemcp). I already pay for Cursor, but I'm curious.

EDIT: I really like the way that each change generates a commit, and that all commits in a single session are squashed into one commit, whilst preserving the hash of each individual change.

EDIT2: I also like you only have one function (each command being a 'subtool'), which means Claude Desktop asks for permission only once per session.
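The single-function design the parent likes can be sketched as a dispatcher: one top-level tool whose first argument names the operation, so the desktop client prompts for permission once rather than per tool. The names and subtools here are illustrative, not codemcp's actual API.

```python
import os

def codemcp_tool(subtool, **kwargs):
    """One tool entry point; the subtool argument routes to per-operation
    handlers, so only a single tool needs user approval."""
    handlers = {
        "read_file": lambda path: open(path).read(),
        "write_file": lambda path, content: open(path, "w").write(content),
        "ls": lambda path=".": "\n".join(sorted(os.listdir(path))),
    }
    if subtool not in handlers:
        raise ValueError(f"unknown subtool: {subtool}")
    return handlers[subtool](**kwargs)
```

A real implementation would also validate arguments and checkpoint edits (e.g. the per-change commits mentioned above), but the routing shape is the point.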


Related ongoing thread:

Show HN: Codemcp – Claude Code for Claude Pro subscribers – ditch API bills - https://news.ycombinator.com/item?id=43356016 - March 2025 (3 comments)


And if you don't like clicking, even once, check out my other project, Refined Claude :)


Wow. You should go on baby leave more often!

