The problem is that this is $100/mo with limits. At work I use Cursor, which is pretty good (especially tab completion), and at home I use Copilot in vscode insiders build, which is catching up to Cursor IMO.
However, as long as Microsoft is offering copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.
I don’t know why people expect unlimited usage for a limited cost. Copilot hasn’t been good for a long time. They had the first-mover advantage, but they were too slow to improve the product. It still hasn’t caught up to Cursor or Windsurf. Cline leaves it so far in the dust it’s like a decade behind in AI years. So you get what you pay for.
Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
I started using Claude Code once it became a fixed price with my Claude Max subscription. It’s taken a little getting used to vs Cline, but I think it’s closer to Cline in performance than to Cursor (Cline being my personal gold standard). $100 is something most people on this forum could make back in 1 day of work.
$100 per month for the value is nothing, and for what it’s worth I have tried to hit the usage limit; the only thing that got me close was using their deep research feature. I’ve pushed Claude Code hard without hitting limits.
It was just obviously worse than using the Anthropic website; that was the only explanation for how bad it was. They could offer it for free because it was dumbed down, even if it was nominally the same model (maybe given fewer resources). Or maybe I was just unlucky, but that's how it seemed to me.
Sonnet in Copilot is crippled, Copilot agent mode is also very basic and failed every time I tried it. It would have been amazing 2 years ago, but now it's very meh.
GitHub is losing money on the subs, but they are definitely trying to reduce the bleed. One way to do that is to cut corners with LLM usage: not sending as much context, trimming the context window, capping output token limits. These are all things Cursor also does, btw, which is why Cline, with almost the same tech (in some ways even inferior tech), achieves better results. I have hit $20 in API usage within a single day with Cline; Cursor lets you have "unlimited" usage for $20 a month. So it's optimised for saving costs, not for giving you the best experience. At $10 per month for Copilot, they need to save costs even more. So you get a bad experience and think it's the AI that isn't capable, but the problem is with the companies burning VC money to corner the market, setting unrealistic expectations on pricing, etc.
But basically you get ~300Mn input tokens and ~100Mn output tokens per month with Sonnet on the $100 plan. These are split across the 50 sessions you are allowed; each session lasts 5 hrs, starting from the first time you send a message. Within a session, you get ~6Mn input and ~2Mn output tokens for Sonnet. Claude Code seems to use a mix of Sonnet and Haiku, and Haiku has 2x the limits of Sonnet.
So if you absolutely maxed out your 50 sessions every month, that's about $2400 worth of usage if you had instead used the API. So it's a great deal. It's not $100 worth of API credits you're buying, so they don't run out like that. You can exhaust the limits for a given session, which means at most a 5 hr wait for your next one, or you can run out of your 50 sessions. I don't know how strongly they enforce that limit, and I think that limit is BS, but all in all the value for money is great, way better than using the API.
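To sanity-check that figure, here's the arithmetic at Sonnet's list API prices (roughly $3 per million input tokens and $15 per million output as I write this; verify before relying on it):

    # Rough value estimate for the $100 Max plan, assuming Sonnet API pricing of
    # $3 / 1M input tokens and $15 / 1M output tokens (check current prices).
    input_tokens = 300e6   # ~300Mn input tokens per month across 50 sessions
    output_tokens = 100e6  # ~100Mn output tokens per month

    api_cost = input_tokens / 1e6 * 3 + output_tokens / 1e6 * 15
    print(f"API-equivalent value: ${api_cost:,.0f}/month")  # ~$2,400
    print(f"Per 5-hour session: ${api_cost / 50:,.0f}")     # ~$48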
Thanks for the link and explainer. My first experience with Claude Code left mixed feelings because of the pricing. I have a Pro subscription, but for Claude Code I could only use API mode. So I added $5 just to check it out, and exhausted $4.50 in the first 8-minute session. It left me wondering whether switching to the Max plan would burn through usage at the same rate.
Nope, it can even be a dozen (because it's agentic). Claude usage limits are actually based on token usage, and Claude Code uses a mix of Haiku and Sonnet, so your limits are split between those two models. I gave an estimate of how much usage you can expect in another comment on this thread, but you will find it hard to max out the $100 plan unless you are using it very, very extensively.
I didn’t realize they were tuning cost optimization by switching models contextually. That’s very clever. I bet the whole industry of consumer LLM apps moves that way.
> Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
Out of date, I think, in this fast-moving space.
Sonnet has long been the gold-standard, but that position is looking very shaky at the moment; Gemini in particular has been working wonders for me and others when Sonnet has stumbled.
VS Code/Copilot has improved massively in Cursor's wake, but yes, still some way to go to catch up.
Absolutely though - the value we are getting is incredible.
In my experience, there are areas where Gemini did well and Claude didn't, and the same goes for o1 pro or o3, but for 90% of the work I find Claude way more trustworthy: better at following instructions, not making syntax mistakes, etc. Gemini 2.5 Pro is way better than all their prior models, but I don't get the hype about it being a coding superstar. It's not bad, but Sonnet is still the primary workhorse. Sonnet is more expensive, so if Gemini were at the same level I'd be happy to save the money, but I've tried it with various approaches and played with the temperature, and in the vast majority of cases Claude does a better job.
Gemini 2.5 Pro is better at coding than Claude, it’s just not as good at acting agentically, nor does Google have good tooling to support this use case. Given how quickly they’ve come from far behind and their advantage on context size (Claude’s biggest weakness), this could change just as fast, although I’m skeptical they can deliver a good end user dev tool.
I'd be careful with stating things like these as fact. I asked Gemini for half an hour to write code that draws a graph the way I want, and it never got it right. Then I asked Claude 3.7, and it got it almost right on the first try, to the point that I thought it was completely right, and it fixed the bug I discovered right after I pointed it out.
Yup, I've had a similar experience. Not only for coding: just yesterday I asked Gemini to compose an email with a list of attachments, which I had specified as a list of file paths in the prompt, and it wasn't able to count them correctly and report the number in the email text (the text went something like "there are <number_of_attachments> charts attached"). Claude 3.7 did it correctly in one go.
Have I got bad news for you... Microsoft announced it is imposing limits on "premium" models from next week. You get 300 "free" requests a month. If you use agent mode, you easily consume about 3-4 requests per action; I estimate I'd burn through 300 in about 3-5 working days.
Basically anything that isn't GPT-4o is premium, and I find GPT-4o near useless compared to Claude and Gemini in Copilot.
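The 3-5 day estimate holds up if you assume a few dozen agent actions per working day (the actions-per-day number below is my guess, not anything Microsoft publishes):

    # Back-of-the-envelope for the 300 premium-request cap.
    monthly_requests = 300
    requests_per_action = 3.5   # "about 3-4 requests per action"
    actions_per_day = 25        # assumed: a moderately busy agent-mode day

    days_to_exhaust = monthly_requests / (requests_per_action * actions_per_day)
    print(f"{days_to_exhaust:.1f} working days")  # ~3.4 days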
Doesn’t resonate with me because I’ve spent over $1,000 on Claude Code at this point and the return is worth it. The spend feels cheap compared to output.
In contrast, I’m not interested in using cheaper, lesser services for my livelihood.
So just for work then or personal projects too? For work I can understand but for personal projects I haven't necessarily gotten more success out of AI than my own code, to be honest.
In terms of personal projects, I use my own custom Ruby X11 window manager, and when I moved and got the office space for an extra monitor, Claude Code wrote the basics of the multi-monitor support by itself.
It's notable to me because there are, to my knowledge, no other Ruby WMs (there's at least one that allows scripting with Ruby, I believe, but not the whole codebase), the X11 bindings are custom (no Xlib or XCB), and there are few good examples that fit the structure of my WM. Yet it made it work. The code was ugly, and I haven't committed it yet as I want to clean it up (or get Claude to), but my priority was to be able to use the second monitor without spending more than a few hours on it, starting with no idea how multi-monitor support in X11 worked.
Since then, Claude Code has added Xinerama support to my X11 bindings and selection support to enable a systray for my pager, and it has written the systray implementation (which I also didn't have the faintest clue how it worked, so I had Claude explain it to me before starting).
I use it for work too, but for these personal projects the priority has been rough working code over beauty, because I use them every day, rely on the features, and want to spend as little time as possible on them. So the work has been very different from how I use Claude on work projects, where I'll work in much smaller chunks, polish the result, etc.
Taken from 2 recent systems. 90% of my interaction is assurance, debugging, and then having Claude operate within the meta context-management framework. We work hard to set the path for actual coding, so code output (even complex or highly integrated) usually ends up being fairly smooth and fast.
For most planning I use Gemini. I copy either the entire codebase (if it's less than ~200k tokens) or select only the parts that matter for the task in large codebases. I built a tool to help me build these prompts and keep the codebase well organized in an XML structure: https://github.com/backnotprop/prompt-tower
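The gist of those prompts (not necessarily prompt-tower's exact output format, just the general shape) is wrapping each file in tags and keeping an eye on the rough token count; the tag names and the chars/4 heuristic below are my assumptions:

    # Sketch of building an XML-style codebase prompt for planning.
    from pathlib import Path

    def build_prompt(root: str, exts=(".py", ".rb", ".js")) -> str:
        parts = []
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix in exts:
                parts.append(f'<file path="{path}">\n{path.read_text()}\n</file>')
        return "<codebase>\n" + "\n".join(parts) + "\n</codebase>"

    prompt = build_prompt("src")
    approx_tokens = len(prompt) // 4  # crude estimate: ~4 characters per token
    print(f"~{approx_tokens:,} tokens")  # stay under ~200k, or hand-pick files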
Ah yeah, sorry, that's an export error... I copied the prompts directly out of Claude Code, and when I do that it copies all of the ASCII/TUI parts that wrap the message... I used some random "strip special chars" site to remove those and was lazy about adding actual punctuation back in.
"Ensure all our crons publish to telegraf when they start and finish. Include the cron name and tenant id when applicable. For crons that query batch jobs, only publish and take a lock when there is work to do. look at <snip> as an example. Here is the complete list to migrate. Create a todo list and continue until done. <insert list of 40 file paths>"
I used it yesterday to convert a website from tailwind v1 to v4. Gave it the files (html/scss/js), links to tailwind and it did the job. Needed some back and forth and some manual stuff but overall it was painless.
It is not a challenging technical thing to do. I could have sat there for hours reading the conversion from v1 to v2 to v3 to v4. It is mostly just changing class names. But these changes are hard to do with %s/x/x, so you need to do them manually. One by one. For hundreds of classes. I could have as easily shot myself in the head.
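To illustrate why blind substitution bites, here's a toy example in the spirit of the v3-to-v4 shadow/rounded scale shift (the exact mapping is from memory, so treat it as illustrative and check the upgrade guide):

    # Why naive find-and-replace fails for scale-shifting renames.
    renames = {"shadow-sm": "shadow-xs", "shadow": "shadow-sm"}

    html = '<div class="shadow-sm shadow">'
    for old, new in renames.items():
        html = html.replace(old, new)

    # First pass: shadow-sm -> shadow-xs. The second pass then rewrites the
    # "shadow" inside "shadow-xs" as well, producing garbage:
    print(html)  # <div class="shadow-sm-xs shadow-sm">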
> Could you anonymize and share your last 5-10 prompts?
The prompt was a simple "convert this site from tailwind v1 to v4". I use Neovim Copilot Chat to inject context and load URLs. I have found that prompts have no value; it's either something the LLM can do or it isn't.
I got $100 of credit at the start of the year, and my usage has grown by about $1 each month, starting at $2 in January with Aider at the time. I just switched to Claude Code this week, since it follows a similar UX. Agentic CLI code assist has really been growing in usefulness for me as I get faster at reviewing its output.
I use it for very targeted operations where it saves me several round trips to code examples, documentation, and Stack Overflow, not spamming it for every task I need to do. I spend about $1/day of focused feature development, and it feels like it saves me about 50% as many hours as I spend coding while using it.
What do you prefer, between Aider and CC? I use Aider for when I want to vibe code (I just give the LLM a high-level description and then don't check the output, because it's so long), and Cursor when I want to AI code (I tell the AI to do low-level stuff and check every one of the five lines it gives me).
AI coding saves me a lot of time writing high-quality code, as it takes care of the boilerplate and documentation/API lookups, while I still review every line, and vibe coding lets me quickly do small stuff I couldn't do before (e.g. write a whole app in React Native), but gets really brittle after a certain (small) codebase size.
I'm interested to hear whether Claude Code writes less brittle code, or how you use it/what your experience with it is.
I tested Aider a few times and gave up because, at the time, it was so bad. It might be time to try it again. I'll add that seeing how well Claude Code works for me while lots of other people struggle with it suggests that my style of working may just mesh better with Claude Code than with Aider.
Claude Code was the first assistant that gelled for me, and I use it daily. It wrote the first pass of multi-monitor support for my window manager. It's written the last several commits of my Ruby X11 bindings, including a working systray example, where it both suggested the whole approach and implemented it, and tested it with me just acting as a clicking monkey (because I haven't set up any tooling to let it interact with the GUI) when it ran test scripts.
I think you just need to test the two side by side and see what works for you.
I intend to give Aider a go at some point again, as I would love to use an open source tool for this, but ultimately I'll use the one that produces better results for me.
Makes sense, thanks. I've used Claude Code but it goes off on its own too much, whereas Aider is more focused. If you do give Aider another shot, use the architect/editor mode, with Gemini 2.5 Pro and Claude 3.7, respectively. It's produced the best results for me.
The two worst ways of burning API credits I've found with Claude Code are:
1. Getting argumentative/frustrated with the model if it goes off the rails and continuing to try to make something work when the model isn't getting anywhere.
If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that its broken approach would fail? If it's not making forward progress after a couple of prompts, it's not likely to unless you split up the task and/or provide more details. This is how you burn $10 instead of $0.60 for a task that "should" be simple. It's bad at telling you something is hard.
2. Think about when to either /compact (trims the context but retains important details) or clear the context entirely. E.g. always clear when moving to another task unless they're closely related. Letting it retain a long context is a surefire way to burn through a lot (and it also slows you down a lot, not least because there's a bug that affects some of us - maybe related to TERM settings? no idea - where in some cases it will re-print the entire history to the terminal, so between tasks it's useful to quit and restart).
Also use /init, and ask it to update CLAUDE.md with lessons learned regularly. It's pretty good at figuring things out, such as how my custom ORM for a very unusual app server I'm working on works, but it's a massive waste of tokens to have it re-read the ORM layer every time instead of updating CLAUDE.md.
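To give a flavour of what I mean by "lessons learned", something like this (the contents are entirely made up for illustration, not my actual file):

    # CLAUDE.md (illustrative example)
    ## Architecture notes
    - The ORM layer lives in lib/orm/; this summary is enough for most tasks,
      don't re-read the whole layer.
    ## Lessons learned
    - Run the test suite with `rake test`, not `rspec` directly.
    - Never edit generated files under build/.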
> If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that its broken approach would fail?
This.
I was fighting with Claude for a good chunk of yesterday (usage limits seemed broken so it didn't really time me out) and most of that was getting it to fix one small issue with three test cases. It would fix one test and break the others, round and round we go. After it broke unrelated tests I had to back out all the changes and, by then, I understood the problem well enough that I could direct it how to fix it, with a little help from DeepSeek.
As there are a bunch of other sections of code which suffer from the same problem I can now tell it to "look at the fixed code and do it like that" so, hopefully, it doesn't flail around in the dark as much.
Admittedly, this is fairly complicated code, being an AST to bytecode compiler with a bunch of optimizations thrown in, and part of the problem was a later optimization pass undoing the 'fixes' Claude was applying which took quite a while to figure out.
Now I just assume Claude is being intentionally daft and treat it as such with questions like "why would I possibly want a fix specifically designed to pass one test instead of a general fix for all the cases?" Oh, yeah, that's its new trick, rewriting the code to only pass the failing test and throwing everything else out because, why not?
> Now I just assume Claude is being intentionally daft and treat it as such with questions like "why would I possibly want a fix specifically designed to pass one test instead of a general fix for all the cases?" Oh, yeah, that's its new trick, rewriting the code to only pass the failing test and throwing everything else out because, why not?
The best one I've seen is when it tries to disable a test because it can't make the code pass it.
You do need to treat it as if it's trying to sneak stuff past you sometimes because you do get the occasional bout of what in a human I'd have treated as "malicious compliance", at a level well beyond stupidity.
Whoever is paying for your time should calculate how much time you’d save between the different products. The actual product price comparison isn’t as important as the impact on output quality and time taken. Could be $1000 a month and still pay for itself in a day, if it generated >$1000 extra value.
This might mean the $10/month is the best. Depends entirely on how it works for you.
(Caps obviously impact the total benefit so I agree there.)
Just today I had yet another conversation about how BigCo doesn't give a damn about cost.
Just to give you one example - last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month cloud cost for serving a single static html file.
At one point someone up top decided that kubes is the way to go and scrambled an impromptu schematic for new projects which could be simply described as a continental class dreadnought of a kubernetes cluster on AWS.
And it was signed off, and later followed like a scripture.
A couple of stories lower we're having a hard time arguing for a 50 EUR budget for a weekly beer for the team, but the company is totally fine with paying 2k EUR for a landing page.
I've seen the books as to how much we spend on all the various AI shit. I can guarantee, that at least in our co, that AI is a massive waste of money.
But it doesn't really matter, because the C-level has been consumed by the hype like nothing I've ever seen. It could cost an arm and a leg and they'd still be pushing for it because the bubble is all-consuming, and anyone not touting AI use doesn't get funding from other similarly clueless and sucked-in VCs.
I'll add on to this: I don't really use agent modes a lot. In an existing codebase, they waste a lot of my time for mixed results. Maybe Claude Code is so much better at this that it enables a different paradigm of AI editing—but I'd need easy, cheap access to try it.
You don't need a max subscription to use Claude Code. By default it uses your API credits, and I guess I'm not a heavy AI user yet (for my hobby projects), but I haven't spent more than $5/month on Claude Code the past few months.
The problem with it is that it uses a ~30k token system prompt (albeit "cached"), and very quickly the usage goes up to a few million tokens. I can easily spend over $10 a day.
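To put rough numbers on that (the prices and per-day volumes below are my assumptions, roughly Sonnet list pricing with prompt caching, so double-check them):

    # Rough daily cost sketch for an agentic session with a ~30k-token cached
    # system prompt. Assumed prices: $3/M fresh input, $0.30/M cached reads,
    # $15/M output.
    turns = 60                    # agent/tool-use round trips in a day (assumed)
    cached_read = turns * 30_000  # system prompt re-read from cache each turn
    fresh_input = 1_500_000       # code, diffs, tool results fed in (assumed)
    output = 400_000              # generated code/explanations (assumed)

    cost = cached_read/1e6 * 0.30 + fresh_input/1e6 * 3 + output/1e6 * 15
    print(f"~${cost:.2f}/day")    # ~$11 with these assumptions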
An AI agent should be treated like a human developer. If you bring a new human developer into your codebase and give them a task, it will take them a lot of time to read and understand the codebase before producing a proper solution. If you want to use an AI agent regularly, it makes sense to have some sort of memory of the codebase.
And it seems like the community realizes this and is inventing different solutions. RooCode already has task orchestration built in; there is a Claude task-manager that allows splitting and remembering tasks so the AI agent can pick them up quicker; and there are various file-based solutions like memory banks. Windsurf and Cursor upgraded their .windsurf/rules-style functionality to allow more solutions like that for instructing AI agents about the codebase/tasks. Some people even write their own scripts that feed every file to an LLM and store the summary description in a separate file that the AI agent tool can use instead of searching the codebase.
I'm eager to see how some of these solutions become embedded into every AI agent tool. It's one of the missing pieces needed to make AI agents an order of magnitude more efficient and productive.
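That last homegrown approach is only a few lines of glue; here's a minimal sketch using the Anthropic Python SDK, where the model name and paths are placeholders:

    # Minimal "codebase memory" script: summarize each source file once and
    # write the summaries to a single notes file the agent can read instead of
    # crawling the codebase. Model name and paths are placeholders.
    from pathlib import Path
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    with open("CODEBASE_NOTES.md", "w") as notes:
        for path in sorted(Path("src").rglob("*.py")):
            resp = client.messages.create(
                model="claude-3-5-haiku-latest",
                max_tokens=300,
                messages=[{
                    "role": "user",
                    "content": f"Summarize this file in 3 bullet points "
                               f"(purpose, key classes/functions, gotchas):\n\n"
                               f"{path.read_text()}",
                }],
            )
            notes.write(f"## {path}\n{resp.content[0].text}\n\n")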
Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day, 20 days a month, if there was nothing stopping them.
The real question is whether this is a better value than pay as you go for some people.
I've often run multiple Claude Code sessions in parallel to do different tasks. Burns money like crazy, but if you can handle wrangling them all there's much less sitting and waiting for output.
> I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful
I don't think this is the right way to look at it. If CoPilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.
At $10-20 a month that calculation is trivial to make. At $100, I'm honestly not getting that much value out of AI, especially not every month, and especially not compared to cheaper options.
I always imagined that these $10/mo plans are essentially loss leaders and that in the long run, the price should be much higher. I'm not even sure if that $100/mo plan pays for its underlying costs.
I think their free tiers are by definition loss leaders, but I think you're right: all of their offerings are loss leaders. I know I can get more from my $20 using Claude Pro than I can using their API Workbench. It is such a competitive space that I think it's unrealistic for these companies to ever be cash positive, because all the cash they have needs to be spent on competing in this space.
I think this thinking is flawed. First, it presupposes a linear value/cost relationship. That is not always true - a bag that costs 100x as much is not 100x more useful.
Additionally, when you’re in a compact distribution, being 5% better might be 100x more valuable to you.
Basically, this assumes that marginal value is tied to cost. Economically, I don't think most things match that pattern. I will sometimes pay 10x the cost for a good meal that has fewer calories (nutritional value).
I am glad people like you exist, but I don’t think the proposition you suggest makes sense.