
> I know I'm tired of reading them, but don't people get bored of writing them?

Look, it's either this or a dozen articles a day about Claude Code.


I've come to accept that producing code I'm truly proud of is now my hobby, not my career. The time it takes to write Good Code is unjustifiable in a business context, and I can't make the case for it outside of personal projects.

Yeah I don't understand why everyone seems to have forgotten about the Gemini options. Antigravity, Jules, and Gemini CLI are as good as the alternatives but are way more cost effective. I want for nothing with my $20/mo Google AI plan.


Yeah I'm on the $20/mo Google plan and have been rate limited maybe twice in 2 months. Tried the equivalent Claude plan for a similar workload and lasted maybe 40 minutes before it asked me to upgrade to Max to continue.


> Yeah I'm on the $20/mo Google plan and have been rate limited maybe twice in 2 months. Tried the equivalent Claude plan for a similar workload and lasted maybe 40 minutes before it asked me to upgrade to Max to continue.

The TL;DR: the $20-for-40-minutes cost is more reflective of what inference actually costs, including the amortised capex together with the opex.

The Long Read:

I think the reason is that Anthropic is attempting to run inference at a profit and Google isn't.

Another reason could be that they don't own their cost centers (GPUs from Nvidia, cloud instances and data centers from AWS, etc.); they own only the model but rent everything else needed for inference, so they pay a margin on all those rented cost centers.

Google owns their entire vertical (GPUs are Google-made, cloud instances and datacenters are Google-owned, etc.) and can apply vertical cost optimisations, so their final cost of inference is going to be much cheaper anyway, even if they were not subsidising inference with profits from unrelated business units.


Well said.

It's for exactly this reason that I believe Google will win the AI race.


It's crazy that we're having such different experiences. I purchased the Google AI plan as an alternative to my ChatGPT (Codex) daily driver. I use Gemini a fair amount at work, so I thought it would be a good choice to use personally. I used it a few times but ran into limits on the first few projects I worked on. As a result I switched to Claude, and so far I haven't hit any limits.


Google has uncertain privacy settings; there is no declaration that they won't train their LLM on your personal/commercial code.


https://macaron.im/blog/ai-assistant-privacy-comparison#:~:t...

All providers are opt-out. The moat is the data, don't pretend like you don't know.


Per my previous research, there is no opt-out for Gemini CLI.


Just goes to show that attention is all you need.


A statement which goes to show that confusing correlation with causation is all you need.


> One of the things that makes Clawdbot great is the allow all permissions to do anything.

Is this materially different than giving all files on your system 777 permissions?


It's vastly different.

It's more (exactly?) like pulling a .sh file hosted on someone else's website and running it as root, except the contents of the file are generated by an LLM, no one reads them, and the owner of the website can change them without your knowledge.


> Is this materially different than giving all files on your system 777 permissions?

Yes, because I can't read or modify your files over the internet just because you chmod'ed them to 777. But with Clawdbot, I can!


From what I've read, OpenClaw only truly works well with Opus 4.5.


The latest Kimi model is comparable in performance, at least for these sorts of use cases, but yes, it is harder to use locally.


> harder to use locally

Which means most people must be using OpenClaw connected to Claude or ChatGPT.


It's like the ice bucket challenge but with rusty nails



Assuming this is a serious comment, what do you propose instead if the health system is shut down?


A system where no one's data is held electronically.


For better or worse it's simply no longer possible to operate a healthcare provider organization using paper records while maintaining compliance with federal interoperability and reporting mandates. That time has passed.

https://www.cms.gov/priorities/burden-reduction/overview/int...


Alternatively, just use a local model with zero restrictions.


The next best thing is to use the leading open source/open weights models for free or for pennies on OpenRouter [1] or Huggingface [2].

An article about the best open weight models, including Qwen and Kimi K2 [3].

[1]: https://openrouter.ai/models

[2]: https://huggingface.co

[3]: https://simonwillison.net/2025/Jul/30/
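
For illustration, here's a minimal sketch of calling one of these models through OpenRouter's OpenAI-compatible endpoint (the model slug and environment variable name are placeholders; check the models page for current IDs and pricing):

    import os
    from openai import OpenAI

    # OpenRouter exposes an OpenAI-compatible API at this base URL.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],  # placeholder env var name
    )

    # "moonshotai/kimi-k2" is an illustrative slug; pick any open-weights
    # model listed at https://openrouter.ai/models.
    response = client.chat.completions.create(
        model="moonshotai/kimi-k2",
        messages=[{"role": "user", "content": "Explain MoE models in two sentences."}],
    )
    print(response.choices[0].message.content)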


This is currently negative expected value over the lifetime of any hardware you can buy today at a reasonable price, which is basically a monster Mac - or several - until Apple folds and raises the price due to RAM shortages.


This requires hardware in the tens of thousands of dollars (if we want the tokens spit out at a reasonable pace).

Maybe in 3-5 years this will work on consumer hardware at speed, but not in the immediate term.


$2000 will get you 30-50 tokens/s at perfectly usable quantization levels (Q4-Q5) from any of the top 5 open-weights MoE models. That's not half bad and will only get better!


Only if you are running lightweight models like DeepSeek 32B; anything bigger and it'll drop. Also, costs for RAM and AI-adjacent hardware have risen a lot in the last month. It's definitely not $2k for the rig needed for 50 tokens a second.


Could you explain how? I can't seem to figure it out.

DeepSeek-V3.2-Exp has 37B active parameters, GLM-4.7 and Kimi K2 have 32B active parameters.

Let's say we are dealing with Q4_K_S quantization for roughly half the size; we still need to move 16 GB 30 times per second, which requires a memory bandwidth of 480 GB/s, or maybe half that if speculative decoding works really well.

Anything GPU-based won't work at that speed, because PCIe 5 provides only 64 GB/s and $2000 cannot buy enough VRAM (~256GB) for a full model.

That leaves CPU-based systems with high memory bandwidth. DDR5 would work (somewhere around 300 GB/s with 8x 4800MHz modules), but that would cost about twice as much for just the RAM alone, disregarding the rest of the system.

Can you get enough memory bandwidth out of DDR4 somehow?
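
For what it's worth, here's the back-of-the-envelope math as a quick sketch (the 32B active-parameter count and ~4 bits/param for Q4_K_S are the assumptions from above, not measured values):

    # Rough bandwidth estimate: every active parameter is read once per token.
    active_params = 32e9       # active parameters per token (MoE)
    bits_per_param = 4.0       # ballpark for Q4_K_S, matching the ~16 GB figure
    tokens_per_second = 30     # target decode speed

    bytes_per_token = active_params * bits_per_param / 8    # ~16 GB per token
    required_bandwidth = bytes_per_token * tokens_per_second

    print(f"~{bytes_per_token / 1e9:.0f} GB read per token")
    print(f"~{required_bandwidth / 1e9:.0f} GB/s memory bandwidth needed")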


That doesn't sound realistic to me. What is your breakdown on the hardware and the "top 5 best models" for this calculation?


Look up AMD's Strix Halo mini-PCs, such as GMKtec's EVO-X2. I got the one with 128GB of unified RAM (~100GB VRAM) last year for 1900€ excl. VAT; it runs like a beast, especially for SOTA/near-SOTA MoE models.
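
If it helps, a minimal sketch of running a GGUF quant on that unified memory with llama-cpp-python (the model path, context size, and prompt are placeholders, and plenty of other runners work just as well):

    from llama_cpp import Llama

    llm = Llama(
        model_path="models/some-moe-model-Q4_K_S.gguf",  # placeholder path
        n_gpu_layers=-1,   # offload all layers to the iGPU / unified memory
        n_ctx=8192,        # context window; raise it if RAM allows
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize this README."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])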

