
> I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.

GLM 5.1 has 754B parameters, though. And you still need RAM for context on top of the weights, so you'll want much more than 96GB.


Which in the series specifically?

An H-1B is tied to employment, not to a specific employer. You can change employers on the same H-1B.

It’s not great. But this is similar to how health insurance is tied to employment, not to the employer. Both citizens and H-1B employees experience the same abuse here.


No it’s worse for them. A person on an H-1B has a ticking time bomb to find a new job or leave the country.

> asserting that LLMs will never generate 'truly novel' ideas or problem solutions

I don't think I've had one of these my entire life. Truly novel ideas are exceptionally rare:

- Darwin's On the Origin of Species
- Gödel's incompleteness theorems
- Buddhist detachment

Can't think of many.


> Every MCP server injects its full tool schemas into context on every turn

I consider this a bug. I'm sure the chat clients will fix this soon enough.

Something like: on each turn, a subagent searches available MCP tools for anything relevant. Usually, nothing helpful will be found and the regular chat continues without any MCP context added.
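The per-turn routing idea above can be sketched roughly as follows. This is a hypothetical illustration, not any existing client's implementation: the tool names, the lexical-overlap scoring, and the threshold are all made up, and a real router would use embeddings rather than word overlap.

```python
# Hypothetical sketch: per-turn MCP tool routing. Instead of injecting every
# tool schema into context, score each tool's description against the user's
# message and only surface tools above a relevance threshold.

def score(query: str, description: str) -> float:
    """Crude lexical overlap; a real router would use an embedding model."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / max(len(q), 1)

def route_tools(query, tools, threshold=0.2):
    """Return only the tool schemas relevant to this turn's query."""
    return [t for t in tools if score(query, t["description"]) >= threshold]

tools = [
    {"name": "create_issue", "description": "create a new github issue in a repository"},
    {"name": "send_email", "description": "send an email to a recipient"},
]

# Most turns match nothing, so no MCP context is added to the chat.
print(route_tools("what's the weather like today", tools))  # []
print(route_tools("create a github issue for this bug", tools))
```

On a typical turn the filter returns an empty list and the conversation proceeds with zero MCP overhead.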


Absolutely.

I'll add to your comment that it isn't a bug of MCP itself. MCP doesn't specify what the LLM sees. It's a bug of the MCP client.

In my toy chatbot, I implement MCP as pseudo-Python for the LLM, dropping typing info and giving the tool info as tersely as possible, just one line per tool: function_name(mandatory arg1 name, mandatory arg2 name): Description

(I don't recommend doing that; it's largely obsolete. My point is simply that you can feed the LLM whatever you want; MCP doesn't mandate anything. Tbh it doesn't even mandate that it feeds into an LLM, hence the MCP CLIs.)
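The terse one-line rendering described above could be produced by something like the sketch below. The schema shape (name, description, inputSchema with JSON Schema properties/required) follows typical MCP tool definitions, but the tool itself and the helper name are invented for illustration.

```python
# Sketch: collapse an MCP-style JSON tool schema into a single pseudo-Python
# line of the form "name(required args): description", dropping type info.

def render_tool(schema: dict) -> str:
    props = schema["inputSchema"]["properties"]
    required = schema["inputSchema"].get("required", [])
    args = ", ".join(name for name in props if name in required)
    return f'{schema["name"]}({args}): {schema["description"]}'

tool = {
    "name": "search_docs",
    "description": "Search the documentation index",
    "inputSchema": {
        "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["query"],
    },
}
print(render_tool(tool))  # search_docs(query): Search the documentation index
```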


Yup, routing is key. Just like how we've had RAG so we don't have to add every biz doc to the context.

I agree with the general idea that models are better trained on popular CLI tools like directory navigation, but outside of ls, ps, etc., the difference isn't really there: new CLIs are just as confusing to the model as new MCPs.


You’re spot on. Anthropic's engineering blog describes a ToolSearchTool to solve this problem - https://www.anthropic.com/engineering/advanced-tool-use


> > Every MCP server injects its full tool schemas into context on every turn

> I consider this a bug. I'm sure the chat clients will fix this soon enough.

Anthropic's Claude clients manage/minimize/mitigate this reasonably.


That’s a trade-off: now you need multiple model calls for every single request.


Yes, we just need RAG applied to tools. Very simple to implement.


I don’t think so. Without a list of tools in context, the AI can't even know what options it has, so a RAG-like search doesn't feel like it would be anywhere near as accurate.


The RAG helps select the tool needed for the task at hand. Semantic search returns only the tools that match. Very efficient.
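The "RAG over tools" idea can be sketched like this. It is a toy, assumption-laden illustration: the tool catalog is invented, and a bag-of-words cosine similarity stands in for a real embedding model, which is what an actual implementation would use.

```python
# Hedged sketch of RAG applied to tools: index each tool description once,
# then at query time return only the top-k closest matches instead of
# injecting the whole catalog into context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; swap in a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

TOOLS = {
    "query_database": "run a sql query against the analytics database",
    "resize_image": "resize or crop an image file",
}
INDEX = {name: embed(desc) for name, desc in TOOLS.items()}

def top_tools(query: str, k: int = 1):
    """Semantic-search step: rank tools by similarity to the task."""
    ranked = sorted(INDEX, key=lambda n: cosine(embed(query), INDEX[n]), reverse=True)
    return ranked[:k]

print(top_tools("run a sql report on the database"))  # ['query_database']
```

Only the few returned schemas get injected, which is the efficiency win the comment describes.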


But MCP uses OAuth. That is not a "worse version" of API keys. It is better.

The classic "API key" flow requires you to go to the resource site, generate a key, copy it, then paste it where you want it to go.

OAuth automates this. It's like "give me an API key" on demand.
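The "key on demand" idea looks roughly like this. MCP's authorization spec uses the OAuth authorization-code flow; the client-credentials grant below is just the simplest grant to illustrate the concept, and the URL, client ID, and secret are placeholders.

```python
# Illustrative sketch of why OAuth acts like "an API key on demand": the
# client exchanges its credentials for a short-lived token automatically,
# instead of a human copying a key out of a dashboard.
import json
from urllib.parse import urlencode

def build_token_request(token_url: str, client_id: str, client_secret: str):
    """Client-credentials grant: one POST body replaces the manual key handoff."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return token_url, body

def parse_token_response(raw: str) -> str:
    """Extract the bearer token the client then uses like an API key."""
    return json.loads(raw)["access_token"]

url, body = build_token_request("https://auth.example.com/token", "my-app", "s3cret")
print(parse_token_response('{"access_token": "abc123", "token_type": "Bearer"}'))
# abc123
```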


Interesting pricing differential. Seems in your country, that IdeaPad is significantly cheaper than the price in the US. But for your Macbook Neo, it's the other way around.

Wonder why that is.


No idea. Maybe Lenovo includes purchasing power in the price calculation for some reason, such as making more money in the U.S. while gaining market share here in Czechia, where purchasing power is lower. Apple may be able to afford not to do that.


I had the opposite issue. The TrackPoint started hurting my hand because it requires significantly more force than the Mac's touchpad.


We probably shouldn't deprecate AM for emergency broadcasts, given that you can listen to AM radio with grass: https://www.youtube.com/watch?v=b9UO9tn4MpI


Only at the point of emission however...


> Claude code always give me rate limits

> 50+ tps with qwen3.5 35b a4b on a 4090

But Qwen3.5 35B is worse than even Claude Haiku 4.5. You could switch Claude Code to use Haiku and never hit rate limits. It also gets a similar 50 tps.


I haven't tried Haiku 4.5 much, but I was not impressed with previous Haiku versions.

My go-to proprietary model in Copilot for general tasks is Gemini 3 Flash, which is priced the same as Haiku.

The Qwen model is, in my experience, close to Gemini 3 Flash, but Gemini Flash is still better.

Maybe it's somewhat related to what we're using them for. In my case I'm mostly using LLMs to code Lua. One case is a typed LuaJIT language and the other is a 3D LuaJIT framework written entirely in LuaJIT.

I forget exactly how many tps I get with Qwen, but GLM 4.7 Flash, which is really good (for a local model), gets me 120 tps and a 120k context.

Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work.

