Hacker News | mips_avatar's comments

He's roommates with an Anthropic researcher. I was roommates with a Google product manager; I don't think that makes me bought out by Google.




I don't know GP's situation. But in the case of the linked article, given Anthropic's ties to the Bay Area "rationalist" community, one possible reason the author has a roommate is that he bought into the rationalist "group house" culture and moved in with one of them.


Product managers aren't management. They manage the trajectory of a software initiative. They can be hired straight out of college.

Rents in the San Francisco Bay Area are too high for a junior to live within a practical distance of job centers without roommates.


What? Plenty of people prefer to live with roommates, especially in the bay.


Bought out, bought in. Is the distinction important?


mips_avatar is describing neither.


You don't think that. But I do. Prove you aren't.


It is literally impossible to prove a negative; that's how conspiracy thinking works, and it's why, fortunately, the justice system operates on the opposite principle and requires proof of guilt.

It’s true that in some circumstances we require avoiding even the appearance of impropriety or a conflict of interest, but that’s simply too large a burden to impose on everyone all of the time, especially for allegedly dire sins like “having a roommate who works for Google.”


Have you tried any really big models on a Mac Studio? I'm wondering what latency is like for the big Qwens if there's enough memory.


Not yet with MetalRT; right now we support models up to ~4B parameters (Qwen3 4B, Llama 3.2 3B, LFM2.5 1.2B). These are optimized for the voice pipeline use case, where decode speed and latency matter more than model size.

Expanding to larger models (7B, 14B, 32B) on machines with more unified memory is on the roadmap. The Mac Studio with 192GB would be an interesting target: a 32B model at 4-bit would fit comfortably, and MetalRT's architectural advantages (fused kernels, minimal dispatch overhead) should scale well.
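For context, a rough back-of-envelope weight-memory estimate (my own assumptions, not from this thread: ~0.5 bytes per parameter at 4-bit, plus ~10% overhead for quantization scales and runtime buffers):

```python
# Back-of-envelope memory estimate for quantized LLM weights.
# Assumptions: bits/8 bytes per parameter, plus a flat ~10% overhead
# for quantization scales/zero-points and runtime buffers.
# KV cache for long contexts is NOT included and adds more on top.

def quantized_weight_gb(params_billion: float, bits: int = 4,
                        overhead: float = 0.10) -> float:
    bytes_per_param = bits / 8
    return params_billion * 1e9 * bytes_per_param * (1 + overhead) / 1e9

print(quantized_weight_gb(32))  # ~17.6 GB: a 32B model at 4-bit
print(quantized_weight_gb(80))  # ~44.0 GB: an 80B model at 4-bit
```

By this estimate a 32B model at 4-bit uses well under a tenth of 192GB of unified memory; actual usage will be higher once KV cache for long contexts is included.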

What model / use case are you thinking about? That helps us prioritize.


Well, it’s just that I’ve noticed in the agents I’ve built that Qwen doesn’t get reliable until around 27B, so unless you want to RL a small Qwen, I don’t think I’d get much useful help out of it.


That tracks with what we've seen too. For agent workflows with reliable tool calling, you really do need the larger models. Larger model support is a priority for us. Thanks for the data point.


I am running the 80B Qwen Coder Next (4-bit quant, MLX version) on a 96GB M3 MacBook and it responds quickly, almost immediately. I can fit the model plus 128k context comfortably into memory.


The striking thing I heard from Meta staff is that Alexandr Wang would walk around campus surrounded by very obvious bodyguards. Sure, maybe security is needed, but the decision to be surrounded by bouncer-ish guys says something about him.


But even on campus? Weird.


Yeah, on campus apparently.


That is just being obnoxiously self important.


It could be required by the company. Many companies require top executives to have personal security. I'd be surprised if Zuck didn't have bodyguards even within the office. He has 24/7 security outside, so why wouldn't he inside?


I think the notable thing I’ve heard about Alexandr is just how obvious his were.


Yeah, like all the tech CEOs surely have bodyguards, but they try to blend in and not be noticeable as bodyguards; sounds like these were trying to make a certain impression?


The key is that Andrej has really good taste. It takes a lot to make a great harness for these models.


nanochat is super capable; the d34 (2.2B) variant is competitive with Qwens of that size. Andrej is, I assume, building out the improvements in preparation for bigger training runs. We desperately need a truly open model, so I think this is incredibly important.


I built a geocoder that mostly solves this: https://jonready.com/blog/posts/geocoder-for-ai-agents.html. I get about 96% recall compared to Google Places' 98%, but it uses an LLM for query planning and ranking, so it might not be a good solution for you.


I imagine you could do something like a LoRA


The problem with Groq was that they only allowed LoRA on Llama 8B and 70B, and you had to have an enterprise contract; it wasn't self-service.


I think the thing that makes 8B-sized models interesting is the ability to train in unique custom domain knowledge, and this is the opposite of that. If you could deploy any 8B-sized model on it and be this fast, that would be super interesting, but being stuck with Llama 3 8B isn't.
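For reference, the LoRA idea being discussed is a low-rank update added to frozen base weights. A minimal NumPy sketch (names, shapes, and the alpha/r scaling convention are illustrative, not tied to any particular library):

```python
import numpy as np

# Minimal LoRA sketch: y = x @ (W + (alpha/r) * A @ B)
# W is the frozen pretrained weight; only the low-rank factors
# A (d_in x r) and B (r x d_out) would be trained.

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d_out))                # init to zero: delta starts at 0

x = rng.normal(size=(4, d_in))          # a batch of activations
y = x @ W + (alpha / r) * (x @ A @ B)   # base output + low-rank correction

# With B initialized to zero, the adapted layer matches the base layer.
assert np.allclose(y, x @ W)
```

The appeal for serving is that the frozen W can be shared across many customers while only the tiny A and B pairs are swapped per adapter, which is why a provider restricting which base models support LoRA matters.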


The "small model with unique custom domain knowledge" approach has a very low capability ceiling.

Model intelligence is, in many ways, a function of model size. A small model tuned for a given domain is still crippled by being small.

Some things don't benefit from general intelligence much. Sometimes a dumb narrow specialist really is all you need for your tasks. But building that small specialized model isn't easy or cheap.

Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on principle.


> Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on principle.

This does not sound like it will simplify the training and data side, unless their models, or subsequent ones, can somehow be used efficiently for that. However, this development may lead to (open-source) hardware and distributed-system compilation, EDA tooling, bus system design, etc. getting more of the attention and funding they deserve. In turn, new hardware may lead to more competition in training and data instead of the current NVIDIA training-monopoly market. So I think you're correct for ~5 years.


A fine-tuned 1.7B model is probably still too crippled to do anything useful, but around 8B the capabilities really start to change. I’m also extremely unemployed right now, so I can provide the engineering.


Using ASCII characters is a simple form of tokenization with less compression.

