Hacker News | smcnally's comments

Beyond solid benchmarks, Alibaba's power move was dropping a bunch of models available to use and run locally today. That's disruptive already and the slew of fine tunes to come will be good for all users and builders.

https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2...


> Beyond solid benchmarks, Alibaba's power move was dropping a bunch of models available to use and run locally today.

I agree. The advantage of the Qwen3 family is the plethora of sizes and architectures to choose from. Another is ease of fine-tuning for downstream tasks.

On the other hand, I'd say it's "in spite of" their benchmarks, because there's obviously something wrong with either the published results or the way they were measured. Early impressions do not support those benchmarks at all. At one point they even had a 4B model scoring better than their previous-gen 72B model, which was pretty solid on its own. Take benchmarks with a huge boulder of salt.

Something is messing with recent benchmarks, and I don't know exactly what, but I have a feeling that distilling + RL + something in their pipelines is making benchmark data creep into the models, either by reward hacking or by other leaked signals (i.e. previous-gen models optimised for one benchmark "distilling" those signals into newer, smaller models). No, a 4B model is absolutely not going to be better than 4o/Sonnet 3.7, whatever the benchmarks say.


What's the minimum GPU/NPU hardware and memory to run Qwen3 locally?


There is a 0.6B model, so basically nothing.

And the MoE 30B one has a decent shot at running OK without a GPU. I'm on a 5800X3D, so two generations old, and it's still very usable.
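The reason a 30B MoE can be usable on CPU: only ~3B parameters are active per token, so decode speed is roughly bounded by how fast the active weights stream through memory. A back-of-the-envelope sketch (the bandwidth figure is an assumption; measure your own machine):

```python
def tokens_per_sec(active_params_b, bytes_per_weight, mem_bw_gbs):
    """Rough upper bound on decode speed for a memory-bandwidth-bound
    model: each generated token streams the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return mem_bw_gbs * 1e9 / bytes_per_token

# Qwen3-30B-A3B (~3B active params) as a 4-bit quant (~0.5 bytes/weight)
# on dual-channel DDR4 at an assumed ~50 GB/s:
print(round(tokens_per_sec(3, 0.5, 50), 1))  # ~33 tok/s ceiling
```

Real-world numbers land well under this ceiling, but it shows why 3B active parameters behaves more like a small model than a 30B one.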


I'm running 4B on my 8GB AMD 7600 via ollama


`model.safetensors` for Qwen3-0.6B is a single 1.5GB file.

Qwen3-235B-A22B has 118 `.safetensors` files at 4GB each.

There are a bunch of models and quants between those.
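As a rough rule of thumb, footprint ≈ parameter count × bits per weight / 8; real files carry some overhead, but the estimate lines up with the sizes above. A sketch:

```python
def approx_model_gb(params_b, bits_per_weight):
    """Very rough size estimate: parameters * bits / 8, ignoring
    embedding and quantization-block overhead (real files run larger)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

print(approx_model_gb(0.6, 16))  # ~1.2 GB, near the 1.5 GB safetensors file
print(approx_model_gb(235, 16))  # ~470 GB, in line with 118 x 4 GB shards
print(approx_model_gb(235, 4))   # ~118 GB for a 4-bit quant
```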


Does it run in 8x80G? Or does the KV cache and other buffers push it over the edge?
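Back-of-the-envelope: bf16 weights take roughly 470 GB of the 640 GB, so the question is what the KV cache eats from the remainder. A rough sketch of per-sequence KV-cache size; the layer/head numbers below are placeholders, read the real ones from the model's `config.json`:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per=2):
    """Estimate KV-cache size for grouped-query attention.
    The leading 2x covers the separate K and V tensors;
    bytes_per=2 assumes an fp16/bf16 cache."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per

# Illustrative config (placeholder values, not from the model card):
gib = kv_cache_bytes(layers=94, kv_heads=4, head_dim=128, seq_len=32768) / 2**30
print(f"{gib:.1f} GiB per 32k-token sequence")  # a handful of GiB
```

With GQA keeping KV heads small, the cache is GiB-scale per sequence, so batch size and context length decide whether it fits.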


Qwen3 is a family of models; the smallest are only a few GB and will run comfortably on virtually any computer from the last 10 years, or a recent-ish smartphone. The largest? Depends how fast you want it to run.


There are models down to 0.6B and you can even run Qwen3 30B-A3B reasonably fast on CPU only.


The ROT13 cipher for AIX is NVK. NVidia Knows
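ROT13 rotates each letter 13 places, so it is its own inverse; a quick check of the joke:

```python
import codecs

# ROT13 is self-inverse: encoding twice returns the original text.
print(codecs.encode("NVK", "rot13"))  # prints "AIX"
print(codecs.encode("AIX", "rot13"))  # prints "NVK"
```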


> The DESI collaboration is honored to be permitted to conduct scientific research on I’oligam Du’ag (Kitt Peak), a mountain with particular significance to the Tohono O’odham Nation.

Anyone here know how a request like this was made or the permission given? I haven’t seen this previously.


It is on a large reservation in Arizona; the tribal government would have given them permission. They have a website.

https://www.tonation-nsn.gov/tribal-government/


Unless it is on a reservation, they probably don't need permission.


A recent financial report on the media industry noted The Onion is on the verge of collapse due to, quote, "not being able to make sh*t up that is more idiotic than current reality."


darktable does all of this. It's a complex application like Aperture or Lightroom. You run it on your own macOS, Windows or Linux computer. You can write your own software to extend or change it. Photos.app does most of this, sans the Windows, Linux or "write your own" parts.


“Your Honor, why should I bother obeying a law my elected representatives could not be bothered to write?” seems like it should be a reasonable defense, but you’re right that the onus is on us as The Governed to know and understand every bit of slop and hallucination on the books.


> LLMs also love to double down on solutions that don't work.

“Often wrong but never in doubt” is not proprietary to LLMs. It’s off-putting and we want them to be correct and to have humility when they’re wrong. But we should remember LLMs are trained on work created by people, and many of those people have built successful careers being exceedingly confident in solutions that don’t work.


The issue is LLMs never say "I don't know how to do this" when it comes to programming. Tell me you don't know so I can do something else. I ended up just refactoring my UX to work around it. In this case it's a personal prototype, so it's not a big deal.


That is definitely an issue with many LLMs. I've had limited success including instructions like "Don't invent facts" in the system prompt and more success saying "that was not correct. Please answer again and check to ensure your code works before giving it to me" within the context of chats. More success still comes from requesting second opinions from a different model -- e.g. asking Claude's opinion of Qwen's solution.
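The second-opinion flow is easy to automate. A minimal sketch with the model calls stubbed out as plain callables (the function names are hypothetical; any prompt-to-text chat API wrapper would slot in):

```python
def second_opinion(ask_primary, ask_reviewer, task):
    """Ask one model for a solution, then a different model to critique it.
    `ask_primary` and `ask_reviewer` are any prompt -> text callables,
    e.g. thin wrappers around two different chat APIs."""
    draft = ask_primary(task)
    review = ask_reviewer(
        f"Task: {task}\nProposed solution:\n{draft}\n"
        "Point out any bugs or incorrect claims. Reply 'LGTM' if none."
    )
    return draft, review

# Toy stand-ins so the flow runs without API keys:
draft, review = second_opinion(
    lambda p: "use line.split(',')",
    lambda p: "LGTM",
    "split a CSV line in Python",
)
print(review)  # prints "LGTM"
```

Swapping the lambdas for real Claude and Qwen clients gives the cross-model check described above.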

To the other point, not admitting to gaps in knowledge or experience is also something that people do all the time. "I copied & pasted that from the top answer in Stack Overflow so it must be correct!" is a direct analog.


So now you have an overconfident human using an overconfident tool, both of which will end up coding themselves into a corner? A compiler, at least, for the most part offers very definitive feedback that acts as guard rails for those overconfident humans.

Also, let's not forget LLMs are a product of the internet and anonymity. Human interaction on the internet is significantly different from in person interaction, where typically people are more humble and less overconfident. If someone at my office acted like some overconfident SO/reddit/HN users I would probably avoid them like the plague.


A compiler in the mix is very helpful. That and other sanity checks wielded by a skilled engineer doing code reviews can provide valuable feedback to other developers and to LLMs. The knowledgeable human in the loop makes the coding process and final products so much better. Two LLMs with tool usage capabilities reviewing the code isn't as good today but is available today.

The LLM's overconfidence comes from it spitting out the most probable tokens based on its training data and your prompt. When LLMs learn real hubris from actual anonymous internet jackholes, we will have made significant progress toward AGI.


> I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5.

Progress by Google, Meta, Microsoft, Qwen and DeepSeek is unhampered by OpenAI's schedule. Their latest releases, including Gemini 2.0, Llama 3.3 and Phi 4, and the coding fine-tunes that follow them, are all pretty good.


> unhampered by OpenAI’s schedule

Sure, but if the advancements are to catch up to OpenAI, then major improvements by other vendors are nice and all, but I don't believe that was what the commenter was implying. Right now the leaders in my opinion are OpenAI and Anthropic and unless they are making major improvements every few months, the industry as a whole is not making major improvements.


OpenAI and Anthropic are definitely among the leaders. Playing catch-up to these leaders' mind-share and technology is some of the motivation for the others. Calling the progress being made in the space by Google (Gemini), MSFT (Phi), Meta (Llama) and Alibaba (Qwen) "nice and all" is a position you might be pleasantly surprised to reconsider if this technology interests you. And don't sleep on Apple and AMZ -

In the space covered by Tabby, Copilot, aider, Continue and others, capabilities continue to improve considerably month-over-month.

In the segments of the industry I care most about, I agree 100% with what the commenter said w/r/t expecting major improvements every few months. Pay even passing attention to Hugging Face and GitHub and you'll see work by indies as well as corporate behemoths happening at breakneck pace. Some work is pushing the SOTA. Some is making the SOTA more widely available. Lots of it is different approaches to solving similar challenges. Most of it benefits consumers and creators looking to use and learn from all of this.


A deeper version of the same idea is to ask a second model to check the first model’s answers. aider’s “architect” is an automated version of this approach.

https://aider.chat/docs/usage/modes.html#architect-mode-and-...


for x86 or PowerPC?


Speaking of rabbit holes: I used to have prototype OS/2 PowerPC 64-bit hardware from IBM before they killed the project. I should have kept that early EFI-based system. When the EFI boot sequence panicked, you would get an error message of "Danger Will Robinson".


Man, OS/2 Warp on PowerPC should be really secure, because no one is writing malware for that combination!



