> when we could have had an open world with open models run locally instead where you got to keep your private health information private
But we can have that? If you have powerful enough hardware you can do it, right now. At least until the anti-AI people get their way and either make the models' creators liable for what the models say or get rid of the "training is fair use" thing everyone depends on, in which case, sure, you'll have to kiss legal open-weight models goodbye.
How is that surprising? The advent of modern AI tools has resulted in most people being heavily pro-IP. Everyone now talks about who has the copyright to something and so on.
Yes, people are now very pro-IP because it's the big corporations that are pirating stuff and harvesting data en masse to train their models, and not just some random teenagers in their basements grabbing an mp3 off LimeWire. So now the IP laws, instead of being draconian, are suddenly not adequate.
But what is frustrating to me is that the second-order effects of making the law more restrictive will do us all a big disservice. It will not stop this technology; it will just make it less accessible to normal people and put more power into the hands of the big corporations which the "they're stealing our data!" people would like to stop.
Right now I (a random nobody) can go on HuggingFace, download a model which is more powerful than anything that was available 6 months ago, and run it locally on my machine, unrestricted and private.
Can we agree that's, in general, a good thing?
So now if you make the model creators liable for misuse of the models, or make the models a derivative work of their training data, or anything along these lines - what do you think will happen? Yep. The model on HuggingFace is gone, and now the only thing you'll have access to is a paywalled, heavily filtered and censored version of it provided by a megacorporation, while the megacorporation itself has unlimited, unfiltered access to that model internally.
The models come from overt piracy, and are often used to make fake news, slander people, or produce other illegal content. Sure, it can be funny, but the poisoned fruit of a poisoned tree is always going to be overt piracy.
I agree research is exempt from copyright, but people cashing in on unpaid artists' work for commercial purposes is a copyright violation predating the DMCA/RIAA.
We must admit these models require piracy, and can never be seen as ethical. =3
> are often used to make fake news, slander people, or other illegal content.
That's not how these models are used in the vast majority of cases.
This argument is like saying "kitchen knives are often used to kill people so we need to ban the sale of kitchen knives". Do some people use kitchen knives to kill? Sure. Does it mean they should be banned because of that?
> I agree research is exempt from copyright, but people cashing in on unpaid artists works for commercial purposes is a copyright violation predating the DMCA/RIAA. We must admit these models require piracy, and can never be seen as ethical. =3
So, may I ask - where exactly do you draw the line? For the sake of argument, let's imagine something like this:
1. I scrape the whole internet onto my disk.
2. I go through the text, and gather every word bigram, and build a frequency table.
3. I delete everything I scraped.
4. I use that frequency table (which, compared to the exabytes of the source text I used to build it, is a couple hundred megabytes at most) to build a text generator.
5. I profit from this text generator.
Would you consider this unethical too? Because this is essentially how LLMs work, just in a slightly fancier way. On what exact basis do you draw the line between "ethical" and "unethical" here?
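For concreteness, here's a rough toy sketch of steps 2-4 in Python (the corpus and names are made up for illustration; a real scrape would obviously be nothing like this):

    import random
    from collections import Counter, defaultdict

    def build_bigram_table(texts):
        # Step 2: count how often each word follows each other word.
        table = defaultdict(Counter)
        for text in texts:
            words = text.split()
            for prev, nxt in zip(words, words[1:]):
                table[prev][nxt] += 1
        return table

    def generate(table, word, length=20):
        # Step 4: sample a continuation word by word from the frequency table.
        out = [word]
        for _ in range(length):
            followers = table.get(word)
            if not followers:
                break
            candidates, counts = zip(*followers.items())
            word = random.choices(candidates, weights=counts, k=1)[0]
            out.append(word)
        return " ".join(out)

    # Toy corpus standing in for "the whole internet" (step 1);
    # once the table is built, the corpus can be deleted (step 3).
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    table = build_bigram_table(corpus)
    print(generate(table, "the"))

The table is tiny compared to the text it was built from, and it's the only thing the generator ever touches - which is exactly the crux of the question.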
This is illegal under theft-of-service laws, and a violation of most sites' terms of service. If these spider scrapers respected the robots exclusion standard under its intended use-case for search engines, then getting successfully sued for overt copyright piracy and quietly settling for billions would seem unfair.
Note too, currently >52% of the web is LLM generated slop, so any model trained on that output will inherit similar problems.
> 2. I go through the text, and gather every word bigram, and build a frequency table.
And when (not if) a copyrighted work is plagiarized without citation, it is academic misconduct, IP theft, and an artistic counterfeit. Copyright law is odd, and often doesn't make a distinction about the origin of similar works. Note this part of the law was recently extended to private individuals this year:
This doesn't matter if the output violates copyright. Images in JPEG format are compressed in the frequency domain, have been around for ages, and people still get sued or jailed over them regularly.
Academic evaluation usually does fall under a fair-use exception, but the instant someone sells or uses IP in some form of trade/promotion, it becomes a copyright violation.
> 4. I use that frequency table
See above; the "how it is made" argument is 100% BS. The statistical salience of an LLM simply can't prevent plagiarism and copyright violations. This was cited in the original topic links.
> 5. I profit from this text generator.
Since this content may inject liabilities into commercial settings, only naive fools will use this in a commercial context. Most "AI" companies lose around $4.50 per new customer, and are an economic fiction driven by some very silly people.
LLM businesses are simply an unsustainable exploit. Unfortunately they also proved wealthy entities can evade laws through regulatory capture, and settling the legal problems they couldn't avoid.
I didn't make the rules, but I do disagree that cleverness supersedes a just rule of law. Have a wonderful day =3
It is true bubbles driven by the irrational can't be stopped, but one may profit from people's delusions... and likely get discount GPUs when the economic fiction inevitably implodes. Best of luck =3
I look forward to buying the failed data center assets. LLMs make great search engines, but are not the path to "AGI". Neuromorphic computing looks more interesting. Have a great day =3
The amount of electricity we can produce is limited only by regulation, because we have a practically unlimited amount of fission energy under our feet. That is what you are seeing now with all of these new nuclear plants being built and brought back from decommissioning. If that is too scary for you, we also have the world's greatest reserves of shale gas.
I am not pro-AI, and I agree that the market will crash. But what I take issue with is this NIMBY mentality that we should nitpick proposals with a thousand fake reasons why we can't build anything in this country. We can't do big engineering projects like China because they are too much of an eyesore, or they use too much water, or they're not zoned correctly.
We can't put up a new apartment block, it's too much of a strain on the local water supply. Okay can we collect more water, invest in a new reservoir? Of course not, it will endanger the tumbleweed population.
We can't let a new datacenter go up because it will cause everyone's power prices to increase. Okay maybe we can produce more power?? No, BECAUSE ENERGY IS FINITE AND THE SUN IS JUST GOING TO EXPLODE ANYWAYS SO WHY DO YOU EVEN CARE. WTF?
Why can't we build things? Because we just can't, and actually it's impossible and you are rude for suggesting we build anything ever. It's circular reasoning designed to placate suburban NPCs.
If you oppose AI because it is ruining art, or it will drive people out of jobs, just say that. Because these fake complaints about power and water are neither compelling nor effective (they are just technological and material problems which will be ironed out in the coming generations).
These firms can do what they like if and only if they pay for every $7B reactor, the 30k-year waste stewardship, and disconnect from community resources people paid for with taxes. However, currently these unethical firms burden cities with an endless bill for resources, contribute no actual value, and their data center waste heat signatures and industrial run-off can be spotted from space.
Consider that most "AI" firms lose on average $4.50 for every new user, rely on overt piracy, and have delusional boards sand-bagging for time... these LLM businesses are simply unsustainable fictions.
Many problems don't have simple answers, but one may merely profit by their predictable nature. I would recommend volunteering with a local pet rescue society if you find yourself getting upset about trivia. Have a great day. =3
What trivia? I don't disagree that the AI companies are unprofitable.
These AI companies are paying for the reactors. As for waste, the Department of Energy handles spent nuclear fuel. Protests against the construction of Yucca Mountain have made this impossible. Nuclear power plants repeatedly sue the US Government for the cost of storing this nuclear waste on-site, because it's the DOE's problem.
And it is a totally artificial political problem. It is not even necessarily "waste" in the sense that we ordinarily think: there is a significant amount of fissile isotopes in spent fuel, and countries like France recycle the majority of their spent nuclear fuel. We could do the same with the right infrastructure, and it would vastly decrease the amount of waste we produce and the uranium we need to mine.
My point is that the complaints in these YouTube videos you link (which I am very accustomed to; I have been following this for decades) present the argument that AI is politically dangerous, and this is totally separate from the material complaints (not enough water, not enough power, not enough chips, etc.) that you pretend are a significant problem.
These are just extrinsic flaws which can be solved (and WILL be solved, if the USA is able to restore its manufacturing base, which it should). But my issue is purely with the intrinsic dangers of this tech, which are not fixable.
Some of the videos you link are just this suburban NIMBY nagging about muh noise pollution. You might as well get a video of people complaining about EMF pollution. The big issue here is that AI is going to take all of our jobs and will essentially herald the end of the world as we know it. It is going to get incredibly ugly very soon. Who cares what some 50-year-old boomer homeowner (who isn't going to live to see this unfold anyway) thinks about some gray building being built remotely near their suburb. They should go back to watching TV.
As for me, I am going to campaign to have my local pet rescue society demolished. It uses too much water and space and electricity, and for what? Something I don't care for? Seems unethical to me that I should bear the cost incurred through increased demand for these resources, even though I did not explicitly consent to the animal shelter being constructed.
This is demonstrably false given negative revenue, and when the gamblers default on the loans it is the public that will bear the consequences. As with sub-prime mortgages, people are getting tired of the con.
Dismissing facts because you personally feel they are not important is silly. If you think the US will "win" the "AGI" race... then you are fooling yourself, as everything has already been stolen.
Have a great day, and maybe go outside for a walk to settle down a bit if you are uncomfortable with the way imaginary puppies, bunnies, and kittens make you feel. Community non-profit organizations offer tangible goodwill, and are very different from ephemeral LLM fads externalizing a suckers-bet on the public. =3
The studios did already rip off Mark Hamill of all people.
Arguing regulatory capture versus overt piracy is a ridiculous premise. The "AI" firms have so much liquid capital now... they could pay the fines indefinitely in districts that constrain damages, and already settled with larger copyright holders like it was just another nuisance fee. =3
I don’t really see it to be honest. I feel like their best and most natural use is scams.
Maybe a different comparison you would agree with is Stingrays, the devices that track cell phones. Ideally nobody would have them but as is, I’m glad they’re not easily available to any random person to abuse.
> modern LLM architectures (which aren't that different) on his website and in the github repo: e.g. he has a whole article on implementing the Qwen3 architecture from scratch.
This might be underselling it a little bit. The difference between GPT2 and Qwen3 is maybe, I don't know, ~20 lines of code if you write it well? The biggest difference is probably RoPE (which can be tricky to wrap your head around); the rest is pretty minor.
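For anyone curious, a rough sketch of the split-halves RoPE convention (the one used in Llama/Qwen-style models) looks something like this - shapes and names are mine, not taken from the book or the repo:

    import torch

    def rope(x, base=10000.0):
        # x: (seq_len, head_dim) - queries or keys for one attention head.
        seq_len, head_dim = x.shape
        half = head_dim // 2
        # One rotation frequency per channel pair, decaying across the head dim.
        inv_freq = 1.0 / (base ** (torch.arange(half).float() / half))
        # One angle per (position, channel pair).
        angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]
        cos, sin = torch.cos(angles), torch.sin(angles)
        # Treat the first and second halves of the head dimension as the two
        # components of a 2D vector and rotate each pair by its angle.
        x1, x2 = x[:, :half], x[:, half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

The trick is that the rotation angle depends only on the token's position, so dot products between rotated queries and keys end up depending on relative position.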
There’s Grouped Query Attention as well, a different activation function, and a bunch of not very interesting norms stuff. But yeah, you’re right - still very similar overall.
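GQA in particular is barely any code once you see it - each K/V head simply gets shared by several query heads. Roughly (shapes illustrative only):

    import torch

    def repeat_kv(kv, n_rep):
        # kv: (batch, num_kv_heads, seq_len, head_dim)
        # Each KV head is reused by n_rep query heads, so we just expand it
        # before doing the usual multi-head attention math.
        return kv if n_rep == 1 else kv.repeat_interleave(n_rep, dim=1)

    # e.g. 32 query heads sharing 8 KV heads -> n_rep = 4
    k = torch.randn(1, 8, 16, 64)
    print(repeat_kv(k, 4).shape)  # torch.Size([1, 32, 16, 64])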
In this context by "real time" people usually mean "as fast as I can read the reply", so, 0.0002 tokens per minute would not be considered "real time".
> What does it mean that only 3B parameters are active at a time?
In a nutshell: LLMs generate tokens one at a time. "Only 3B parameters active at a time" means that for each of those tokens only 3B parameters need to be fetched from memory, instead of all of them (30B).
Then I don't understand why it would matter. Or does it really mean that for each input token only 10% of the total network runs, and then another 10% for the next token, rather than 10 batches of 10% each being run for every token? If so, any idea or pointer to how the selection works?
Yes, for each token only, say, 10% of the weights are necessary, so you don't have to fetch the remaining 90% from memory, which makes inference much faster (if you're memory bound; if you're doing single batch inference then you're certainly memory bound).
As to how the selection works - each mixture-of-experts layer in the network has essentially a small subnetwork called a "router" which looks at the input and calculates a score for each expert; then the best-scoring experts are picked and the inputs are routed only to them.
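A toy version of that routing step looks roughly like this (the shapes, top_k value and expert definitions are illustrative, not Qwen's actual code):

    import torch
    import torch.nn.functional as F

    def moe_layer(x, router_w, experts, top_k=2):
        # x:        (hidden_dim,) activations for a single token
        # router_w: (num_experts, hidden_dim) weights of the router
        # experts:  list of small feed-forward networks
        scores = router_w @ x                     # one score per expert
        top_scores, top_idx = torch.topk(scores, top_k)
        weights = F.softmax(top_scores, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        for w, i in zip(weights, top_idx):
            out = out + w * experts[int(i)](x)    # only the top_k experts ever run
        return out

    experts = [torch.nn.Sequential(torch.nn.Linear(16, 64),
                                   torch.nn.GELU(),
                                   torch.nn.Linear(64, 16))
               for _ in range(8)]
    x = torch.randn(16)
    router_w = torch.randn(8, 16)
    y = moe_layer(x, router_w, experts, top_k=2)

The weights of the unchosen experts never need to be fetched for that token, which is where the speedup comes from when you're memory bound.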
This one runs at a perfectly serviceable pace locally on a laptop 5090 with 64 GB of system RAM, with zero effort required. Just download Ollama and select this model from the drop-down.
So it's called an "AI Engine", but its performance is worse than just running the same thing on CPU? Doesn't it make it essentially useless for anything AI related? What's the point of this hardware then? Better power efficiency for tiny models? Surely someone must be using it for something?
The point is offloading ML workloads to hardware that is energy efficient, not necessarily "fast" hardware.
You want to minimize the real and energy costs at the expense of time.
Assuming NPUs don't get pulled from consumer hardware altogether, theoretically the time/efficiency trade-off gap will become smaller and smaller as time goes on.
> building something on your desktop that’ll run on data center hardware in production, the DGX Spark is your answer
It isn't, because it's a different architecture than the datacenter hardware. They're both called "Blackwell", but that's a lie[1] and you still need "real" datacenter Blackwell card for development work. (For example, you can't configure/tune vLLM on Spark, and then move it into a B200 and even expect it to work, etc.)
sm_120 (aka 1CTA) supports tensor cores and TMEM just fine: example 83 shows block-scaled NVFP4 (I've gotten 1850 ish dense TFLOPs at 600W, the 300W part caps out more like 1150). sage3 (which is no way in hell from China, myelin knows it by heart) cracks a petaflop in bidirectional noncausal.
The nvfuser code doesn't even call it sm_100 vs. sm_120: NVIDIA's internal nomenclature seems to be 2CTA/1CTA, it's a bin. So there are fewer MMA tilings in the released ISA as of 13.1 / r85 44.
The mnemonic tcgen05.mma doesn't mean anything, it's lowered onto real SASS. FWIW the people I know doing their own drivers say the whole ISA is there, but it doesn't matter.
Wait, so are you telling me all of the hardware/ISA is actually fully accessible and functional, and it's just an artificial PTX -> SASS compiler limitation?
Because the official NVidia stance is definitely that TMEM, etc. is not supported and doesn't work.
...I don't suppose you have a link to a repo with code that can trigger any of this officially forbidden functionality?
Very interesting! Thanks! I'll definitely keep a close eye on that repo.
Anyhow, be that as it may, I was talking about the PTX mnemonics and such because I'd like to use this functionality from my own, custom kernels, and not necessarily only indirectly by triggering whatever lies at the bottom of NVidia's abstraction stack.
So what's your endgame with your proofs? You wrote "the breaking point was implementing an NVFP4 matmul" - so do you actually intend to implement an NVFP4 matmul? (: If so I'd be very much interested; personally I'm definitely still in the "cargo-cults from CUTLASS examples" camp, but would love something more principled.