> when we could have had an open world with open models run locally instead where you got to keep your private health information private
But we can have that? If you have powerful enough hardware you can do it, right now. At least until the anti-AI people get their way and either make the models' creators liable for what the models say or get rid of the "training is fair use" thing everyone depends on, in which case, sure, you'll have to kiss legal open-weight models goodbye.
How is that surprising? The advent of modern AI tools has resulted in most people being heavily pro-IP. Everyone now talks about who has the copyright to something and so on.
Yes, people are now very pro-IP because it's the big corporations that are pirating stuff and harvesting data en masse to train their models, and not just some random teenagers in their basements grabbing an mp3 off LimeWire. So now the IP laws, instead of being draconian, are suddenly not adequate.
But what is frustrating to me is that the second-order effects of making the law more restrictive will do us all a big disservice. It will not stop this technology; it will just make it less accessible to normal people and put more power into the hands of the big corporations which the "they're stealing our data!" people would like to stop.
Right now I (a random nobody) can go on HuggingFace, download a model which is more powerful than anything that was available 6 months ago, and run it locally on my machine, unrestricted and private.
Can we agree that's, in general, a good thing?
So now if you make the model creators liable for misuse of the models, or make the models a derivative work of their training data, or anything along these lines - what do you think will happen? Yep. The model on HuggingFace is gone, and now the only thing you'll have access to is a paywalled, heavily filtered and censored version of it provided by a megacorporation, while the megacorporation itself has unlimited, unfiltered access to that model internally.
The models come from overt piracy, and are often used to make fake news, slander people, or produce other illegal content. Sure, it can be funny, but the poisoned fruit of a poisoned tree is always going to be overt piracy.
I agree research is exempt from copyright, but people cashing in on unpaid artists' work for commercial purposes is a copyright violation predating the DMCA/RIAA.
We must admit these models require piracy, and can never be seen as ethical. =3
> are often used to make fake news, slander people, or other illegal content.
That's not how these models are used in the vast majority of cases.
This argument is like saying "kitchen knives are often used to kill people so we need to ban the sale of kitchen knives". Do some people use kitchen knives to kill? Sure. Does it mean they should be banned because of that?
> I agree research is exempt from copyright, but people cashing in on unpaid artists works for commercial purposes is a copyright violation predating the DMCA/RIAA. We must admit these models require piracy, and can never be seen as ethical. =3
So, may I ask - where exactly do you draw the line? For the sake of argument, let's imagine something like this:
1. I scrape the whole internet onto my disk.
2. I go through the text, and gather every word bigram, and build a frequency table.
3. I delete everything I scraped.
4. I use that frequency table (which, compared to the exabytes of the source text I used to build it, is a couple hundred megabytes at most) to build a text generator.
5. I profit from this text generator.
Would you consider this unethical too? Because this is essentially how LLMs work, just in a slightly fancier way. On what exact basis do you draw the line between "ethical" and "unethical" here?
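For concreteness, here's a rough toy sketch of steps 2-4 in Python (the corpus and names are made up for illustration; a real scrape would obviously be nothing like this):

    import random
    from collections import Counter, defaultdict

    def build_bigram_table(texts):
        # Step 2: count how often each word follows each other word.
        table = defaultdict(Counter)
        for text in texts:
            words = text.split()
            for prev, nxt in zip(words, words[1:]):
                table[prev][nxt] += 1
        return table

    def generate(table, word, length=20):
        # Step 4: sample a continuation word by word from the frequency table.
        out = [word]
        for _ in range(length):
            followers = table.get(word)
            if not followers:
                break
            candidates, counts = zip(*followers.items())
            word = random.choices(candidates, weights=counts, k=1)[0]
            out.append(word)
        return " ".join(out)

    # Toy corpus standing in for "the whole internet" (step 1);
    # once the table is built, the corpus can be deleted (step 3).
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    table = build_bigram_table(corpus)
    print(generate(table, "the"))

The table is tiny compared to the text it was built from, and it's the only thing the generator ever touches - which is exactly the crux of the question.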
This is illegal under theft-of-service laws, and a violation of most sites' terms of service. If these spider scrapers respected the robots exclusion standard under its intended use-case for search engines, then getting successfully sued for overt copyright piracy and quietly settling for billions would seem unfair.
Note too, currently >52% of the web is LLM generated slop, so any model trained on that output will inherit similar problems.
> 2. I go through the text, and gather every word bigram, and build a frequency table.
And when (not if) a copyrighted work is plagiarized without citation, it is academic misconduct, IP theft, and an artistic counterfeit. Copyright law is odd, and often doesn't make a distinction about the origin of similar works. Note this part of the law was recently extended to private individuals this year:
This doesn't matter if the output violates copyright. Images in JPEG format are compressed in the frequency domain, have been around for ages, and people still get sued or jailed over them regularly.
Academic evaluation usually does fall under a fair-use exception, but the instant someone sells or uses IP in some form of trade/promotion, it becomes a copyright violation.
> 4. I use that frequency table
See above; the "how it is made" argument is 100% BS. The statistical salience of an LLM simply can't prevent plagiarism and copyright violations. This was cited in the original topic links.
> 5. I profit from this text generator.
Since this content may inject liabilities into commercial settings, only naive fools will use this in a commercial context. Most "AI" companies lose around $4.50 per new customer, and are an economic fiction driven by some very silly people.
LLM businesses are simply an unsustainable exploit. Unfortunately they also proved wealthy entities can evade laws through regulatory capture, and settling the legal problems they couldn't avoid.
I didn't make the rules, but I do disagree that cleverness supersedes a just rule of law. Have a wonderful day =3
It is true bubbles driven by the irrational can't be stopped, but one may profit from people's delusions... and likely get discount GPUs when the economic fiction inevitably implodes. Best of luck =3
I look forward to buying the failed data center assets. LLMs make great search engines, but are not the path to "AGI". Neuromorphic computing looks more interesting. Have a great day =3
The amount of electricity we can produce is limited only by regulation, because we have a practically unlimited amount of fission energy under our feet. That is what you are seeing now with all of these new nuclear plants being built and brought back from decommissioning. If that is too scary for you, we also have the world's greatest reserves of shale gas.
I am not pro-AI, and I agree that the market will crash. But what I take issue with is this NIMBY mentality that we should nitpick proposals with a thousand fake reasons why we can't build anything in this country. We can't do big engineering projects like China because they are too much of an eyesore, or they use too much water, or they're not zoned correctly.
We can't put up a new apartment block, it's too much of a strain on the local water supply. Okay can we collect more water, invest in a new reservoir? Of course not, it will endanger the tumbleweed population.
We can't let a new datacenter go up because it will cause everyone's power prices to increase. Okay maybe we can produce more power?? No, BECAUSE ENERGY IS FINITE AND THE SUN IS JUST GOING TO EXPLODE ANYWAYS SO WHY DO YOU EVEN CARE. WTF?
Why can't we build things? Because we just can't, and actually it's impossible and you are rude for suggesting we build anything ever. It's circular reasoning designed to placate suburban NPCs.
If you oppose AI because it is ruining art, or it will drive people out of jobs, just say that. Because these fake complaints about power and water are neither compelling nor effective (they are just technological and material problems which will be ironed out in the coming generations).
These firms can do what they like if and only if they pay for every $7B reactor, the 30k-year waste stewardship, and disconnect from community resources people paid for with taxes. However, currently these unethical firms burden cities with an endless bill for resources, contribute no actual value, and their data center waste heat signatures and industrial run-off can be spotted from space.
Consider that most "AI" firms lose on average $4.50 for every new user, rely on overt piracy, and have delusional boards sand-bagging for time... these LLM businesses are simply unsustainable fictions.
Many problems don't have simple answers, but one may merely profit by their predictable nature. I would recommend volunteering with a local pet rescue society if you find yourself getting upset about trivia. Have a great day. =3
What trivia? I don't disagree that the AI companies are unprofitable.
These AI companies are paying for the reactors. As for waste, the Department of Energy handles spent nuclear fuel. Protests against the construction of Yucca Mountain have made this impossible. Nuclear power plants repeatedly sue the US Government for the cost of storing this nuclear waste on-site, because it's the DOE's problem.
And it is a totally artificial political problem. It is not even necessarily "waste" in the sense that we ordinarily think: there is a significant amount of fissile isotopes in spent fuel, and countries like France recycle the majority of their spent nuclear fuel. We could do the same with the right infrastructure, and it would vastly decrease the amount of waste we produce and the uranium we need to mine.
My point is that the complaints in these YouTube videos you link (which I am very accustomed to; I have been following this for decades) present the argument that AI is politically dangerous, and this is totally separate from the material complaints (not enough water, not enough power, not enough chips, etc.) that you pretend are a significant problem.
These are just extrinsic flaws which can be solved (and WILL be solved, if the USA is able to restore its manufacturing base, which it should). But my issue is purely with the intrinsic dangers of this tech, which are not fixable.
Some of the videos you link are just this suburban NIMBY nagging about muh noise pollution. You might as well get a video of people complaining about EMF pollution. The big issue here is that AI is going to take all of our jobs and will essentially herald the end of the world as we know it. It is going to get incredibly ugly very soon. Who cares what some 50-year-old boomer homeowner (who isn't going to live to see this unfold anyway) thinks about some gray building being built remotely near their suburb. They should go back to watching TV.
As for me, I am going to campaign to have my local pet rescue society demolished. It uses too much water and space and electricity, and for what? Something I don't care for? Seems unethical to me that I should bear the cost incurred through increased demand for these resources, even though I did not explicitly consent to the animal shelter being constructed.
This is demonstrably false given negative revenue, and when the gamblers default on the loans it is the public that will bear the consequences. As with sub-prime mortgages, people are getting tired of the con.
Dismissing facts because you personally feel they are not important is silly. If you think the US will "win" the "AGI" race... then you are fooling yourself, as everything has already been stolen.
Have a great day, and maybe go outside for a walk to settle down a bit if you are uncomfortable with the way imaginary puppies, bunnies, and kittens make you feel. Community non-profit organizations offer tangible goodwill, and are very different from ephemeral LLM fads externalizing a suckers-bet on the public. =3
The studios did already rip off Mark Hamill of all people.
Arguing regulatory capture versus overt piracy is a ridiculous premise. The "AI" firms have so much liquid capital now... they could pay the fines indefinitely in districts that constrain damages, and already settled with larger copyright holders like it was just another nuisance fee. =3
I don’t really see it to be honest. I feel like their best and most natural use is scams.
Maybe a different comparison you would agree with is Stingrays, the devices that track cell phones. Ideally nobody would have them but as is, I’m glad they’re not easily available to any random person to abuse.
> modern LLM architectures (which aren't that different) on his website and in the github repo: e.g. he has a whole article on implementing the Qwen3 architecture from scratch.
This might be underselling it a little bit. The difference between GPT2 and Qwen3 is maybe, I don't know, ~20 lines of code if you write it well? The biggest difference is probably RoPE (which can be tricky to wrap your head around); the rest is pretty minor.
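For anyone curious, a rough sketch of the split-halves RoPE convention (the one used in Llama/Qwen-style models) looks something like this - shapes and names are mine, not taken from the book or the repo:

    import torch

    def rope(x, base=10000.0):
        # x: (seq_len, head_dim) - queries or keys for one attention head.
        seq_len, head_dim = x.shape
        half = head_dim // 2
        # One rotation frequency per channel pair, decaying across the head dim.
        inv_freq = 1.0 / (base ** (torch.arange(half).float() / half))
        # One angle per (position, channel pair).
        angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]
        cos, sin = torch.cos(angles), torch.sin(angles)
        # Treat the first and second halves of the head dimension as the two
        # components of a 2D vector and rotate each pair by its angle.
        x1, x2 = x[:, :half], x[:, half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

The trick is that the rotation angle depends only on the token's position, so dot products between rotated queries and keys end up depending on relative position.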
There’s Grouped Query Attention as well, a different activation function, and a bunch of not very interesting norms stuff. But yeah, you’re right - still very similar overall.
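GQA in particular is barely any code once you see it - each K/V head simply gets shared by several query heads. Roughly (shapes illustrative only):

    import torch

    def repeat_kv(kv, n_rep):
        # kv: (batch, num_kv_heads, seq_len, head_dim)
        # Each KV head is reused by n_rep query heads, so we just expand it
        # before doing the usual multi-head attention math.
        return kv if n_rep == 1 else kv.repeat_interleave(n_rep, dim=1)

    # e.g. 32 query heads sharing 8 KV heads -> n_rep = 4
    k = torch.randn(1, 8, 16, 64)
    print(repeat_kv(k, 4).shape)  # torch.Size([1, 32, 16, 64])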
In this context by "real time" people usually mean "as fast as I can read the reply", so, 0.0002 tokens per minute would not be considered "real time".
> What does it mean that only 3B parameters are active at a time?
In a nutshell: LLMs generate tokens one at a time. "Only 3B parameters active at a time" means that for each of those tokens only 3B parameters need to be fetched from memory, instead of all of them (30B).
Then I don't understand why it would matter. Or does it really mean that for each input token only 10% of the total network runs, and then another 10% for the next token, rather than 10 batches of 10% each being run for every token? If so, any idea or pointer to how the selection works?
Yes, for each token only, say, 10% of the weights are necessary, so you don't have to fetch the remaining 90% from memory, which makes inference much faster (if you're memory bound; if you're doing single batch inference then you're certainly memory bound).
As to how the selection works - each mixture-of-experts layer in the network has essentially a small subnetwork called a "router" which looks at the input and calculates a score for each expert; then the best-scoring experts are picked and the inputs are routed only to them.
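A toy version of that routing step looks roughly like this (the shapes, top_k value and expert definitions are illustrative, not Qwen's actual code):

    import torch
    import torch.nn.functional as F

    def moe_layer(x, router_w, experts, top_k=2):
        # x:        (hidden_dim,) activations for a single token
        # router_w: (num_experts, hidden_dim) weights of the router
        # experts:  list of small feed-forward networks
        scores = router_w @ x                     # one score per expert
        top_scores, top_idx = torch.topk(scores, top_k)
        weights = F.softmax(top_scores, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        for w, i in zip(weights, top_idx):
            out = out + w * experts[int(i)](x)    # only the top_k experts ever run
        return out

    experts = [torch.nn.Sequential(torch.nn.Linear(16, 64),
                                   torch.nn.GELU(),
                                   torch.nn.Linear(64, 16))
               for _ in range(8)]
    x = torch.randn(16)
    router_w = torch.randn(8, 16)
    y = moe_layer(x, router_w, experts, top_k=2)

The weights of the unchosen experts never need to be fetched for that token, which is where the speedup comes from when you're memory bound.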
This one runs at a perfectly serviceable pace locally on a laptop 5090 with 64 GB of system RAM, with zero effort required. Just download Ollama and select this model from the drop-down.
So it's called an "AI Engine", but its performance is worse than just running the same thing on CPU? Doesn't it make it essentially useless for anything AI related? What's the point of this hardware then? Better power efficiency for tiny models? Surely someone must be using it for something?
The point is offloading ML workloads to hardware that is energy efficient, not necessarily "fast" hardware.
You want to minimize the real and energy costs at the expense of time.
Assuming NPUs don't get pulled from consumer hardware altogether, theoretically the time/efficiency trade-off gap will become smaller and smaller as time goes on.
> building something on your desktop that’ll run on data center hardware in production, the DGX Spark is your answer
It isn't, because it's a different architecture than the datacenter hardware. They're both called "Blackwell", but that's a lie[1] and you still need "real" datacenter Blackwell card for development work. (For example, you can't configure/tune vLLM on Spark, and then move it into a B200 and even expect it to work, etc.)
sm_120 (aka 1CTA) supports tensor cores and TMEM just fine: example 83 shows block-scaled NVFP4 (I've gotten 1850 ish dense TFLOPs at 600W, the 300W part caps out more like 1150). sage3 (which is no way in hell from China, myelin knows it by heart) cracks a petaflop in bidirectional noncausal.
The nvfuser code doesn't even call it sm_100 vs. sm_120: NVIDIA's internal nomenclature seems to be 2CTA/1CTA, it's a bin. So there are fewer MMA tilings in the released ISA as of 13.1 / r85 44.
The mnemonic tcgen05.mma doesn't mean anything, it's lowered onto real SASS. FWIW the people I know doing their own drivers say the whole ISA is there, but it doesn't matter.
Wait, so are you telling me all of the hardware/ISA is actually fully accessible and functional, and it's just an artificial PTX -> SASS compiler limitation?
Because the official NVidia stance is definitely that TMEM, etc. is not supported and doesn't work.
...I don't suppose you have a link to a repo with code that can trigger any of this officially forbidden functionality?
Very interesting! Thanks! I'll definitely keep a close eye on that repo.
Anyhow, be that as it may, I was talking about the PTX mnemonics and such because I'd like to use this functionality from my own, custom kernels, and not necessarily only indirectly by triggering whatever lies at the bottom of NVidia's abstraction stack.
So what's your endgame with your proofs? You wrote "the breaking point was implementing an NVFP4 matmul" - so do you actually intend to implement an NVFP4 matmul? (: If so I'd be very much interested; personally I'm definitely still in the "cargo-cults from CUTLASS examples" camp, but would love something more principled.