abstractcontrol's comments | Hacker News

> Investment Strategy: Organizations should invest more in computing infrastructure than in complex algorithmic development.

> Competitive Advantage: The winners in AI won’t be those with the cleverest algorithms, but those who can effectively harness the most compute power.

> Career Focus: As AI engineers, our value lies not in crafting perfect algorithms but in building systems that can effectively leverage massive computational resources. That is a fundamental shift in mental models of how to build software.

I think the author has a fundamental misconception about what making the best use of computational resources requires. It's algorithms. His recommendation boils down to not doing the one thing that would allow us to make the best use of computational resources.

His assumptions would only be correct if all the best algorithms were already known, which is clearly not the case at present.

Rich Sutton said something similar, but when he said it, he was thinking of old engineering-intensive approaches, so it made sense in the context in which he said it and for the audience he directed it at. It was hardly groundbreaking either; the people he wrote the article for already thought the same thing.

People like the author of this article don't understand the context and are taking his words as gospel. There is no reason not to think that there won't be different machine learning methods to supplant the current ones, and it's certain they won't be found by people who are convinced that algorithmic development is useless.


I'm of the same mind.

I dare say GPT-3 and GPT-4 are the only recent examples where pure compute produced a significant edge compared to algorithmic improvements. And that edge lasted a solid year before others caught up. Even among the recent improvements:

1. Gaussian splatting, a hand-crafted method, blew the entire field of NeRF models out of the water.
2. DeepSeek R1 is trained for reasoning without a reasoning dataset.
3. Inception Labs' 16x speedup comes from using a diffusion model instead of next-token prediction.
4. DeepSeek distillation compresses a larger model into a smaller one.

That's setting aside the introduction of the Transformer and diffusion models themselves, which triggered the current wave in the first place.

AI is still a vastly immature field. We have not explored it systematically but rather tested things at random. Good ideas are being dismissed in favor of whatever happened to work elsewhere. I suspect we are still missing a lot of fundamental understanding, even at the activation function level.

We need clever ideas more than compute. But the stock market seems to have mixed them up.


>There is no reason not to think that there won't be different machine learning methods to supplant the current ones,

Sorry, is that a triple negative? I'm confused, but I think you're saying there WILL be improved algorithms in the future? That seems to jibe better with the rest of your comment, but I just wanted to make sure I understood you correctly!

So.. Did I?


Can't find it either.


To me, the current LLMs aren't qualitatively different from the char RNNs that Karpathy showcased all the way back in 2015. They've gotten a lot more useful, but that is about it. Current LLMs will have as much to do with AGI as computer games have to do with NNs. Which is to say, games were necessary to develop GPUs, which were then used to train NNs, and current LLMs are necessary to incentivize even more powerful hardware to come into existence, but there isn't much gratitude involved in that process.


> To me, the current LLMs aren't qualitatively different from the char RNNs that Karpathy showcased all the way back in 2015.

It's very difficult to understand this statement. What meaning of "qualitatively" could possibly make it true?


The strengths and weaknesses of the algorithmic niche that artificial NNs occupy haven't changed a bit in the past decade. They are still bad at everything you'd imagine actual AI would be good at and that I'd actually want to use them for. The only thing that has changed is people's perception. LLMs found a market fit, but notice that, compared to last decade, when DeepMind and OpenAI were competing at actual AI in games like Go and StarCraft, they've pretty much given up on that in favor of hyping text predictors. For anybody in the field, it should be an obvious bubble.

Underneath it all, there is some hope that an innovation might come about to keep the wave going, and indeed, a new branch of ML being discovered could revolutionize AI and actually be worthy of the hype that LLMs have now, but that has nothing to do with the LLM craze.

It's cool that we have them, and I also appreciate what Stable Diffusion has brought to the world, but in terms of how much LLMs have influenced me, they've only shortened the time it takes for me to read the documentation.

I don't think that machines cannot be more intelligent than humans. I don't think that the fact that they use linear algebra and mathematical functions makes the computers inferior to humans. I just think that the current algorithms suck. I want better algos so we can have actual AI instead of this trash.


I've thought about adding record row polymorphism to Spiral, but I am not familiar with it and couldn't figure out how to make it work well in the presence of generics.


Why is generics the tricky bit? Isn't that the bread-and-butter of this type system? You should just be able to substitute the term 'type variable' in the article for 'generics'.


Staged FP in Spiral: https://www.youtube.com/playlist?list=PL04PGV4cTuIVP50-B_1sc...

Some of the stuff in this playlist might be relevant to you, though it is mostly about programming GPUs in a functional language that compiles to Cuda. The author (me) sometimes works on the language during the videos, either fixing bugs or adding new features.


What's a Net/60 basis? I am having trouble understanding how often you were paid. Every month or so?

Edit: Nvm, I saw you worked for 90 days without pay. Ack.


In this case, Net/30 means payment is due 30 days after the contract is signed, and Net/60 means 60 days; for example, under Net/60 a contract signed on June 1 is payable by July 31. Sometimes instead of "after the contract is signed" it is "after the work is delivered", but my case was the former.


https://www.youtube.com/playlist?list=PL04PGV4cTuIVP50-B_1sc...

Staged Functional Programming In Spiral

I am building a fully fused ML GPU library, along with a poker game to run it on, in my own programming language that I've worked on for many years. Right at this very moment, I am trying to bring down compilation times and register usage by doing more on the heap, so I am creating a reference-counting Cuda backend for Spiral.
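To give a rough idea of what that involves, here is a minimal sketch of device-side reference counting in plain CUDA C++, not the code Spiral actually generates; the struct and function names are made up for illustration:

    // Illustrative sketch only: a heap object with an atomic reference count,
    // the kind of bookkeeping a refcounting Cuda backend has to emit.
    struct RcBox {
        int refcount;
        float payload[16];
    };

    __device__ RcBox* rc_new() {
        // Device-side heap allocation (heap size is set from the host with
        // cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)).
        RcBox* p = (RcBox*)malloc(sizeof(RcBox));
        if (p) p->refcount = 1;
        return p;
    }

    __device__ void rc_incref(RcBox* p) {
        atomicAdd(&p->refcount, 1);
    }

    __device__ void rc_decref(RcBox* p) {
        // atomicSub returns the old value, so the thread that drops the last
        // reference is the one that frees the object.
        if (atomicSub(&p->refcount, 1) == 1) free(p);
    }

The counts have to be updated atomically because multiple GPU threads can hold references to the same object.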

Both the ML library and the poker game are designed to run completely on GPU for the sake of getting large speedups.

Once I am done with this and have trained the agent, I'll test it out on play money sites, and if that doesn't get it eaten by the rake, with real money.

I am doing fairly sophisticated functional programming in the videos, the kind you could only do in the Spiral language. Many parts of the series involve me working on and improving the language itself in F#.


For a deep dive, maybe take a look at the Spiral matrix multiplication playlist: https://www.youtube.com/playlist?list=PL04PGV4cTuIWT_NXvvZsn...

I spent 2 months implementing a matmult kernel in Spiral and optimizing it.


Are Winograd’s algorithms useful to implement as a learning exercise?


Never tried those, so I couldn't say. I'd guess they would be.

Even so, creating all the abstractions needed to implement even regular matrix multiplication in Spiral in a generic fashion took me two months, so I'd consider that a good enough exercise.

You could do it a lot faster by specializing for specific matrix sizes, like in the Cuda examples repo by Nvidia, but then you'd miss the opportunity to do the tensor magic that I did in the playlist.
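For a rough idea of what such a size-specialized kernel looks like, here is a minimal shared-memory tiled matmul sketch in plain CUDA C++ (not Spiral). It assumes square N x N row-major matrices with N a multiple of the tile size, and the names are purely illustrative:

    // Illustrative sketch: C = A * B with one 16x16 output tile per block.
    #define TILE 16

    __global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        // March both inputs one tile at a time through shared memory.
        for (int t = 0; t < N / TILE; ++t) {
            As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
            __syncthreads();
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        C[row * N + col] = acc;
    }

    // Launch with: dim3 block(TILE, TILE), grid(N / TILE, N / TILE);
    // matmul_tiled<<<grid, block>>>(dA, dB, dC, N);

The generic Spiral version in the playlist builds these tiling abstractions once and reuses them, instead of hard-coding the sizes as above.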


You are the author of the playlist/maker of the videos?


Yes.


Sorry for the noob question, but how is GPU programming helpful?


NNs, for example, are (mostly) a sequence of matrix multiplication operations, and GPUs are very good at those, much better than CPUs. AI is hot at the moment, and Nvidia is producing the kind of hardware that can run large models efficiently, which is why it's a two-trillion-dollar company right now.

However, in the Spiral series, I aim to go beyond just making an ML library for running NN models and break new ground.

Newer GPUs actually support dynamic memory allocation and recursion, and the GPU threads have their own stacks, so you could in fact treat them as sequential devices and write games and simulators directly on them. I think once I finish the NL Holdem game, I'll be able to get over 100-fold improvements by running the whole program on the GPU versus the old approach of writing the sequential part on a CPU and only using the GPU to accelerate a NN model powering the computer agents.
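As a rough illustration of what treating the GPU as a sequential device means, plain CUDA C++ already allows something like the following (the names are made up, and the heap and stack sizes have to be configured from the host with cudaDeviceSetLimit):

    // Illustrative sketch: per-thread recursion and device-side allocation.
    __device__ int factorial(int n) {
        // Each GPU thread has its own stack, so ordinary recursion works
        // up to the configured stack size (cudaLimitStackSize).
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    __global__ void per_thread_state(int* out) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        // Device-side heap allocation: each thread owns a small scratch buffer
        // (heap size is set with cudaLimitMallocHeapSize).
        int* scratch = (int*)malloc(4 * sizeof(int));
        if (scratch == nullptr) return;
        scratch[0] = factorial(5 + tid % 3);
        out[tid] = scratch[0];   // out must have one slot per launched thread
        free(scratch);
    }

In other words, the kind of stateful, branchy code you'd normally keep on the CPU can increasingly live on the device itself.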

I am not sure if this is a good answer, but this is how GPU programming would be helpful to me. It all comes down to performance.

The problem with programming them is that the program you are trying to speed up needs to be specially structured so that it utilizes the full capacity of the device.
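A minimal example of what "specially structured" means, again in plain CUDA C++ with illustrative names: the same copy written two ways, where adjacent threads touching adjacent addresses let each warp issue a few wide memory transactions, while a strided layout scatters them into many.

    // Coalesced: thread i reads element i, so a warp covers one contiguous span.
    __global__ void copy_coalesced(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Strided: neighbouring threads read addresses far apart, wasting bandwidth.
    __global__ void copy_strided(const float* in, float* out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[(int)(((long long)i * stride) % n)];
    }

The arithmetic is identical; only the memory access pattern differs, and on a GPU that difference can easily be an order of magnitude in throughput.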


> I have quite a lot of concurrency so I think my ideal hardware is a whole lot of little CPU cores with decent cache and matmul intrinsics

Back in 2015 I thought this would be the dominant model in 2022. I thought that the AI startups challenging Nvidia would be about that. Instead, they all targeted inference rather than programmability. I thought that Tenstorrent's hardware would be about what you are talking about: lots of tiny cores, local memory, message passing between them, AI/matmul intrinsics.

I've been hyped about Tenstorrent for a long time, but now that it is finally coming out with something, I can see that the Grayskulls are very overpriced. And if you look at the docs for their low-level kernel programming, you will see that Tensix cores can only have four registers, have no register spilling, and also don't support function calls. What would one be able to program with that?

It would have been interesting had the Grayskull cards been released in 2018. But in 2024 I have no idea what the company wants to do with them. It's over five years behind what I was expecting.

My expectations for how the AI hardware wave would unfold were fit for another world entirely. If this is the best the challengers can do, the most we can hope for is that they depress Nvidia's margins somewhat so we can buy its cards cheaper in the future. As we go towards the Singularity, I've gone from expecting revolutionary new hardware from AI startups to hoping Nvidia can keep making GPUs faster and more programmable.

Ironically, that latter thing is one trend that I missed, and going from the Maxwell cards to the latest generation, GPUs have gained a lot in terms of how general-purpose they are. The range of domains they can be used for is definitely going up as time goes on. I thought that AI chips would be necessary for this and that GPUs would remain toys, but it has been the other way around.


I wasn't as optimistic that there would be broad adoption of some of the more advanced techniques I was working on, so back in 2013 I figured that most people would stick to GEMMs and convs with rather simple loss functions - I had a hard enough time explaining BPR triplet loss to people. Now, with LLMs, people will be doubling down on GEMMs for the foreseeable future.

My customers won't touch non-commodity hardware, as they see it as a potential vector for vendors to screw them over, and they're not wrong about that. In a post-apocalyptic scenario they could just pull a graphics card out of a gaming computer to get things working again, which gives them a strong feeling of security. Having very capable GPU cards as a commodity means I can reuse the same ops for my training and inference, which roughly halves my workload.

My approach to hardware companies is that I'll believe it when I see it: I'll wait until something is publicly available that I can buy off the shelf before looking too closely at its architecture. Nvidia with their Tensor Cores got so good so quickly that I never really looked too closely at alternatives. I'm kind of hopeful that an AMD SoC would provide a good edge-compute option, so I might give that a go.

I had a look at Tenstorrent given this article, and the Grendel architecture seems interesting.


Grayskull shipped in 2020, and each Tensix core has five RISC-V cores. Get your basic facts right before you complain. The dev kit is just that, a dev kit. Groq sells their dev kit for $20k even though a single LPU is useless.


> Groq sells their dev kit for $20k even though a single LPU is useless.

I find this a very questionable business decision.


Considering the system only has a single H100, why would it be that performant?


Yeah this page is full of straight up lies?

“ Its performance in every regard is almost unreal (up to 284 times faster than x86).”

Like, there are at least 3 things wrong with that statement!


benchmark:

"NVIDIA GH200 CPU Performance Benchmarks Against AMD EPYC Zen 4 & Intel Xeon Emerald Rapids"

* https://www.phoronix.com/review/nvidia-gh200-gptshop-benchma...

