


Yeah, it's just the wrong architecture for the job, so I found it to be a strange example.

Here's the top model on DAWNBench - https://github.com/apple/ml-cifar-10-faster/blob/main/fast_c...

Trains for 15 epochs, and like all the others it's a 9-layer ResNet.
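
For reference, that style of net looks roughly like this (a minimal PyTorch sketch, not the exact code from the repo; the channel widths and pooling placement are my assumptions based on the well-known DAWNBench-era design):

    import torch
    import torch.nn as nn

    def conv_bn(c_in, c_out, pool=False):
        # 3x3 conv -> batchnorm -> ReLU, optionally followed by 2x2 max-pool
        layers = [nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
                  nn.BatchNorm2d(c_out),
                  nn.ReLU(inplace=True)]
        if pool:
            layers.append(nn.MaxPool2d(2))
        return nn.Sequential(*layers)

    class Residual(nn.Module):
        def __init__(self, c):
            super().__init__()
            self.block = nn.Sequential(conv_bn(c, c), conv_bn(c, c))
        def forward(self, x):
            return x + self.block(x)  # identity shortcut

    class ResNet9(nn.Module):
        # 9 weight layers: prep conv, 3 group convs, 2 two-conv residual
        # branches, and a final linear classifier
        def __init__(self, num_classes=10):
            super().__init__()
            self.net = nn.Sequential(
                conv_bn(3, 64),                # prep
                conv_bn(64, 128, pool=True),
                Residual(128),
                conv_bn(128, 256, pool=True),
                conv_bn(256, 512, pool=True),
                Residual(512),
                nn.AdaptiveMaxPool2d(1),
                nn.Flatten())
            self.fc = nn.Linear(512, num_classes)
        def forward(self, x):
            return self.fc(self.net(x))

    # e.g. ResNet9()(torch.randn(2, 3, 32, 32)) -> logits of shape (2, 10)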


Usually there's more to an ML / data-science idea (one that isn't a fully fleshed-out journal paper) than beating a SOTA benchmark.

In fact, beating SOTA is often the least interesting part of an interesting paper, and SOTA-blind reviewers often use it as a gatekeeping device.


Sure, of course. I wasn't suggesting "are you beating a SOTA benchmark?" I'm floating the idea of an ablation that matches a realistic scenario for the dataset / task. I'm personally curious how manifold Muon performs compared to AdamW in a thoroughly explored context. This is the first time I've seen a 3-layer MLP on CIFAR-10.
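
For concreteness, here's a rough sketch of the AdamW side of that comparison: a 3-layer MLP on CIFAR-10, trained the boring way. Hidden width, learning rate, weight decay, and epoch count are illustrative guesses, not numbers from anywhere:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # standard CIFAR-10 normalization stats
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),
                             (0.2470, 0.2435, 0.2616))])
    train = datasets.CIFAR10("data", train=True, download=True,
                             transform=transform)
    loader = DataLoader(train, batch_size=512, shuffle=True)

    # 3-layer MLP: 3072 -> 1024 -> 1024 -> 10 (hidden width is a guess)
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(32 * 32 * 3, 1024), nn.ReLU(),
        nn.Linear(1024, 1024), nn.ReLU(),
        nn.Linear(1024, 10)).to(device)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):  # illustrative epoch count
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

Swapping `opt` for a manifold-Muon implementation while holding everything else fixed is the apples-to-apples ablation I have in mind.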

I probably should have made the 9-layer ResNet part more front-and-center in my point.


Got you this time.



