Yeah, I saw the work from @Sree_Harsha_N, though the accuracy plot on the Adam/SGD side of things is very untuned; it's about what you'd expect from an afternoon of working with it, but as far as baselines go, most people in the weeds with optimizers would recognize it's not great for comparison (not to dump on the reproduction efforts).
That's why I think it might be hard to compare them accurately: SGD and Adam/AdamW likely have higher potential top ends, but they'll get thrashed in public comparisons against an optimizer that performs more flatly overall. Aaron works at FAIR, so I'm assuming he knows this; I reached out with some concerns on my end a little before he published the optimizer but unfortunately didn't hear back.
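Purely as an illustration (none of this is from the thread, and the model/data here are toy placeholders): "tuning the baseline" usually just means sweeping each optimizer over its own learning-rate grid, rather than comparing single default settings, which is the typical failure mode in these plots. A minimal sketch:

```python
# Hypothetical sketch: sweep SGD and AdamW over their own LR grids on toy data,
# since comparing single untuned points is what makes baselines look bad.
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy data: 256 samples, 10-way classification on 32-dim features.
X, y = torch.randn(256, 32), torch.randint(0, 10, (256,))

def train(opt_name, lr, steps=200):
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    if opt_name == "sgd":
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    else:
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# Each optimizer gets a grid centered on its own sensible range; reporting only
# the best point per optimizer is the fairer comparison.
for name, grid in [("sgd", [1e-2, 3e-2, 1e-1, 3e-1]),
                   ("adamw", [1e-4, 3e-4, 1e-3, 3e-3])]:
    best = min((train(name, lr), lr) for lr in grid)
    print(f"{name}: best loss {best[0]:.3f} at lr={best[1]}")
```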
yeah it's been crazy to see how things have changed, and i'm really glad there's still interest in optimizing for these benchmarks ;P keller's pretty meticulous and has put a lot of work into this from what i understand. i'm not sure where david's code came from originally, but it definitely shaped mine, since i referenced it heavily while writing it, and keller in turn rewrote a lot of my code in his own style, with his improvements on top. hopefully the pedigree of minimal code can continue as a tradition; it really has a surprising impact
96% legitimately is pretty hard; i struggled to hit it even in 2 minutes, so seeing it in 45 seconds is crazy. it definitely gets exponentially harder for every fraction of a percent, so i think that's a pretty big achievement to hit :D
Ye this is a massive achievement indeed - I was quite astounded - I 100% will run this and I wanna read up on the paper: https://arxiv.org/pdf/2404.00498.pdf
https://github.com/fastai/imagenette