The whole objective here is personal learning; the advice would be wildly different if the question were how to practice ML professionally. The approach is directly analogous to advising a beginner programmer to get better at programming by actually writing computer programs.
> Most of the cutting edge papers are trained on several $100k worth of GPU time
It's beside the point, but I said nothing about a requirement that the methods you choose to implement and learn from be cutting edge. More to the point, unless we have different definitions of what "cutting edge" means, you're wrong that "most of the cutting edge papers" require high computational resources. If that were true, it would be nearly impossible for the field to make progress at the pace it does. There is a plethora of research on purely algorithmic approaches that does not require massive compute resources, and in fact this is the most productive portion of the literature to learn from, because there the focus is on theory and on progress in how to conceptualize/frame ML problems. Works which amount to "we took method X and massively scaled it up" are (in my opinion) less intellectually interesting to someone seeking to grow their knowledge of ML (though the results may be extremely impressive and impactful, and the work may be very interesting for those working directly on that project).
> How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?
This is like asking how you can be sure you've correctly implemented a B-tree if you haven't used it to serve a distributed database to a million users. The answer is small, isolated tests.
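For the B-tree, that might look something like the sketch below. It's minimal and purely illustrative: BTree, its order parameter, and its insert/search methods are hypothetical stand-ins for whatever your implementation actually exposes.

    # Check the B-tree against a trivially correct reference model.
    # No database, no users; just the core contract of the structure.
    import random

    def test_btree_agrees_with_a_set():
        tree = BTree(order=4)          # hypothetical implementation
        reference = set()
        for _ in range(1000):
            key = random.randint(0, 100)
            tree.insert(key)
            reference.add(key)
        for key in range(101):
            # Present keys are found; absent keys are not.
            assert tree.search(key) == (key in reference)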
One of the best ways to really test your knowledge of an ML algorithm is to design and write unit tests asserting that it behaves correctly on trivial cases. You'll find bugs in your implementation, but you'll also be forced to think carefully about which core characteristics of the algorithm must be asserted in order to convince yourself that it's correct. It's a common beginner mistake in ML to just run/train your model and have that be the only test of its correctness. That's like deploying a web service with zero tests and letting "do I get X number of users" be the only test of your code's correctness. It sounds insane, but it's basically equivalent to what most beginners do in ML (my former self included).
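As a concrete example of the kind of trivial-case test I mean, here's a minimal numpy sketch (function names and tolerances are my own, not from any particular codebase) that checks a hand-written softmax/cross-entropy against finite differences and against basic distribution properties:

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def cross_entropy(logits, target):
        # target is an integer class index
        return -np.log(softmax(logits)[target])

    def grad_cross_entropy(logits, target):
        # Analytic gradient of CE w.r.t. logits: softmax(z) - one_hot(target)
        g = softmax(logits)
        g[target] -= 1.0
        return g

    def test_gradient_matches_finite_differences():
        rng = np.random.default_rng(0)
        logits = rng.normal(size=5)
        target = 2
        analytic = grad_cross_entropy(logits, target)
        eps = 1e-6
        for i in range(len(logits)):
            bump = np.zeros_like(logits)
            bump[i] = eps
            numeric = (cross_entropy(logits + bump, target)
                       - cross_entropy(logits - bump, target)) / (2 * eps)
            assert abs(analytic[i] - numeric) < 1e-5

    def test_softmax_is_a_distribution():
        p = softmax(np.array([1.0, 2.0, 3.0]))
        assert np.isclose(p.sum(), 1.0) and (p > 0).all()

Another cheap test in the same spirit: train on a single tiny batch and assert the loss goes to near zero. If your model can't overfit five examples, something is wrong, and you've learned that in seconds rather than after hours of training.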
Do you have a few of these cutting-edge algorithmic-advance papers in mind? Could you list them?
I guess I got too pessimistic because of things like "emergent features" [1] and "grokking" [2], which seem to happen only with a lot of compute, and also because the original (vanilla) transformer architecture remains (one of) the best despite many additional ideas and "advances", something that only becomes evident at large scale [3].
Because of the points above, it's really hard for me, as a non-expert, to assess which papers are true advancements and which were published mainly in pursuit of vanity metrics (e.g. publication counts) and actually represent overfit/cherry-picked results rather than robust progress.
I posted in another comment on this thread a list of papers which met these criteria for me at the time and which I learned a lot by implementing.
> it's really hard for me, as a non-expert, to assess which papers are true advancements
It's hard for me too, though I wouldn't consider myself an expert, just someone with a moderate amount of experience. Learning to discriminate important from less-important papers is another skill that takes effort to develop.