I am an ML researcher working in industry: by far the most effective way to maintain/advance my understanding of ML methods is to implement the core of an interesting paper and reproduce (some of) its results. Completing a working implementation forces your understanding to a whole other level than if you just read the paper and think "I get it". It is easy to read (for example) a diffusion/neural ODE paper and come away thinking that you "get it" while still having a wildly inadequate understanding of how to actually get it to work yourself.
You can view this approach in the same way that a beginner learns to program. The best way to learn is by attempting to implement (as much on your own as possible) something that solves a problem you're interested in. This has been my approach from the start (for both programming and ML), and is also what I would recommend for a beginner. I've found that continuing this practice, even while working on AI systems professionally, has been critical to maintaining a robust understanding of the evolving field of ML.
The key is finding a good method/paper that meets all of the following:
0) is inherently very interesting to you
1) you don't already have a robust understanding of the method
2) isn't so far above your head that you can't begin to grasp it
3) doesn't require access to datasets/compute resources you don't have
Of course, finding such a method isn't always easy and often takes some searching.
I want to contrast this with other approaches to learning AI, which include:
- downloading and running other people's ML code (in a jupyter notebook or otherwise)
- watching lecture series / talks giving overviews of AI methods
- reading (without putting into action) the latest ML papers
all of which I have found to be significantly less impactful on my learning.
Sorry if this is a stupid question, but from a non-practitioner's perspective, how or why is this sensible?
Most of the cutting edge papers are trained on several $100k worth of GPU time, so does it even make sense to implement the algorithm without the available data & compute? How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?
Compare that to, e.g., reimplementing a pure CS paper: almost anything can be reimplemented in a simple way. Even something like "distributed database over 1000 nodes" doesn't technically need 1000 servers; you can just simulate them quite cheaply.
Of course there might be similar techniques for ML but I'm just not aware of them.
The whole objective here is personal learning; this advice would be wildly different if the question were how to practice ML professionally. The approach is directly analogous to advising a beginner programmer to get better at programming by actually writing computer programs.
> Most of the cutting edge papers are trained on several $100k worth of GPU time
It's beside the point, but I said nothing about requiring that the methods you choose to implement and learn from be cutting edge. More to the point, unless we have different definitions of what "cutting edge" means, you're wrong that "most of the cutting edge papers" require massive computational resources. If that were true it would be nearly impossible for the field to make progress at the pace it does. There is a plethora of research on purely algorithmic approaches which does not require massive compute resources, and in fact this is the most productive portion of research to learn from, because there the focus is on theory and on progress in how to conceptualize/frame ML problems. Works which amount to "we took method X and massively scaled it up" are (in my opinion) less intellectually interesting to someone seeking to grow their knowledge of ML (though the results may be extremely impressive and impactful, and the work may be intellectually very interesting for those working directly on that project).
> How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?
This is like asking how you can be sure that you've correctly implemented a B-tree if you haven't used it to serve a distributed database to 1 million users. The answer is small, isolated tests.
One of the best ways to really test your knowledge of an ML algorithm is to design and write unit tests asserting that it behaves correctly on trivial cases. You'll find bugs in your implementation, but you'll also be forced to think carefully about which core characteristics of the algorithm must be asserted in order to convince yourself that it's correct. It's a common beginner mistake in ML to just run/train your model and have that be the only test of its correctness. It's like deploying a web service with zero tests and letting "do I get X number of users" be the only test of your code's correctness. That sounds insane, but it's basically equivalent to what most beginners do in ML (my former self included).
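To make that concrete, here's a minimal sketch in PyTorch of the kind of small, isolated tests I mean. The toy scaled-dot-product attention and the specific assertions are illustrative examples made up for this comment, not tests from any particular paper:

```python
import torch
import torch.nn.functional as F

def my_attention(q, k, v):
    # Toy reimplementation under test: scaled dot-product attention.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def test_attention_weights_sum_to_one():
    # The attention weights must form a probability distribution over keys.
    q, k = torch.randn(2, 4, 8), torch.randn(2, 6, 8)
    weights = F.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
    assert torch.allclose(weights.sum(dim=-1), torch.ones(2, 4), atol=1e-5)

def test_single_key_returns_its_value():
    # With exactly one key/value pair, the output must equal that value.
    q, k, v = torch.randn(1, 3, 8), torch.randn(1, 1, 8), torch.randn(1, 1, 8)
    assert torch.allclose(my_attention(q, k, v), v.expand(1, 3, 8), atol=1e-5)

def test_tiny_model_overfits_one_batch():
    # Training sanity check: a small MLP should drive the loss near zero
    # on a single memorized batch.
    torch.manual_seed(0)
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2)
    )
    x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(1000):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    assert loss.item() < 0.1
```

None of these need real data or a full training run, but each one pins down a property you'd expect from the math, and together they catch most of the silly bugs.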
Do you have a few of these cutting-edge algorithmic-advance papers in mind? Could you list them?
I guess I got too pessimistic because of things like "emergent features" [1] / "grokking" [2], which seem to happen only with a lot of compute, and also the fact that the original (vanilla) transformer architecture remains (one of) the best, despite many additional ideas and "advances" (though that is only evident at large scale) [3].
Because of the points above, it's really hard for me, as a non-expert, to assess which papers are true advancements, and which were only published in pursuit of vanity metrics (e.g. publication counts) but actually represent overfit/cherry-picked results rather than robust progress.
I posted in another comment on this thread a list of papers which met these criteria for me at the time and which I learned a lot by implementing.
> it's really hard for me, as a non-expert, to assess which papers are true advancements
It's hard for me too, though I wouldn't consider myself an expert, just someone with a moderate amount of experience. Learning to discriminate important from less important papers is another skill which takes effort to develop.
Often it might be viable to implement prediction w/o necessarily implementing training (especially if there are published weights or a reference implementation). Not viable for papers where the key contribution is a change to the pre-training objective / training methodology / optimizer, but useful for papers where the key contribution is architectural.
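As a rough sketch of that idea in PyTorch, using torchvision's published ResNet-18 weights as the reference (`MyResNet18` is a hypothetical module you'd have written from the paper, not something that exists):

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

from my_reimplementation import MyResNet18  # hypothetical: your from-scratch version

# Reference model with published weights.
reference = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1).eval()

# Load the same weights into your reimplementation (this assumes you matched
# the parameter names, or wrote a small mapping between the two state dicts).
mine = MyResNet18().eval()
mine.load_state_dict(reference.state_dict())

# If the forward passes agree on random inputs, the architecture is almost
# certainly implemented correctly, and no training run was needed.
x = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    assert torch.allclose(mine(x), reference(x), atol=1e-5)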
> Most of the cutting edge papers are trained on several $100k worth of GPU time
You can scale some things down. VGG-16 is basically a stack of convolutional layers; there's no reason you need 16 of them with an input size of 224x224x3 when you can just as easily watch a 4-layer CNN learn filters on inputs of size 64x64x1. Obviously if the paper's result is achieved through sheer compute this won't work, but plenty of results come purely from the architecture.
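For instance, a minimal sketch of that kind of scaled-down model in PyTorch (the layer widths here are arbitrary choices, not taken from the VGG paper):

```python
import torch
import torch.nn as nn

class TinyVGG(nn.Module):
    """A 4-layer VGG-style conv stack on 64x64 grayscale inputs."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 8 -> 4
        )
        self.classifier = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyVGG()
print(model(torch.randn(8, 1, 64, 64)).shape)  # torch.Size([8, 10])
```

Something this small trains in minutes on a laptop, and you can still watch the first-layer filters form during training.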
You could also implement and run networks that are designed to be really cheap to compute. ResNet/InceptionNet, for example. I think this is a pretty important part of the space right now, considering how performant, general, and therefore inefficient Transformer architectures are.
But these are "old" models from 5+ years ago. Implementing them is not going to help you get up to speed with more recent AI research. From the OP's post, it seems like he already knows these basics.
+1 on implementing papers; it's one of the best things you can do to improve your skills (anywhere in science or engineering, actually). A warning: I remember trying to do this back in my uni/grad days, and more often than not there is key information (perhaps even by accident) left out of the implementation description. I was more in mechanical engineering so perhaps this is less common in AI oriented papers, but I still think it's a valid thing to look out for.
> I was more in mechanical engineering so perhaps this is less common in AI oriented papers
No, you got it right. This is EXTREMELY prevalent in modern AI/ML papers, to everyone's detriment. In the majority of interesting cases, reproduction is only possible with the original code.
I think it's actually often worse in AI papers. Fortunately at least some bigger journals/conferences encourage or require releasing source code, which makes it easier to track down subtle details that the authors didn't clearly mention in the paper.
On top of that, due to its dependence on data and the ability to 'fudge' statistics, a lot of AI papers aren't really replicable even when there are no implementation subtleties. For example, I've run into papers on image generation which describe some trick to improve quality but focus entirely on standardized scores without providing any visual comparisons; as feared, on other datasets the trick turns out not to give as much of a visual improvement as the scores would suggest.
While in a lot of sciences or engineering many things can be attributed to being standard practice for experts in the field, AI moves too fast to have such standards and tends to be a bit too arbitrary for such standards to mean much.
This is an issue in biomedical research as well. Sometimes I've reached out to researchers who've done similar studies and asked them about missing details in their methods.
Because of criteria 0, 1, and 2, this depends entirely on the individual. However, some papers which fit the criteria for me at the time were the following:
I am not sure if this is exactly what you are looking for, but paperswithcode.com has a well-organized selection of research with publicly available source code. Anyone trying to reproduce the code independently from the paper can always take a peek at the original source for details which may not be clear.
I liked this site initially, but decided to read the ToS and was a bit turned off:
> To the extent that you provide User Content, you hereby grant us (and represent and warrant that you have the right to grant) an irrevocable, non-exclusive, royalty-free and fully-paid-up, worldwide license to reproduce, distribute, publicly display and perform, prepare derivative works of, incorporate into other works and otherwise use and exploit such User Content, and to grant sublicenses of the foregoing rights.
So not only can Meta use these cutting edge techniques in their products without needing to request permission from the implementer, they can also sell those derivatives to anyone and practically have full ownership over what was submitted. Furthermore:
> You assume all risks associated with the use of your User Content.
and
> For the avoidance of doubt, Meta Platforms does not claim ownership of User Content you submit or other content made available for inclusion via our Website.
So, if anything goes wrong, it is the implementer’s fault entirely, but Meta may freely profit from those submissions in any way they see fit.
Sure, this is a cynical reading of the ToS, but I’m assuming the ToS will only ever be used in Meta’s favor…
Most recent papers, in NLP at least, are so sparse on detail that it is impossible to reproduce their models. And then there's the compute cost, as at least one other poster has mentioned.