AlphaFold is excellent engineering, but I struggle to call it a breakthrough in science. Take T cell receptor (TCR) proteins, which are produced pseudo-randomly by somatic recombination, yielding enormous diversity. AlphaFold's predictions for those are not useful. A breakthrough in folding would have produced rules that are universal. What was produced instead is a really good regressor in the region of protein space where some known training examples are close by.
If I were the Nobel Committee, I would have waited a bit to see how this issue plays out. Also, in terms of giving credit, I think those who invented the dynamic programming algorithms for pairwise and multiple alignment deserved some recognition. AlphaFold was built on top of those. They are the cornerstone of the entire field of biological sequence analysis. Interestingly, ESM was trained on raw sequences, not on multiple alignments. And while it performed worse, it generalizes better to unseen proteins like TCRs.
The value in BLAST wasn't in its (very fast) alignment implementation but in the scoring function, which produced calibrated E-values that could be used directly to decide whether matches were significant or not. As a postdoc I did an extremely careful comparison of E-values to true, known similarities, and the E-values were spot on. Apparently, NIH ran a ton of evolution simulations to calibrate those parameters.
For the curious, BLAST is very much like pairwise alignment, but it uses an index to speed things up by skipping regions that would score poorly.
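To make the index idea concrete, here is a rough toy sketch of seed-and-extend in Python. It is not BLAST's actual implementation: the function names, the exact-match k-mer seeding, the +1/-1 scoring and the x_drop cutoff are all assumptions I picked for illustration. The point is only that exact k-mer hits from an index decide where alignment is attempted at all.

```python
from collections import defaultdict

def build_kmer_index(subject, k):
    """Map every k-mer in the subject to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index

def find_seeds(query, index, k):
    """Return (query_pos, subject_pos) pairs that share an exact k-mer."""
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], ()):
            seeds.append((q, s))
    return seeds

def _extend(query, subject, i, j, step, match, mismatch, x_drop):
    """Walk in one direction without gaps; stop once the running score
    falls x_drop below the best score seen so far."""
    best = score = 0
    best_len = n = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= x_drop:
            break
        i += step
        j += step
    return best, best_len

def extend_seed(query, subject, q, s, k, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a seed in both directions.
    Returns (score, query_start, query_end, subject_start)."""
    right, right_len = _extend(query, subject, q + k, s + k, 1,
                               match, mismatch, x_drop)
    left, left_len = _extend(query, subject, q - 1, s - 1, -1,
                             match, mismatch, x_drop)
    score = k * match + left + right
    return score, q - left_len, q + k + right_len, s - left_len

subject = "TTACGTACGTTT"
query = "ACGTACGT"
index = build_kmer_index(subject, k=4)
seeds = find_seeds(query, index, k=4)
hits = [extend_seed(query, subject, q, s, k=4) for q, s in seeds]
best_hit = max(hits)
```

Regions of the subject that share no k-mer with the query never get looked at, which is where the saving over full dynamic programming comes from.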
BLAST estimates are derived from extreme value theory and large deviations, which is a very elegant area of probability and statistics.
That's the key part, I think: being able to estimate how unique each alignment is without having to simulate the null distribution, as was done before with FASTA.
The index also helps, but the speedup comes mostly from not having to simulate the null distribution.
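For reference, the closed-form estimate looks roughly like this in Python. It is the Karlin-Altschul form, E = K * m * n * exp(-lambda * S). The function names are mine, and the lambda and K values in the example are placeholders: the real ones depend on the scoring matrix and gap penalties, and real BLAST also corrects m and n for edge effects, which this sketch ignores.

```python
import math

def bit_score(raw_score, lam, K):
    """Normalize a raw alignment score so it is comparable across scoring systems."""
    return (lam * raw_score - math.log(K)) / math.log(2)

def evalue(raw_score, query_len, db_len, lam, K):
    """Expected number of chance alignments scoring at least raw_score
    in a search space of size query_len * db_len."""
    return K * query_len * db_len * math.exp(-lam * raw_score)

def pvalue(e):
    """Probability of at least one chance hit, assuming hits are Poisson with mean e."""
    return -math.expm1(-e)

# Placeholder parameters, for illustration only.
lam, K = 0.3, 0.1
m, n = 300, 1_000_000

e = evalue(60, m, n, lam, K)
p = pvalue(e)
```

No simulation is involved: once lambda and K are known for a scoring system, every hit gets its significance from one exponential. The same E-value can be written as m * n * 2 ** (-bit_score), which is why bit scores can be compared across scoring systems.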