While I agree that the claims are hyperbolic, I think you are approaching this paper from the point of view of someone who knows a lot about sorting. Because of this, its normal that the claims of these guys who probably don't know much about it are grating for you.
But, at its core, this is really a RL paper. The objective is to see how far a generic approach can work while understanding as little as possible about the actual domain. After AlphaGo exceeded expectations, the question becomes: "What else can RL do, and can it do anything actually useful?", and this paper seems to suggest that it can optimize code pretty well! I'm really not sure they are self-aggrandizing in terms of impact. The impact of an approach like this could potentially be very large (although I'm not saying that it actually is, I don't know enough).
Sorting is not a niche topic. Anybody who majored in CS (which is a ton of people these days) will read the abstract and most of the paper thinking "Wow, I can't believe they discovered something better than O(N log N)" because that's usually what people mean when they say "better sorting algorithm". What they discovered here is effectively a new compiler optimization. They should present it as such instead of calling it a new sorting algorithm.
But ya, discovering a new compiler optimization automatically is kinda cool.
I hope people majoring in CS will not think that, as they learned that n log n is the theorically best complexity for a sort algorithm. They will rather think that they found an algorithm with better constants in front of n log n.
Anyone who does any basic algorithmic CS stuff would have been exposed to sorting algorithms, their variations, sorting networks and so on.
There are already superoptimizers who use genetic algorithms to find the most optimal code sequence for small easily verifiable tasks. That is also a form of reinforcement learning in a way
But, at its core, this is really a RL paper. The objective is to see how far a generic approach can work while understanding as little as possible about the actual domain. After AlphaGo exceeded expectations, the question becomes: "What else can RL do, and can it do anything actually useful?", and this paper seems to suggest that it can optimize code pretty well! I'm really not sure they are self-aggrandizing in terms of impact. The impact of an approach like this could potentially be very large (although I'm not saying that it actually is, I don't know enough).