A nice post (that should be somewhere smarter than contemporary Twitter/X).

> PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and exploring how the model might start asking the right questions about a topic whose answers and conceptual framework it has never been exposed to. This is challenging because most models are trained on virtually all human knowledge available today, but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I'll be happy to hear your insightful thoughts.
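
To make that concrete, here is one rough sketch of what such a harness might look like. Everything in it is hypothetical, not from the post: query_model stands in for whatever inference API you use, and the briefing/rubric are placeholders for a real held-out discovery.

    # Hypothetical sketch, not from the post: "query_model" is a stand-in
    # for any LLM API, and the case below is a placeholder for a real
    # discovery published after the model's training cutoff.

    HELDOUT_CASE = {
        "briefing": "Observations X and Y conflict with standard theory Z.",
        # Questions a good scientist would ask, graded later by human
        # raters; no automatic answer key can exist, since the point is
        # that the answer is absent from the training data.
        "rubric": [
            "Questions assumption A of theory Z",
            "Proposes an experiment that discriminates X from Y",
        ],
    }

    def elicit_questions(query_model, case):
        """Show the model only the anomaly, never the resolution, and
        collect the questions it chooses to ask."""
        prompt = (
            "You are given this unexplained observation:\n"
            + case["briefing"]
            + "\nList the questions you would investigate first."
        )
        return {"questions": query_model(prompt), "rubric": case["rubric"]}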

Why benchmarks?

A genius (human or AI) could produce novel insights, some of which could practically be tested in the real world.

"We can gene-edit using such-and-such approach" => Go try it.

No sales-brochure claims, research-paper comparison charts showing incremental improvement, individual KPIs/OKRs to hit, or promotion packets required.



The reason you'd have a benchmark is that you want to be able to check in on your model programmatically. DNA wet-lab work is slow and expensive. While you're absolutely right that benchmarks are imperfect and get used for marketing and sales purposes, they also seem to create capability momentum in the market. For instance, nobody running local LLMs right now would prefer a 12-month-old model to one of today's top models at the same size: the new ones are significantly more capable, and many researchers believe that training against new and harder benchmarks has been one way to drive that improvement.
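
As a rough illustration of "check in programmatically" (query_model and the items are placeholders, not any particular API or real benchmark):

    # Minimal sketch of a programmatic check-in, as a cheap proxy between
    # slow real-world validations. "query_model" is a placeholder for any
    # inference call; the items are toy examples, not a real benchmark.

    BENCHMARK = [
        {"prompt": "2 + 2 =", "expected": "4"},
        {"prompt": "The capital of France is", "expected": "Paris"},
    ]

    def run_benchmark(query_model, items):
        """Score a fixed item set; cheap enough to run on every
        checkpoint, unlike a wet-lab experiment."""
        correct = 0
        for item in items:
            answer = query_model(item["prompt"])
            if item["expected"].lower() in answer.lower():
                correct += 1
        return correct / len(items)

    # e.g. compare this year's checkpoint against last year's at the
    # same size:
    #   run_benchmark(new_model, BENCHMARK) vs run_benchmark(old_model, BENCHMARK)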



