If you recall high school history, rapid, exponential "progress" happened once knowledge was 1) written down (printing press), 2) archived for the future (libraries), 3) systematized (textbooks/encyclopaedias), and 4) proactively shared (public education), all on a massive scale.
The fact that some knowledge exists and is even accessible does not really matter if it takes a scholar highly trained in a very narrow field to find that piece of information. You need a well-established knowledge creation and distribution funnel in operation for humanity as a whole to reap the benefits of knowledge.
There is undoubtedly a lot of useful knowledge on internet platforms; however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that the contribution of these platforms to the totality of human knowledge is infinitesimal, and is further drowned out by cat and porn videos.
Now we have 5) aggregated and internalized as a whole by computational constructs such as LLMs, which are, per 4), proactively shared (open weights, but also freemium services and dirt-cheap API access to commercial SOTA models), still on a massive scale.
> There is undoubtedly a lot of useful knowledge on internet platforms; however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that the contribution of these platforms to the totality of human knowledge is infinitesimal, and is further drowned out by cat and porn videos.
Precisely that. Which is why I often argue that for 99%+ of the content in the training data, its marginal contribution to the training process, itself infinitesimal in isolation, is still by far the most value that content will ever bring to the world.