My take on this is that (good) content is still one of the bigger problems, particularly the question of who exactly the original training data belongs to (or where it comes from). There's a certain risk (we'll see with GitHub Copilot soon) that progress will slow down for a while until the licensing issues are all sorted out. For now, this can only be solved by bringing in public funding/data, for which universities have always been a very good proxy. That also means the results should (usually) be open to the public to some extent, which would also help the garage folks catch up a bit. But once we're past that, it'll be all about that giant body of pre-trained data, securely kept within the next Facebook or Microsoft and amounting to literal data gold (just much higher value at a lot less weight).