My take is that good content is still one of the bigger problems, particularly the question of who the original training data belongs to (or where it comes from). There's a certain risk (we'll see with GitHub Copilot soon) that things will slow down for a bit until the licensing issues are all sorted out. For now, this can only be solved by bringing in public funding and data, for which universities have always been a very good proxy. That also means the results should (usually) be open access to the public, at least to some extent, which is useful for the garage folks to catch up a bit. But once we're past that, it'll be all about that giant body of pre-trained data, securely kept within the next Facebook or Microsoft, amounting to literal data gold (just much higher value at a lot less weight).