> Running a remote cloud LLM costs about nothing in dollars to the user.
The user doesn't care where it runs, because the user interacts with my product, not my backend. He also doesn't pay my cloud provider; he pays me.
I don't think we need to argue the fact that an on-premise solution is cheaper than a cloud solution for a lot of tasks, especially when talking about bounded resources. The cloud offers some convenience in setup, and some maintenance tasks are easier, but that convenience comes at significant cost, especially as projects get larger.
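To put a rough, back-of-the-envelope shape on that claim (every number below is an assumption for illustration, not a quote from any provider):

```python
# All figures are assumed placeholders, not real prices.
gpu_purchase = 1500.0        # one-off: used 24 GB GPU (assumed)
power_per_month = 30.0       # ~300 W duty cycle at typical rates (assumed)
cloud_rate_per_hour = 1.10   # on-demand GPU instance (assumed)

months = 24
on_prem = gpu_purchase + power_per_month * months
cloud = cloud_rate_per_hour * 24 * 30 * months
print(f"on-prem over {months} months: ${on_prem:,.0f}")  # $2,220
print(f"cloud over {months} months:   ${cloud:,.0f}")    # $19,008
```

Reserved pricing and bursty workloads narrow the gap, but for steady, bounded use the direction is clear.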
> Hopefully being based on more than predictions of local LLM performance.
What else should they be based on, pray, given the fact that smaller models suitable for on-premise and even on-machine use are improving rapidly? https://arxiv.org/pdf/2303.16199.pdf
Are they at the performance levels of cloud-based very large LLMs? Not yet. But their turnover times are measured in weeks, not months. And it's not a question of whether there will be a higher-quality base model, only of when.
>I don't think we need to argue the fact that an on-premise solution is cheaper than a cloud solution for a lot of tasks
In the absolute sense, where we look at the total cost of running the model (and don't care how it's distributed or include profits), you may be right even with scale efficiencies - making a determination requires data about cloud server costs we do not have. But I can make an informed guess about the dollar cost the user sees.
The cost the user sees is influenced by the factor called 'Microsoft and Google (etc.) have a lot of money, and seem to be perfectly willing to absorb costs to control the market and get user data', and that's enough to get user costs very low when calling to Cloud LLMs.
>>>I can easily see scenarios where big tech fails to pivot into AI properly and goes down.
>>I don't think you have much to base scenarios where big tech goes down upon, but I'll be glad to hear an argument. Hopefully being based on more than predictions of local LLM performance.
>What else should they be based on, pray, given the fact that smaller models suitable for on-premise and even on-machine use are improving rapidly?
My consistent point is that technical performance is not enough. While the small open models have a tendency to overclaim[0], they'll get to GPT-4 level in time.
Success, however, is not determined by technical performance alone. There are some very big hurdles ahead. Why should big tech fall when they pivoted in time and maintain some very useful moats?
> and seem to be perfectly willing to absorb costs to control the market
That only works when the competing product incurs a cost that can be undercut, and can be pushed off market in the process.
Self-hosted open source LLMs don't incur any cost beyond the utilities. They also cannot be pushed off the market. Trying this tactic would be like trying to replace Linux as the dominant server OS by lowering the licensing costs for Windows Server.
But even acknowledging the fact that larger models still have advantages in performance, how shall that moat be maintained over time?
Even in their current state, smaller models are useful for specialised tasks. And I know I'm repeating myself, but they are also cheaper, work offline, and can be run on a laptop.
And it isn't a question of whether there will be better open source base models and better training data for RLHF; it's only a question of when. To wit, we are still waiting for the 15B and 30B checkpoints of StableLM.
And unlike the giant models developed behind closed doors, development turnover for smaller LLMs happens in weeks, not months. Which isn't surprising, because the talent pool open source development can draw from is basically limitless.
>That only works when the competing product incurs a cost that can be undercut
You're right here, they can't destroy open source (unless they lobby/scare legislators, but that's an entirely different matter). But this subsidization means that the smaller OSS models don't appear cheaper to client-side users. The companies can absorb the costs - it's a typical data-for-free-service deal, and we already know users can be receptive to these deals. Subsidization is also useful to scare off commercial competition.
>...
The Google 'leak' was a dumb spin and/or an example of why Google Research failed at converting its lead: it doesn't understand business. The important moats are not in raw performance following the initial training runs. That metric is secondary. The moats are in data and access, and both require productization.
All of the specialized and local models need access to user data for their task, and BigCorp already has access and data from its products. Lots of telemetry! Everyone else is likely to get a scary 'security' prompt when they try to access user data. In the LLM world, data => performance; having better data could mean BigCorp keeps improving beyond OSS.
Everything needs to be deployed, and BigCorp can just push it as an OS update. OSS needs word of mouth.
BigCorp can aggregate data from multiple users on its remote end for retraining. Local models are likely to be updated only intermittently (who's going to pay for that? And based on what data?) and have access only to local user data and what they saw in the original training.
OSS catching up to GPT-4 performance will eventually happen, but by that time, BigCorp could achieve a strong product moat and improve its own performance beyond GPT-4. Right now, OSS is behind where it matters, and there's no guarantee this would change. One could hope...
We can only speculate what data closed LLMs were trained on, but I'd be highly surprised if Google/OpenAI had exclusive access to a bigger repository of written data than, well, the internet, as it presents itself to the world at large.
Products can be developed by basically every group with the passion to do so, even in an OSS setting. A great example is InvokeAI, a Stable Diffusion implementation that, while it doesn't (yet) offer the customization and extensibility of AUTOMATIC1111, has a pretty superb UX.
So no, there is no productization-moat either.
> All of the specialized and local models need access to user data for their task
What exactly would they require "user data" for?
The LLM plugin I use for coding tasks requires access to my current vim-buffer, which the plugin provides. My script for generating documentation from API code requires only the API code. When I use an LLM to write an email, the data it requires is the prompt and some datetime information, which my script provides.
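To make that concrete, here is a minimal sketch of the documentation case, assuming a llama.cpp-style completion server on localhost (the endpoint and JSON field names are assumptions that vary by server):

```python
import json
import urllib.request

def complete(prompt: str, n_predict: int = 256) -> str:
    # Assumed local endpoint in the style of llama.cpp's bundled server;
    # adjust the route and fields for whatever you actually run.
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=json.dumps({"prompt": prompt, "n_predict": n_predict}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# The model sees exactly what the script hands it - the API code, nothing else.
api_code = open("api.py").read()
print(complete(f"Write reference documentation for this API:\n\n{api_code}"))
```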
Even the existing cloud-based solutions don't need access to user data to perform their functions.
> Everything needs to be deployed, and BigCorp can just push it as an OS update.
And app providers can just update an app. LLMs don't have some special requirement that would make updating integrated versions any more difficult than upgrading other software.
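A sketch of what such an update amounts to (manifest URL and file layout are hypothetical): the weights are just another versioned asset the app downloads and verifies.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

MANIFEST = "https://example.org/models/manifest.json"  # hypothetical

def update_model(dest: Path) -> None:
    # Fetch the manifest, compare checksums, download only when outdated.
    manifest = json.loads(urllib.request.urlopen(MANIFEST).read())
    if dest.exists() and hashlib.sha256(dest.read_bytes()).hexdigest() == manifest["sha256"]:
        return  # already current
    dest.write_bytes(urllib.request.urlopen(manifest["url"]).read())

update_model(Path("model.gguf"))
```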
> BigCorp could achieve a strong product moat and improve its own performance beyond GPT-4.
By doing what, deploying ever larger models? Attention-based transformers have O(n^2) scaling in sequence length, so that's unlikely to happen unless there is some architectural breakthrough. Which is far more likely to happen in OSS first, due to the aforementioned next-to-limitless talent pool.
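The quadratic term is easy to see in a bare NumPy sketch of one attention head: the score matrix is n x n, so doubling the context length quadruples both its memory footprint and the FLOPs to fill it.

```python
import numpy as np

n, d = 1024, 64                        # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

scores = Q @ K.T / np.sqrt(d)          # (n, n): this is the O(n^2) part
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                      # (n, d)

print(scores.shape)  # (1024, 1024); at n=2048 the matrix is 4x as large
```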
> Right now, OSS is behind where it matters, and there's no guarantee this would change
OSS powers basically everything in the world of computing minus office software, desktops, and gaming PCs, and that isn't for a lack of capability. So, purely based on experience and history, I think it's very unlikely that this won't change, and quickly.
>We can only speculate what data closed LLMs were trained on, but I'd be highly surprised if Google/OpenAI had exclusive access to a bigger repository of written data than, well, the internet, as it presents itself to the world at large.
>What exactly would they require "user data" for?
There are several classes here:
A) Total internet data. Google/OpenAI may have more data from Google Books/GSuite/etc., but maybe not. No way to know. Even if they do, maybe it's not significant compared to the total data volume. Since we can't meaningfully compare, let's just ignore it.
B) Global usage data. This is useful for further tuning the model - we saw what the open models could do with just a partial log of ChatGPT. OpenAI, of course, has the entire log. For example, it's possible that users in country X ask for stuff in a different manner, or that terms have a local meaning the model may not be aware of. Language evolves, after all. A local model can at most update on current user data, or via much slower updates from the origin, and OSS has fewer resources here.
C) Local usage data. For example, a company may wish an LLM to access all its documents to create a knowledge base. There's a good chance all the documents are stored in Office 365/GSuite. You can guess who has easy access and who gets the scary permission prompts. Another example: the LLM writing an email may wish to be aware of the previous communication in the thread and your general tone. Or replace Spotlight/Windows search with an LLM, but the LLM needs access to all your data to properly search it. Some of this can be emulated with really long prompts, but it's more efficient to just let the LLM have access.
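As a toy illustration of the "access beats long prompts" point (plain bag-of-words scoring stands in for a real embedding index): retrieve the few most relevant local documents and hand only those to the LLM, rather than pasting everything into one enormous prompt.

```python
import math
import re
from pathlib import Path

def bag(text: str) -> dict[str, int]:
    words = re.findall(r"\w+", text.lower())
    return {w: words.count(w) for w in set(words)}

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(n * b.get(w, 0) for w, n in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_docs(query: str, folder: str, k: int = 3) -> list[Path]:
    q = bag(query)
    scored = [(cosine(q, bag(p.read_text(errors="ignore"))), p)
              for p in Path(folder).expanduser().rglob("*.txt")]
    return [p for _, p in sorted(scored, key=lambda t: t[0], reverse=True)[:k]]

# Only these few documents end up in the model's context, not the whole drive.
print(top_docs("quarterly budget meeting notes", "~/Documents"))
```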
>Even the existing cloud-based solutions don't need access to user data to perform their functions.
Currently no, but the personal assistant they want to build will require it.
>Products can be developed by basically every group with the passion to do so
>By doing what, deploying ever larger models?
Alas, OSS devs tend to get bored with 'non-sexy' subjects. Meanwhile, Microsoft and Google will embed LLMs in all their apps. The apps have their own moats (data migration, UX) and in turn act as a moat for the LLM.
Moat in action:
Imagine Thunderbird works with an OSS LLM and Outlook works with OpenAI GPT. A user has meetings in Outlook and uses GPT to do various related planning. Say the user was willing to migrate to the OSS LLM. But the OSS LLM doesn't have an easy interface with Outlook (Microsoft 'competitive' behaviour), and manually importing all the time is too messy. The user may even consider switching to Thunderbird, but Thunderbird doesn't do ActiveSync, and IT refuses to even consider allowing IMAP in its Exchange, so the user is stuck with Outlook and in turn with OpenAI GPT. Doing ActiveSync is boring for OSS devs, so Microsoft gets an indirect moat: Exchange <=> Outlook <=> OpenAI.
SD is way more in tune; the subject is way more popular with devs, I guess. These people have a chance. They don't, however, need to brag about the inevitable victory of SD or how Adobe is going down.
>> Everything needs to be deployed, and BigCorp can just push it as an OS update.
>And app providers can just update an app.
Deploying an app involves more friction. How do you get users to install it in the first place? Not impossible (see Google Chrome overtaking Internet Explorer), but a struggle where the OS maker has a built-in advantage.
>By doing what, deploying ever larger models?
A bit of that, but I expect more effective tuning because they have way more usage data.
>Attention-based transformers have O(n^2) scaling
There are numerous papers trying to improve that. We'll see.
Firstly, this would require re-training and tuning the model CONSTANTLY. Which is computationally expensive, on top of the already expensive running of the trained model. So this isn't happening, least of all in a local context.
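For scale: standard mixed-precision Adam bookkeeping (a textbook estimate, not a measurement) already puts a full fine-tune of a 7B model far beyond local hardware.

```python
params = 7e9
# fp16 weights + fp16 grads + fp32 master weights + Adam's two fp32 moments
bytes_per_param = 2 + 2 + 4 + 4 + 4
print(f"~{params * bytes_per_param / 1e9:.0f} GB")  # ~112 GB, before activations
```

And that is per training run, before anyone talks about doing it constantly.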
Secondly, we haven't even talked about the legal implications of using global usage data for training the next generation of models. I would love to see corporations try to explain that to, say, the EU regulators, with regard to the GDPR.
> Alas, OSS devs tend to get bored on 'non-sexy' subjects.
I have already given an example of an OSS product in the generative AI space with superb UX. I can produce countless other examples across all realms of software. Take a look at Krita. The Dolphin file browser. The entire KDE desktop environment. LibreOffice. Firefox. Blender.
And by the way, there are also countless commercial products with horrible UX.
> SD is way more in tune; the subject is way more popular with devs, I guess.
SD benefits from having a base model that already meets or exceeds the performance of closed source models. I see no reason why devs wouldn't be equally motivated when a sufficiently advanced LLM base model comes along.
There are definitely challenges, but your own link shows they're already trying, and that's following the more famous Tay failure. The incentives are obvious, while I doubt the challenge - at least regarding global user data - is insurmountable. It's rather well suited to BigCorp capabilities (and more difficult for OSS).
BigCorps are perfectly willing and able to deploy an army of moderators if required. Not too different from what OpenAI used to jumpstart its GPT. The reward is a market valued in billions; the moderators get minimum or third-world wages. If I were a BigCorp, I'd jump on it.
[EDIT: we can see from the front-page MeZO article that tuning does not have to be computationally expensive]
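A toy sketch of the zeroth-order idea behind MeZO (all sizes here are stand-ins): estimate the gradient from two forward passes with mirrored random perturbations, regenerating the perturbation from a seed instead of storing it, so no backprop graph or optimizer state is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10
w = rng.normal(size=dim)              # stand-in for model weights
x, y = rng.normal(size=dim), 1.0      # one toy training example

def loss(w):
    return (w @ x - y) ** 2           # forward pass only, no backprop

eps, lr = 1e-3, 1e-2
for step in range(100):
    seed = int(rng.integers(1 << 31))
    z = np.random.default_rng(seed).normal(size=dim)
    g = (loss(w + eps * z) - loss(w - eps * z)) / (2 * eps)  # projected gradient
    z = np.random.default_rng(seed).normal(size=dim)         # same seed -> same z
    w -= lr * g * z

print(f"loss after 100 zeroth-order steps: {loss(w):.6f}")
```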
>I would love to see corporations try to explain that to, say, the EU regulators, with regard to the GDPR.
I don't think it's a big problem: for one, BigCorp is truly not interested in PII for training the model. Compared to what they're already doing in other fields, there's no reason they shouldn't be able to get retraining approved easily.
>SD benefits from having a base model that already meets or exceeds the performance of closed source models.
I wanted to avoid saying it, but there are obvious SD use cases which the typical commercial interests would rather avoid. There are very motivated existing communities, which are far more likely to have a GPU. Adobe is weaker overall. The model is more accessible, compared to the still non-trivial initial training of an LLM. IMHO, these are more likely reasons than the raw technical comparison, which I don't think the regular user or even the regular dev bothers with.
>I have already given an example of an OSS product in the generative AI space with superb UX
Which is why I bother writing these comments. Because OSS can compete by being good enough. But I see BigCorp strategies which give the incumbents a good chance to keep a stranglehold given the way things are currently going. Right now the ecosystem tends towards overconfidence (dumb dumb Google memo), and I think highlighting the challenges may help a correction in time.
>there are also countless commercial products with horrible UX.
True. Which shows that moats rest on more than technical comparisons.
>Why? App stores exist.
You still need visibility to get users to install your model. There's the (surmountable) technical challenge of deployment across varied configurations. Did I mention that the biggest app stores are run by the closed competition? [Insert a million HN threads about app store policies]
It should be obvious that the companies who get to install their model API by default, without asking the user, have an advantage, and that the devs having to submit their models for approval by these companies are at a disadvantage.
> I wanted to avoid saying it, but there are obvious SD use cases which the typical commercial interests would rather avoid.
There are also many more use cases that don't fall into these categories.
My point still stands: SD is a prime example of what happens when a desirable OSS technology becomes competitive in quality and is then tinkered with by a near-limitless number of creative and talented developers.
> to keep a stranglehold given the way things are currently going.
> Which shows that moats rest on more than technical comparisons.
This isn't office software; there is no "we always used X" factor, since the technology is still in its early phase, and my thoughts about interactions with "user data" in the LLM space have been outlined above.
Again: strangleholds, moats, whatever we want to call them, only work if there is a competitive advantage that the competition cannot reach itself. So far, that's better model performance and ease of use. The former gap is shrinking with every week; the latter will resolve itself the same way it did for SD, once the performance is good enough.
When that happens, OSS solutions are the ones with advantages that cannot be easily imitated: they run on premises, cost only utilities, can work offline, and can be endlessly tinkered with and improved upon by a near-limitless talent pool.
But, as has been said before, much remains to be seen, and there are many unknown factors that will influence the outcome.
Therefore I thank you for the discussion. I look forward to seeing the next developments in this tech, and I'm confident that we're going to see OSS being as successful in the LLM space as it is in most areas of computing.