The only issue with that is that the original link has had its content redacted at OpenAI's or Sam's request, which is why this post links to the archive.org version with the original contents.
Whether scaling laws hold or not is up for debate.
What isn't up for debate is that:
1) Giant serverfarms are expensive
2) People want on-premises/on-machine solutions
3) LoRA tuning of small models continues to excel (see the sketch after this list)
4) Thus specialized models continue to evolve at a fast pace
5) Open source foundation models on which tuning can be done are accelerating by the week
6) Performance doesn't matter once a model is "good enough" for a given task
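To give 3) some substance, here is a minimal sketch of what "LoRA tuning a small model" looks like in practice, assuming the Hugging Face transformers and peft libraries; the base model name and hyperparameters are placeholders, not something from this thread:

    # Minimal LoRA fine-tuning setup (sketch; model name and hyperparameters are placeholders)
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "openlm-research/open_llama_7b"    # any small open base model
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    lora_cfg = LoraConfig(
        r=8,                                  # low-rank adapter dimension
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()        # typically well under 1% of the base weights
    # ...then train only the adapter weights on the specialized task data.

The point is that only a tiny adapter is trained and shipped, which is part of why specialized models turn over so quickly.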
The long and short of it is this: LLMs are a fantastic technology, and people want to have it. And they want to have it local, private, and as cheap as possible. Just one example: how many devs are out there who would love having a local LLM integrated into their IDE? The answer: Yes.
And again I draw your attention to point 6): for such use cases, it doesn't matter if a gargantuan model performs a bit better ... if the small model is good enough for the task, then it wins, purely by being private, offline, and free to use.
We are already at a point where small models are competitive for specialized tasks, and they aren't performing badly at all as general models either. What will it take for bigger models to outperform them so enormously that customers simply have no other choice than to use them? A 5x increase in size? 10x? 100x? Where do we run models with trillions of params? What will that cost, and how available will that be?
> And they want to have it local, private, and as cheap as possible.
Most people don't care about local and private. They care about cheap, and free ChatGPT is the cheapest they could get, which is better than all open access models.
> Open source foundation models on which tuning can be done are accelerating by the week
Based on the (unverified) information I have been hearing in research circles, OpenAI invested heavily in data collection. One thing I heard is that they have hired experts to label lots of code with information like complexity, correctness, etc. Also, the questions and RLHF data they get from real users are something open access models can't match.
PaLM 2 chat is significantly worse than ChatGPT (free), even though it is larger and trained by experts in the field.
Right, but corporations do. And AI is a revolution because it makes workers substantially more efficient, not because it makes for a slightly better Google. If corporations want to leverage AI to make their workforce more efficient, it must be local and private in many cases. OpenAI is already experiencing friction in the EU.
Same corporations happily use Office 365 and GSuite for all their documents rather than a local and private solution. I'm sure they'll come to a similar arrangement with regard to AI.
That doesn't change the fact that these corporations have an interest in data privacy and cost reduction. To wit, if MS didn't guarantee data security in European datacenters, they would be way less competitive in the EU.
Many corporations run their own datacenters, have on-premises mail servers and run their own backup solutions. Not everything is in the cloud.
1) Giant serverfarms are expensive, but Microsoft has money and economies of scale. This can compare favourably to running locally.
2) People use whatever LLMs software makers give them and don't care at all about on premise. Software makers want something reliable they can deploy at scale, not end up debugging problems on a client's 4GB machine with an iGPU and a multitude of OSs. The people actually making the decisions want to delegate to an API, and the only offerings available right now are from OpenAI, Google, Anthropic (aka Google again), etc.
3) Open source LLM products and APIs are basically at zero and remain there week after week. "Download from GitHub and compile" is not going to fly.
4) Right now the question is whether the closed-source, proprietary product people will actually use will be provided by a web API (OpenAI or maybe Meta) or by the OS (Apple, Microsoft, Google, etc.). Either way it will be controlled by giant corporations. Unless 3) ever changes, but doing product is icky for OSS people for some reason, so that will never happen.
[EDIT: Note that the Stable Diffusion area is in way better shape. There's a one-click installer for AUTOMATIC1111! There are some products and sites using SD! There are some use cases important commercial interests don't want! These OSS people also don't engage in triumphalism, much less unearned triumphalism! They are probably still going to lose to Adobe and Midjourney but at least it will be a fight.]
A long time ago, in order to get computers to do anything useful, the hardware required huge rooms.
Today, a little cell phone is significantly more powerful than a room-sized computer back in the day.
LLMs shouldn't be any different. Today, they require giant server farms to run. Tomorrow, they will run on little robots/cell phones.
A few things will allow this:
1. Chip makers like Apple, Qualcomm, AMD, Intel, Nvidia, will heavily emphasize AI performance in their future chips. Expect accelerators like the Neural Engine to get significantly bigger. In the future, I expect that most of the transistors in a SoC will be dedicated to AI acceleration - and not CPUs/GPUs.
2. Moore's law is not dead. We will continue to get more transistors in a given area. For example, TSMC has plans for 1nm, which will likely allow 4-5x more transistors per area than the 5nm node used for the Apple M2, so roughly 100 billion transistors for an M2-sized die. For comparison, the 4090 has 76 billion transistors. A 1nm Apple Silicon Max chip could theoretically have around 300 billion transistors, about 4x a 4090 (rough numbers sketched right after this list). If GPT-4 requires 4x 4090s to run, then a single 1nm M Max might be able to do it in the future.
3. Larger models will optimize to require less resources.
4. Smaller models will become more capable.
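The back-of-envelope behind point 2, spelled out; the 4-5x density multiplier is the assumption above, not a datasheet figure, and the transistor counts are the commonly reported ones:

    # Rough transistor-count arithmetic behind point 2 (assumed density gain, public counts)
    m2          = 20e9   # Apple M2 (5nm), ~20 billion transistors
    m2_max      = 67e9   # Apple M2 Max (5nm), ~67 billion transistors
    rtx_4090    = 76e9   # Nvidia RTX 4090 (AD102), ~76 billion transistors
    density_1nm = 4.5    # assumed 4-5x density improvement at "1nm"

    print(m2 * density_1nm / 1e9)           # ~90  -> "around 100 billion" for an M2-sized die
    print(m2_max * density_1nm / 1e9)       # ~300 -> "around 300 billion" for a Max-sized die
    print(m2_max * density_1nm / rtx_4090)  # ~4x the transistor budget of a 4090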
These forces will converge and we will have local LLMs that are top of the line.
We started from thin clients connecting to room-sized mainframes made by giant corporations, and we ended at small cellphone-size devices mostly acting as thin clients connecting to datacenter-building-sized services run by giant corporations, even though the small devices could theoretically do much more than the original thin clients.
Being able to run locally won't change much, if the local LLMs are owned by Microsoft, Google and Apple. Or if the 'local' LLM is actually an API call to a web service because the programmer/company decided to delegate the issue and not deal with all the software deployment issues. Or because Microsoft decided to hoover up all the data, and programmers decided it's so much easier to just call the OS API and not care whether it's running locally or remotely.
Yeah, you'll have a local LLM that is pretty good. There'll always be a bit better LLM running remotely, because most people won't run giant servers and don't care to update often. How much does any of that matter, when the AI is completely controlled by GiantCorp?
It will honestly depend on just how much better the cloud versions are relative to the local versions.
If the cloud versions are significantly better (like they are now) than local LLMs, then cloud will continue to be the way. If local LLMs reach 95% of what cloud versions can do, then I think local ones might win out because the cost will be smaller, it will be faster (in latency), and it will have more privacy.
In 3-4 years? I'm willing to bet that local LLMs will have a sizable market.
I basically expect the neural engine inside an Apple Silicon M7 to be 80% of the SoC - instead of 10% like it is today. We're going to be buying NPUs (Neural Processing Units) with a CPU and GPU attached to them. Right now, we're buying a CPU with a GPU and NPU attached to it.
I don't expect giant corporations to control LLMs. I expect smaller companies to be able to compete and that techniques for training and deploying LLMs will eventually look similar to how software is built today - where anyone can train and deploy LLMs.
>It will honestly depend on just how much better the cloud versions are relative to the local versions.
'Better' has many meanings. Ultimately LLM performance depends not only on processor performance, but also on data. One could imagine real-time self-training, where the real-time availability of (allegedly anonymized) data from other users leads to a stable advantage over a local LLM.
Another issue is deployment. Imagine this: You are a software developer using an LLM in your commercial product. You have two/three choices.
A) Call a cloud API using a simple cross-platform HTTP call. This will reliably work. While at it, you may gain useful data on your customers.
B) Include a local LLM, and deal with all the lovely deployment issues, especially when the guys running an 8th-gen Intel laptop with an iGPU and 4GB of memory call and ask why they can't run your product.
Perhaps there will be a third option:
C) Call a local API provided by the OS, which may or may not call a cloud API in the background.
One can see how even if the performance is totally equivalent, Cloud API may end up with an advantage.
IMHO, this is not a desirable future, but to avoid it we need to start by cutting down on the triumphalism.
[EDIT:
>If local LLMs reach 95% of what cloud versions can do, then I think local ones might win out because the cost will be smaller, it will be faster (in latency), and it will have more privacy.
Cost for whom? The local user will pay less if the call is remote. The cloud AIs can make up for their costs in data. Unfortunately, users often ignore their privacy.
]
>Another issue is deployment. Imagine this: You are a software developer using an LLM in your commercial product. You have two/three choices.
>Call a cloud API using a simple cross-platform HTTP call. This will reliably work. While at it, you may gain useful data on your customers
In the not too distant future, I expect most applications to be replaced by LLM assistants. So the above scenario won't really happen. You will interact with one LLM, maybe it's a local flavor or maybe it's a cloud flavor. The LLM will then call other services for you. So I don't expect services to call LLMs but I do expect LLMs to call other services.
>>Call a cloud API using a simple cross-platform HTTP call. This will reliably work. While at it, you may gain useful data on your customers.
>In the not too distant future, I expect most applications to be replaced by LLM assistants. So the above scenario won't really happen.
Regardless we have the question of how to deploy the local LLM. If the frontend will be the OS, then we can be reliably sure that Microsoft and Google will have motivation to use their cloud-hosted AI, and you may have to fight the OS to get it to install your model. We may need regulation to ensure local LLMs are possible.
> Regardless we have the question of how to deploy the local LLM.
The same as we do any other software? There is nothing inherently more difficult in deploying an ML model than there is, say, in setting up a redundant high-traffic RDBMS.
> If the frontend will be the OS
Why would the frontend be the OS? The OS isn't the frontend to my database. The OS isn't my IDE. The OS isn't the games I play. The OS isn't even my browser.
Today a little cell phone spends much of its time accessing hardware in huge rooms.
Even if that weren't true and you look exclusively at on-prem and in-hand software, it took 25-30 years to shrink supercomputers to pocket size.
Moore's Law is certainly struggling. Modern developments are at least as much about architectural refinement as transistor density. That won't change unless there's a completely new game changer technology. (Optical? Quantum?) Whatever it is, it hasn't been invented yet. So it's unlikely to be productised within the next ten years.
There's no huge technical benefit to running models locally. Aside from subscription costs - which would be balanced by more expensive hardware - and maybe response time, if the cycles are there, there's not a lot you can do with a local model that you can't do with a hosted model and an API.
The real issue with models is training. If you want to train locally you need access to a huge dataset.
>Today a little cell phone spends much of its time accessing hardware in huge rooms.
The vast majority of this behavior is basic CRUD operations. The iPhone's single-thread speed is likely faster than that of the server it's calling. For example, the Apple A16 contains 16 billion transistors, which is significantly more than a Zen 4 CCD.
>Even if that weren't true and you look exclusively at on-prem and in-hand software, it took 25-30 years to shrink supercomputers to pocket size.
True, but technology moves at a faster pace now. Chips get exponentially faster, not linearly. Do I expect a cell phone in 2027 to be as fast as a server room in 2023? No. But do I expect the forces I outlined above to make local LLMs useful enough for people to keep on their phone or computer? Yes. That's my bet.
>The real issue with models is training. If you want to train locally you need access to a huge dataset.
The assumption is that we will be downloading and updating models similar to how apps are currently deployed.
1) I can't do the complete math since I don't know the exact costs, but basically:
A) Running a remote cloud LLM costs about nothing in dollars to the user.
B) Running a remote cloud LLM does cost the provider. Note, however, that Microsoft and Google are the cloud, so they don't have to pay the absurd profit margins. They already have specialized hardware and can bring their costs down.
C) Microsoft and Google can probably make money from your data, even if it's 'anonymized' LLM requests.
D) Data for convenience is a typical bargain, e.g. in the search space. There's good reason to assume it can work here too.
2-3) I'm not arguing no one needs on prem. I am extrapolating the current state of affairs into the future, because, well, we don't have much else to go on. The current state of affairs isn't good.
4) Yes, this is called a logical extrapolation. I don't think you have much on which to base scenarios where big tech goes down, but I'll be glad to hear an argument. Hopefully one based on more than predictions of local LLM performance.
> Running a remote cloud LLM costs about nothing in dollars to the user.
The user doesn't care where it runs, because the user interacts with my product, not my backend. He also doesn't pay my cloud provider, he pays me.
I think we don't need to argue the fact that an on premise solution is cheaper than a cloud solution for a lot of tasks, especially when talking about bounded resources. There is some convenience in setup, and some maintenance tasks are easier, but this comes at significant costs, especially as projects get larger.
> Hopefully one based on more than predictions of local LLM performance.
What else should they be based on, pray, given the fact that smaller models suitable for on-premise and even on-machine use are improving rapidly? https://arxiv.org/pdf/2303.16199.pdf
Are they at the performance levels of cloud-based very large LLMs? Not yet. But their turnover times are measured in weeks, not months. And it's not a question of whether there will be a higher quality base model, only of when that will happen.
>I think we don't need to argue the fact that an on premise solution is cheaper than a cloud solution for a lot of tasks
In the absolute sense where we look at the total cost of running the model (and not care how it's distributed or include profits), you may be right even with scale efficiencies - making a determination requires data about cloud server costs we do not have. But I can make an informed guess about the dollar cost the user sees.
The cost the user sees is influenced by the factor called 'Microsoft and Google (etc.) have a lot of money, and seem to be perfectly willing to absorb costs to control the market and get user data', and that's enough to get user costs very low when calling to Cloud LLMs.
>>>I can easily see scenarios where big tech fails to pivot into AI properly and goes down.
>>I don't think you have much on which to base scenarios where big tech goes down, but I'll be glad to hear an argument. Hopefully one based on more than predictions of local LLM performance.
>What else should they be based on, pray, given the fact that smaller models suitable for on-premise and even on-machine use are improving rapidly?
My consistent point is that technical performance is not enough. While the small open models have a tendency to overclaim[0], they'll get to GPT-4 level in time.
Success, however, is not determined by technical performance alone. There are some very big hurdles ahead. Why should big tech fall when they pivoted ahead in time, and maintain some very useful moats?
> and seem to be perfectly willing to absorb costs to control the market
That only works when the competing product incurs a cost that can be undercut, and can be pushed off market in the process.
Self-hosted open source LLMs don't incur any cost beyond utilities. They also cannot be pushed off the market. Trying this tactic would be like trying to replace Linux as the dominant server OS by lowering the licensing costs for Windows Server.
But even acknowledging the fact that larger models still have advantages in performance, how shall that moat be maintained over time?
Even in their current state, smaller models are useful for specialised tasks. And I know I'm repeating myself, but they are also cheaper, work offline, and can be run on a laptop.
And it isn't a question of whether there will be better open source base models and better training data for RLHF; it's only a question of when that happens. To wit, we are still waiting for the 15B and 30B checkpoints of StableLM.
And unlike the giant models developed behind closed doors, development turnover for smaller LLMs happens in weeks, not months. Which isn't surprising, because the talent pool open source development can draw from is basically limitless.
>That only works when the competing product incurs a cost that can be undercut
You're right here, they can't destroy open source (unless they lobby/scare legislators, but that's an entirely different matter). But this subsidization means that the smaller OSS models don't appear cheaper to the client-side users. The companies can absorb the costs - it's a typical data-for-free-service deal, and we already know users can be receptive to these deals. Subsidization is also useful to scare off commercial competition.
>...
The Google 'leak' was a dumb spin and/or an example of why Google Research failed at converting its lead because it doesn't understand business. The important moats are not in raw performance following the initial training runs. That metric is secondary. The moats are in data and access, and both require productization.
All of the specialized and local models need access to user data for their task, and BigCorp already has access and data from its products. Lots of telemetry! Everyone else is likely to get a scary 'security' prompt when they try to access user data. In the LLM world, data => performance, so having better data could mean BigCorp keeps improving beyond OSS.
Everything needs to be deployed, and BigCorp can just push it as an OS update. OSS needs word of mouth.
BigCorp can aggregate data from multiple users on its remote end for retraining. Local models are likely to be intermittently updated (who's going to pay for that? And based on what data?) and have access only to local user data and what they saw in the original training.
OSS catching up to GPT-4 performance will eventually happen, but by that time, BigCorp could achieve a strong product moat and improve its own performance beyond GPT-4. Right now, OSS is behind where it matters, and there's no guarantee this would change. One could hope...
We can only speculate what data closed LLMs were trained on, but I'd be highly surprised if Google/OpenAI had exclusive access to a bigger repository of written data than, well, the internet, as it presents itself to the world at large.
Products can be developed by basically every group with the passion to do so, even in an OSS setting. A great example is InvokeAI, a Stable Diffusion implementation that, while it doesn't (yet) offer the customization and extensibility of AUTOMATIC1111, has a pretty superb UX.
So no, there is no productization-moat either.
> All of the specialized and local models need access to user data for their task
What exactly would they require "user data" for?
The LLM plugin I use for coding tasks requires access to my current vim-buffer, which the plugin provides. My script for generating documentation from API code requires only the API code. When I use an LLM to write an email, the data it requires is the prompt and some datetime information, which my script provides.
Even the existing cloud based solutions don't need access to user-data either to perform their functions.
> Everything needs to be deployed, and BigCorp can just push it as an OS update.
And app providers can just update an app. LLMs don't have some special requirements that would make updating integrated versions any more difficult than upgrading other software.
> BigCorp could achieve a strong product moat and improve its own performance beyond GPT-4.
By doing what, deploying ever larger models? Attention based transformers have O(n^2) scaling, so that's unlikely to happen unless there is some architectural breakthrough. Which is far more likely to happen in OSS first, due to the aforementioned next to limitless talent pool.
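To put a number on that O(n^2) claim (my arithmetic, not from the thread): the attention score matrix alone is n x n per layer, so widening the context from 2K to 32K tokens multiplies that term by 256, not by 16:

    \text{attention cost per layer} \propto n^2 d,
    \qquad \left(\frac{32768}{2048}\right)^2 = 16^2 = 256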
> Right now, OSS is behind where it matters, and there's no guarantee this would change
OSS powers basically everything in the world of computing minus office software, desktops and gaming PCs, and that isn't for a lack of capability. So purely based on experience and history, I'd say it's very unlikely that this won't change, and quickly.
>We can only speculate what data closed LLMs were trained on, but I'd be highly surprised if Google/OpenAI had exclusive access to a bigger repository of written data than, well, the internet, as it presents itself to the world at large.
>What exactly would they require "user data" for?
There are several classes here:
A) Total internet data. Google/OpenAI may have more data from Google Books/GSuite/etc. but maybe not. No way to know. Maybe even if they do, it's not significant compared to total data volume. Since we can't meaningfully compare, let's just ignore it.
B) Global usage data. This is useful to further tune the model - we saw what the open models could do with a partial log of ChatGPT. OpenAI of course has the entire log. For example, it's possible that users in country X ask for stuff in a different manner, or that terms have a local meaning the model may not be aware of. Language evolves after all. A local model can at most update on current user data, or by much slower updates from the origin, and OSS has fewer resources here.
C) Local usage data. For example, a company may wish an LLM to access all its documents to create a knowledge base. There's a good chance all the documents are stored in Office 365/GSuite. You can guess who has easy access and who gets the scary permission prompts. Another example: The LLM writing an email may wish to be aware of the previous communication in the thread and your general tone. Or replace Spotlight/Windows search with an LLM, but the LLM needs access to all your data to properly search it. Some of this can be emulated with really long prompts, but it's more efficient to just let the LLM have access.
>Even the existing cloud based solutions don't need access to user-data either to perform their functions.
Currently no, but the personal assistant they want to build will require it.
>Products can be developed by basically every group with the passion to do so
>By doing what, deploying ever larger models?
Alas, OSS devs tend to get bored on 'non-sexy' subjects. Meanwhile, Microsoft and Google will embed LLMs in all their apps. The apps have their own moats (data migration, UX) and in turn act as a moat for the LLM.
Moat in action:
Imagine Thunderbird worked with an OSS LLM and Outlook works with OpenAI GPT. A user has meetings in Outlook and uses GPT to do various related planning. Say the user was willing to migrate to the OSS LLM. But the OSS LLM doesn't have an easy interface with Outlook (Microsoft 'competitive' behaviour), and manually importing all the time is too messy. The user may even consider switching to Thunderbird, but Thunderbird doesn't do ActiveSync, and IT refuses to even consider allowing IMAP in its Exchange, so the user is stuck with Outlook and in turn with OpenAI GPT. Doing ActiveSync is boring for OSS devs, so Microsoft gets an indirect moat: Exchange <=> Outlook <=> OpenAI.
SD is way more in tune; the subject is way more popular with devs, I guess. These people have a chance. They don't, however, need to brag about the inevitable victory of SD or how Adobe is going down.
>> Everything needs to be deployed, and BigCorp can just push it as an OS update.
>And app providers can just update an app.
Deploying an app involves more friction. How do you get users to install it in the first place? Not impossible (see Google Chrome over Internet Explorer) but a struggle where the OS maker has a built-in advantage.
>By doing what, deploying ever larger models?
A bit of that, but I expect more effective tuning because they have way more usage data.
>Attention based transformers have O(n^2) scaling
There are numerous papers trying to improve that. We'll see.
Firstly, this would require re-training and tuning the model CONSTANTLY, which is computationally expensive on top of the already expensive running of the trained model. So this isn't happening, least of all in a local context.
Thirdly, we haven't even talked about the legal implications of using global usage data for training the next generation of models. I would love seeing corporations trying to explain that to, say, the EU regulators, with regards to the GDPR.
> Alas, OSS devs tend to get bored on 'non-sexy' subjects.
I have already given an example for an OSS software product in the generative AI space with superb UX. I can produce countless other examples across all realms of software. Take a look at Krita. The Dolphin file browser. The entire KDE desktop environment. LibreOffice. Firefox. Blender.
And btw. there are also countless commercial products with horrible UX.
> SD is way more in tune; the subject is way more popular with devs, I guess.
SD benefits from having a base model that already meets or exceeds the performance of closed source models. I see no reason why devs wouldn't be equally motivated when a sufficiently advanced LLM base model comes along.
There are definitely challenges, but your own link shows they're already trying, and that's following the more famous Tay failure. The incentives are obvious, while I doubt the challenge - at least regarding global user data - is insurmountable. It's rather well suited to BigCorp capabilities (and more difficult for OSS).
BigCorps are perfectly willing and able to deploy an army of moderators if required. Not too different from what OpenAI used to jumpstart its GPT. The reward is a market valued in billions; the moderators get minimum or third-world wages. If I were a BigCorp I'd jump on it.
[EDIT: we can see from the front-page MeZO article that tuning does not have to be computationally expensive]
>I would love seeing corporations trying to explain that to, say, the EU regulators, with regards to the GDPR.
I don't think it's a big problem: For one, BigCorp is truly not interested in PII for training the model. Compared to what they're already doing in other fields, there's no reason they shouldn't be able to get retraining past regulators easily.
>SD benefits from having a base model that already meets or exceeds the performance of closed source models.
I wanted to avoid saying it, but there are obvious SD use cases which the typical commercial interests would rather avoid. There are very motivated existing communities, which are far more likely to have a GPU. Adobe is weaker overall. The model is more accessible compared to still non-trivial LLM initial training. IMHO, these are more likely reasons than the raw technical comparison, which I don't think the regular user or even regular dev bothers with.
>I have already given an example for an OSS software product in the generative AI space with superb UX
Which is why I bother writing these comments. Because OSS can compete by being good enough. But I see BigCorp strategies which give the incumbents a good chance to keep a stranglehold given the way it's going currently. Right now the ecosystem tends towards overconfidence (dumb dumb Google memo), and I think highlighting the challenges may help correction in time.
>there are also countless commercial products with horrible UX.
True. Which shows moats have more reason than technical comparisons.
>Why? App stores exist.
You still need visibility to get users to install your model. There's the (surmountable) technical challenge of deployment across varied configurations. Did I mention the biggest App stores are run by the closed competition? [Insert million HN threads about App store policies]
It should be obvious that the companies who get to install their model API by default without asking the user have an advantage, and the devs having to submit their models to be approved by these companies are at a disadvantage.
> I wanted to avoid saying it, but there are obvious SD use cases which the typical commercial interests would rather avoid.
There are also many more use cases that don't fall into these categories.
My point still stands: SD is a prime example of what happens when a desirable OSS technology becomes competitive in quality and is then tinkered with by a nearly limitless number of creative and talented developers.
> to keep a stranglehold given the way it's going currently.
> Which shows moats have more reason than technical comparisons.
This isn't office software; there is no "we always used X" factor since the technology is still in its early phase, and my thoughts about interactions with "user data" in the LLM space have been outlined above.
Again: strangleholds, moats, whatever we want to call them, only work if there is a competitive advantage that the competition cannot reach itself. So far, that's better model performance and ease of use. The former gap is shrinking every week; the latter will resolve itself the same way it did for SD once the performance is good enough.
When that happens, OSS solutions are the ones with advantages that cannot be easily imitated: they run on premises, only cost utilities, can work offline, and can be endlessly tinkered with and improved upon by a nearly limitless talent pool.
But, as has been said before, much remains to be seen, and there are many unknown factors that will influence the outcome.
Therefore I thank you for the discussion. I look forward to seeing the next developments in this tech, and I'm confident that we're going to see OSS being as successful in the LLM space as it is in most areas of computing.
> 4) but doing product is icky for OSS people for some reason
I am not talking about OSS people. I am talking about integrators. And there is already demand for on premise LLM solutions.
Integrators don't ask customers to `git clone` some repo, same as they don't ask them to familiarize themselves with some cloud providers API. They ask them to buy/use the already integrated product.
> Note that the Stable Diffusion area is in way better shape.
Yes, because integrators got to work wrapping up the OSS into viable products. The same is going to happen with LLMs.
>there is already demand for on premise LLM solutions.
There's also demand for on-premise office suites, on-premise internal company communication, and private smartphones. Guess who controls these markets and how often it's on premise? Unfortunately, experience suggests users would drop privacy for convenience...
>Integrators
Here the requirements are far more complex than for the SD case. SD can run standalone or as a plugin. Its users are adept, tend to have good GPUs, and there isn't that much relevant software - one can do a bespoke solution for each (a PS plugin is enough to keep many happy). For LLMs, either you supply a cloud API, or deal with a significant deployment burden nobody AFAIK has even started handling. Not to mention the licenses...
Current best hope for on-premise is that one of the OS makers decides to do the work and does it well enough we could trust it.
These markets benefit from deep OS integration, interconnectivity between users products and a long history of "we have always used Word". None of that is the case for LLMs.
> For LLMs, either you supply a cloud API, or deal with a significant deployment burden nobody AFAIK has even started handling.
Why would there be a deployment burden? The actual setup isn't hard, the rest is wrapping the thingamabob into an API the product can use...same btw. as I have to do for cloud-LLM APIs, because the product ideally wants to be able to use different providers, so I have to abstract the API away anyway.
Sure, I have to worry about infrastructure, maintenance, and performance characteristics, but that's hardly different from running our databases or web-backends.
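For what it's worth, that abstraction layer can stay quite thin. A purely illustrative sketch, assuming the 2023-era openai Python client; the class and method names are mine, and the local backend is just a stub for whatever runtime actually gets deployed:

    # Illustrative provider abstraction (names are mine, not from the thread)
    from abc import ABC, abstractmethod

    class CompletionBackend(ABC):
        """The product codes against this; which backend sits behind it is a config choice."""
        @abstractmethod
        def complete(self, prompt: str, max_tokens: int = 256) -> str: ...

    class OpenAIBackend(CompletionBackend):
        """Cloud option: a plain HTTPS call via the (2023-era) openai client."""
        def complete(self, prompt: str, max_tokens: int = 256) -> str:
            import openai
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens,
            )
            return resp["choices"][0]["message"]["content"]

    class LocalBackend(CompletionBackend):
        """On-premises option: same interface, backed by whatever local runtime is deployed."""
        def __init__(self, runner):
            self.runner = runner  # any callable mapping (prompt, max_tokens) to text
        def complete(self, prompt: str, max_tokens: int = 256) -> str:
            return self.runner(prompt, max_tokens=max_tokens)

The product only ever sees CompletionBackend; swapping cloud for on-premises is then an operations decision, not a rewrite.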
> Current best hope for on-premise is that one of the OS makers decides to do the work and does it well enough we could trust it.
Why? I don't have to trust the OS providers to integrate, say, postgres or nginx or redis either. Why would I need their help to run another piece of software that lives entirely in userspace?
>These markets benefit from deep OS integration, interconnectivity between users products and a long history of "we have always used Word". None of that is the case for LLMs.
The old moats did not stop existing because LLMs came along. e.g. a big use case for LLMs is running over one's own documents. We can guess who very likely already has access (MS and Google), and who is going to get a scary permission prompt at best (everyone else). OS integration is obviously coming, MS announced Copilot and Google can just silently upgrade Assistant. We may not be far from a repeat of the IE saga, where MS declares Copilot as essential part of Windows.
My point is that commercial demand for privacy is not enough, and may not prevent Cloud APIs from dominating.
>>For LLMs, either you supply a cloud API, or deal with a significant deployment burden
>Why would there be a deployment burden? The actual setup isn't hard, the rest is wrapping the thingamabob into an API the product can use...
Well, the entire idea is to run locally, right? Now you have to worry about getting the runtime to work on 3 different OSs, and many, many hardware variations (like the many Intel AVX variants, low-memory configurations, etc.). Just take a look at the recent moves to Flatpak/Snap everything, and that's merely due to software variations on a single OS.
>Why? I don't have to trust the OS providers to integrate, say, postgres or nginx or redis either. Why would I need their help to run another piece of software that lives entirely in userspace?
I expect LLMs to be a ubiquitous UI paradigm. We do expect OS providers to bundle a UI kit, and not everyone to roll their own, right?
Besides, you are comparing to server software. But non-cloud LLMs are expected to run locally on clients. That's a different world. Clients are a lot more diverse, weaker in processing power, and most can't handle upgrading postgres.
All that said, perhaps we should look at legislation as well as the possibility of an OS provider playing nice. We're talking about a lot of data/power in the hands of a few corps.
> Well, the entire idea is to run locally, right? Now you have to worry about getting the runtime to work on 3 different OSs,
You mean, like a runtime that can be built into a single, statically linked binary and requires only a C compiler? With bindings for several other popular languages? https://github.com/ggerganov/llama.cpp/
Yeah, maybe I am an optimist, but somehow I don't see that being a big problem...
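For a flavor of how small the integration surface is, a minimal sketch using the llama-cpp-python bindings mentioned above; the model path, prompt, and parameters are placeholders:

    # Minimal local inference via the llama-cpp-python bindings (path/params are placeholders)
    from llama_cpp import Llama

    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=2048)

    out = llm(
        "Write a one-line docstring for a function that reverses a string.",
        max_tokens=64,
        temperature=0.2,
    )
    print(out["choices"][0]["text"])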
> I expect LLMs to be a ubiquitous UI paradigm.
I don't expect that to be the case. LLMs are useful in text wrangling tasks, for which we have specialized software. Software that already bundles large amounts of libraries.
But what I do expect from a UI: a) that it is fully functional even when the machine is offline, and b) that its responsiveness is independent of the load on someone's datacenter.
> But non-cloud LLMs are expected to run locally on clients.
They don't have to. I can already see many useful applications for LLMs that run on on-premises servers. I never said that open source LLMs have to be restricted to client devices.
> All that said, perhaps we should look at legislation as well as the possibility of an OS provider playing nice.
Legislation is another good argument for open source LLMs taking over. Looking at the EU AI Act, and the recent ideas coming up in the US, it looks like model training data being open for audits is gonna be a thing in the future, especially if the model is used in a context a lot of people's livelihoods could rely on.
>>> Well, the entire idea is to run locally, right? Now you have to worry about getting the runtime to work on 3 different OSs,
>llama.cpp
I'm well aware of llama.cpp, and run it myself. I'm also aware of the recent breaking change to the quantization format because 'everyone can just rebuild from the ggml'. And its difficulties with base-level AVX or AVX-512 (in fairness, it focuses on M2 and has support for AVX2). And how you need to work to add additional non-Llama models to it (necessary due to the commercial clause). It's not quite ready yet. Even when it's ready, there's the work for actual deployment, and the mess of diverse user systems beyond what devs tend to run.
I'm not saying the challenge is impossible - but it's an extra hurdle for OSS.
>> I expect LLMs to be a ubiquitous UI paradigm.
> I don't expect that to be the case. LLMs are useful in text wrangling tasks, for which we have specialized software
Look at the recent photoshop generative fill feature for what could be done.
>what I do expect from a UI: a) that it is fully functional even when the machine is offline, and b) that its responsiveness is independent of the load on someone's datacenter.
Good ideas. I think for now BigCorp can avoid it by pushing LLMs as an addon UI and keeping the base UI functional. It's possible they'd ship smaller models on device too.
>Legislation is another good argument for open source LLMs taking over.
That I can buy. The legal details are however a bit beyond me right now, and I don't know what the current status is. It could push OSS or stop it.
> How many devs are out there who would love having a local LLM integrated into their IDE? The answer: Yes.
Tbf, local models are still completely crap for that because the base model isn't as good. Software development is expensive, so if a 1% better model cuts 10 hours of work, it's already worth it.
The problem is more in that you can't send private company data to OpenAI and back because they'll use it for training and leak your IP.
I’ve seen parts of the Congress testimony of Sam Altman and it felt like the guy was on a power trip with his regulation ideas.
What he accomplished with OpenAI is for the history books, but wanting a monopoly over AI development in the US makes him look like an up-and-coming villain.
Nothing that he said in Congress supports the claim that he wants a monopoly over AI development.
I find it fascinating how people are so happy to turn argumentation on its head and pretend it's the same thing. Surely you must know you are doing it? But what there is to gain, I do not know.
Asking for regulation soon after your company has gained a substantial (but probably transient) lead in a market sets off people's villain detectors, especially people who have seen the same things before. Sam may not be a villain and may believe in everything he is advocating for pure reasons, but it is right for people to be critical and skeptical. Sam will be okay, he can handle some haters.
> Asking for regulation soon after your company has gained a substantial (but probably transient) lead in a market sets off people's villain detectors
This doesn't make any sense to me.
a) OpenAI just barely has a product. I mean, it's pretty amazing, but so very clearly just emerging. Regulation increases the barrier to market entry for other players, but OpenAI's entire continued research, which is just about to maybe bear fruit, depends on upcoming regulations not fucking them over.
b) Right now, the lynchpin (and how this industry could realistically be regulated) seems to be compute. For a lot of compute, you need a lot of money.
So here is what we are assuming: an evil genius OpenAI that is masterfully steering American regulation to create a scenario in which, on the one hand, the scariest competitors (those with the big credit lines) are effectively kept from entering the market, while on the other hand warding off any AI policies that run a high risk of impacting their own business more than anything else.
You claim people have seen such things before. I am not sure what you are referring to, but to me the above sounds exceedingly wild and improbable.
> Sam will be okay, he can handle some haters.
Sure. I really don't care about Sam. I just wish people tried to be a little kinder, for everyone.
As I understand it, most people's objection to the calls for regulation are based on the fact that regulation raises the barriers to entry for competition. Regulatory compliance can be expensive and favors those with deep pockets. The more AI startups one can prevent from starting/growing now, the fewer competitors there will be in the future.
All of which is entirely reasonable, and so is considering all of Sama's possible motivations.
But just declaring another party's intentions as you see fit is such incredibly bad style in any discourse. When filling the gaps in your knowledge, there has to be a world in which Sama is in fact a good human being – in addition to the world where he is not – when that assumption does not logically conflict with anything that he is proposing.
His argumentation for regulation is coherent. The concept of regulation is not complicated or new, it requires no good faith, it makes obvious sense for obvious reasons.
This does not mean that those reasons are what motivated Sama to make this proposal. But when being able to string together an argument is all that is required to forgo any other possible arguments, we are in deep trouble. We need to be able to consider intentions with uncertainty and stay attentive to what is being said and done.
There's one obvious pink elephant in safety that makes it very difficult for me to perceive positive intentions. What is a bigger threat to "safety" from AI? You and I - society at large - or an organization whose entire purpose for existence is to think up brand new ways to violently impose its will on the rest of the world, and which has a long history of showing minimal to zero moral constraints in the pursuit and execution of those ends? Let's not forget to also factor in near-zero laws meaningfully constraining their behavior, and a trillion-dollar annual budget to cook up new forms of violence.
Yet all the talk about "safety" is about keeping the chatbot away from you and me, while completely ignoring the real threat. And of course the real threat will always be granted access to whatever AI system it demands, because of those magical words that can make the sun rise in the West and 2+2 sum to 5: national security. And the version they're using isn't going to be responding with "As an AI language model I cannot..."
The GP does not say anything about Sam Altman’s motivation specifically, but rather addresses the argument about regulation. The intent is irrelevant in this context and therefore there would be no need to superimpose intents to address the argument, which is that regulation impedes competition. This is already a very fecund discussion topic.
Guessing the true motivations of the rich and powerful is an essential element of democracy, because they never reveal their intentions.
This is also why even countries with stricter libel laws than the U.S. make broad exceptions for public figures. If you can't speculate about motivations in public, you'll have a dictatorship soon.
He wants Congress to create or use an existing agency to regulate AI with input and regulation ideas from him. And right after that he started enumerating some things that any other AI company should do. For me this points to the idea that he wants to control AI development at a national level.
Changed? It's the same song as before they were famous.
> We’ve co-authored a paper that forecasts how malicious actors could misuse AI technology, and potential ways we can prevent and mitigate these threats. This paper is the outcome of almost a year of sustained work with our colleagues at the Future of Humanity Institute, the Centre for the Study of Existential Risk, the Center for a New American Security, the Electronic Frontier Foundation, and others.
There's a good chance you're right about Altman's motivations. This doesn't make the case for regulation weaker (or stronger). There are bigger things at play than OpenAI's financial forecasts, even if Altman's motivations are selfish and narrow.
sure, one may very well be on the mark regarding Altman's innermost drives. but surely you do realize this revelation does not, in any way, diminish or amplify the argument for oversight. Greater forces are in motion, outstretching the limited realm of OpenAI's fiscal prognostications. Even if we were to consider Altman's motivations as self-serving and myopic, these grand machinations unfolding on the canvas of our society remain undeterred.
I think there's zero chance that OpenAI won't create a full-scale vector solution in the near future. The market is too big for them to ignore the need for people to upload and query their own data. Which, as others pointed out, might be the reason they wanted this removed, given that Sam said, "We have no plans to expand beyond the UI."
If you look at Pinecone, their Standard plan is $70 a month. Like, OpenAI already does some memory management in the interface - it makes no sense to ignore the earnings potential with this, which would give them more runway to spend on training future models.