> What use cases do people have for these smaller LLMs?
None. Training a functionally useless model and releasing it is a great way to demonstrate that your company is hip and current. That way when prospective clients ask about AI you can vaguely gesture at some model that you released and say you employ cutting edge AI experts.
…what they (and everyone else) are gonna do is play with smallish models to iterate on the process at relatively small expense and earn karma.
Then pay big $$$ to make a really good model for internal use and/or an API that people have to pay for.
TL;DR: it’s free. It’s by Salesforce. You should expect it to be a) crippled and b) a loss leader for a paid product.
Not judging; it’s a fair strategy. Just saying: Salesforce is not a company that just gives hundreds of thousands of dollars away for nothing.
If you want a good free open model, you’re kidding yourself if you think a corporate giant is going to kiss you on the head and give it to you for free.
> If you want a good free open model, you’re kidding yourself if you think a corporate giant is going to kiss you on the head and give it to you for free.
Yep! That makes sense!
I would love to be the CEO of the company that does give away an actually useful model and little forehead kisses though. The amount of goodwill that one would generate from that would be astronomical and training costs are getting so low that nearly any company with enough cash could do it.
I look forward to waking up and hearing that the Nabisco/Canadian Tire/A&W usefully-tuned model is revolutionizing the economy and seeing the infinite amount of good press that it would generate.
OpenAI did this when releasing Whisper, but I mostly hear sneers about how they're not really open, and no gratitude for the "little kiss". Given that, I don't know that as CEO, I'd be very benevolent.
I don't agree. LLaMA models are great if you want to run your own models on your own systems, but only if you fine-tune them to specific tasks. The problem is that LLaMA's license is non-commercial. There was a need for a small, efficient pre-trained model to build on. This is what Salesforce released. It's not intended to be used with general-purpose prompting like ChatGPT.
ChatGPT has many problems: dependence on a third party, privacy, externally imposed ideology and rules, cost, and most importantly, prompting is context-size limited and token-expensive, so you can't pack much data into it.
Fine-tuning is a more powerful approach where you can actually fix the model's problems instead of futzing around with the prompt and demonstrations. Yes, you have to work on your dataset. But if you don't already have one, you can bootstrap it with GPT-4 for a small sum.
Meta provided the training wheels with LLaMA. Lots of companies tried fine-tuning it for their own purposes, but could not proceed for lack of a commercially usable base model. Salesforce XGen and a few other open small LLMs (funny how that sounds!) open the floodgates.
So the recipe is: use an existing dataset, or make one with regular GPT-4 prompting and a bit of curation. Then fine-tune a small open model. On your specific task you can get it to be better than stock GPT-4, and cheaper, faster and private too. If you use LoRAs you can save each skill as a separate diff model roughly 1% the size of the base model, and fine-tune it on a single GPU in a single day.
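To put a rough number on the "1% the size" part, here's a back-of-the-envelope sketch. It assumes the standard LoRA setup, where a frozen d_out x d_in weight matrix gets a trainable low-rank update B·A with B of shape d_out x r and A of shape r x d_in. The 4096 x 4096 layer shape and the ranks are made-up illustrative values, not XGen's actual config, and the function names are mine.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter size relative to the full weight matrix it modifies."""
    return lora_param_count(d_out, d_in, rank) / (d_out * d_in)


if __name__ == "__main__":
    # Hypothetical layer shape and ranks, chosen only for illustration.
    d_out, d_in = 4096, 4096
    for rank in (4, 8, 16, 64):
        count = lora_param_count(d_out, d_in, rank)
        frac = lora_fraction(d_out, d_in, rank)
        print(f"rank={rank:>2}: {count:>9,} adapter params, {frac:.2%} of the full matrix")
```

Under those assumptions, rank 16 comes out a bit under 1% of the matrix it adapts. The fraction of the whole model is lower still if you only attach adapters to some of the weight matrices, and higher if you push the rank up.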
If your main product is a business tool to spam people, it would be in your best interest to enable as many new businesses to sprout up and start spamming people as possible. It would also be in your best interest to prevent your competitors from creating that enablement product and selling it as a service, earning revenue, and pumping that money back into their spam product.
These free models are both a defensive move against behemoths, and kindling to rapid business development.