> What use cases do people have for these smaller LLM's? None. Training a functi...

wokwokwok · on June 29, 2023

yep, that’s why it’s free.

If it was good, then they’d charge for it.

…what they (and everyone else) are gonna do is play with smallish models to iterate on the process at relatively small expense and earn karma.

Then pay big $$$ to make a really good model for internal use and/or an api that people have to pay for.

TL;DR: it’s free. It’s by Salesforce. You should expect it to be a) crippled and b) a loss leader for a paid product.

Not judging; it’s a fair strategy. Just saying: Salesforce is not a company that just gives hundreds of thousands of dollars away for nothing.

If you want a good free open model, you’re kidding yourself if you think a corporate giant is going to kiss you on the head and give it to you for free.

jrflowers · on June 29, 2023

> If you want a good free open model, you’re kidding yourself if you think a corporate giant is going to kiss you on the head and give it to you for free.

Yep! That makes sense!

I would love to be the CEO of the company that does give away an actually useful model and little forehead kisses though. The amount of goodwill that would generate would be astronomical, and training costs are getting so low that nearly any company with enough cash could do it.

I look forward to waking up and hearing that the Nabisco/Canadian Tire/A&W usefully-tuned model is revolutionizing the economy and seeing the infinite amount of good press that it would generate.

fragmede · on June 29, 2023

OpenAI did this when releasing Whisper, but I mostly hear sneers about how they're not really open, and no gratitude for the "little kiss". Given that, I don't know that as CEO, I'd be very benevolent.

ilaksh · on June 30, 2023

Not quite the same, but that's what Stability AI did with Stable Diffusion.

AuryGlenz · on June 29, 2023

To be fair, Stable Diffusion (especially the upcoming SDXL) is good and free.

visarga · on June 29, 2023

I don't agree. LLaMA models are great if you want to run your own models on your own systems, but only if you fine-tune them for specific tasks. The problem is that LLaMA's license is non-commercial. There was a need for a small, efficient pre-trained model to build on. This is what Salesforce released. It's not intended to be used with general-purpose prompting like ChatGPT.

The problems with ChatGPT are many - dependence on a third party, privacy, externally imposed ideology and rules, cost, and most importantly - prompting is context-size limited and token-expensive, so you can't pack much data into it.

Fine-tuning is a more powerful approach where you can actually fix the model's problems instead of futzing around with the prompt and demonstrations. Yes, you've got to work on your dataset. But if you don't already have one you can bootstrap it with GPT-4 for a small sum.

Meta provided the training wheels with LLaMA: every company tried fine-tuning it for their purposes, but could not proceed for lack of a commercially usable base model. Salesforce XGen and a few other open-small-LLMs (funny how that sounds!) open the floodgates.

So the recipe is: use an existing dataset, or make one with regular GPT-4 prompting and a bit of curation. Then fine-tune a small open model. You can get it to be better than stock GPT-4 on your specific task, and cheaper, faster and private. If you use LoRAs you can save each skill in a separate diff model only about 1% the size of the base model, and use a single GPU to fine-tune it in a single day.
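To put a rough number on the "diff model" size, here is a back-of-the-envelope sketch of how many trainable parameters a LoRA adapter adds. The model shape below (hidden size, layer count, which matrices get adapted, rank) is a hypothetical 7B-class configuration picked for illustration, not XGen's or LLaMA's actual config. The only thing taken as given is the standard LoRA idea: the frozen weight matrix gets a learned low-rank update made of two thin matrices.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The frozen weight W gets a low-rank update B @ A, where
    B is d_out x rank and A is rank x d_in.
    """
    return rank * (d_in + d_out)


# Hypothetical 7B-class model shape (illustrative assumptions only).
hidden = 4096            # assumed hidden size
layers = 32              # assumed number of transformer layers
adapted_per_layer = 4    # assumed: the four attention projection matrices
rank = 8                 # assumed LoRA rank
base_params = 7_000_000_000

adapter_params = layers * adapted_per_layer * lora_param_count(hidden, hidden, rank)
fraction = adapter_params / base_params

print(f"adapter parameters: {adapter_params:,}")
print(f"fraction of base model: {fraction:.4%}")
```

Under these assumptions the adapter comes out well below 1% of the base model. A higher rank, or adapting more matrices per layer, pushes the fraction up toward the 1% ballpark mentioned above.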

basch · on June 29, 2023

If your main product is a business tool to spam people, it would be in your best interest to enable as many new businesses to sprout up and start spamming people as possible. It would also be in your best interest to prevent your competitors from creating that enablement product and selling it as a service, earning revenue, and pumping that money back into their spam product.

These free models are both a defensive move against behemoths and kindling for rapid business development.