You are vastly overestimating the startup cost. For me this week it was literall...

monkmartinez · 2024-07-16T18:08:42 1721153322

llama3 at 8billion params is weak sauce for anything serious, it just isn't in the same galaxy as Sonnet 3.5 or GPT-4o. The smaller and faster models like Phi are even worse. Once you progress past asking trivial questions to a point where you need to trust the output a bit more, its not worth effort in time, money and/or sweat effort to run a local model to do it.

A novice isn't going to know what they need because they don't know what they don't know. Try asking a question to LLaMA 3 at 8 billion and the same question to LLaMA 3 at 70 billion. There is a night and day difference. Sonnet, Opus and GPT-4o run circles around LLaMA 3 70b. To run LLaMA at 70 billion you need serious horse power as well, likely thousands of dollars in hardware investment. I say it again... the calculus in time, money, and effort isn't favorable to running open models on your own hardware once you pass the novice stage.

I am not ungrateful that the LLaMA's are available for many different reasons, but there is no comparison between quality of output, time, money and effort. The API's are a bargain when you really break down what it takes to run a serious model.

jononor · 2024-07-16T22:13:14 1721167994

Using an LLM as a general purpose knowledge base is only one particular application of an LLM. And on which is probably best served by ChatGPT etc.

A lot of other things are possible with LLMs using the context window and completion, thanks to their "zero shot" learning capabilities. Which is also what RAG builds upon.

Aurornis · 2024-07-16T15:54:26 1721145266

I’m familiar with local models. They’re fine for chatting on unimportant things.

They do not compare to the giant models like Claude Sonnet and GPT4 when it comes to trying to use them for complex things.

I continue to use both local models and the commercial cloud offerings, but I think anyone who suggests that the small local models are on par with the big closed hosted models right now is wishful thinking.