The main use case is that it's probably the only size consumers can run on their...

m00x · on June 29, 2023

You can run big models on the cloud yourself, or with a 3090/4090 quantized.

You don't have to go to openai.

aryamaan · on June 29, 2023

What’re some models and hardware combos we can run now? I am avoiding to go to OpenAI with my office’s stuff and can use some gpu(s)

MINIMAN10000 · on June 29, 2023

You would just need a computer which can fit 2 3090s in order to run those to run something like TheBloke/airoboros-65B-gpt4-1.3-GPTQ

https://www.reddit.com/r/LocalLLaMA/wiki/models/ gives you a list of VRAM requirements to load the model into GPU VRAM. the more VRAM the computer has, the larger the model you can load in, thus making 3090s the current consumer grade king due to price to max VRAM.

This being said however most models are LLAMA based which all fall under that specific research license.

So following the rules, you would be limited to a subset of models which are foundational models which allow for commercial use

brucethemoose2 · on June 29, 2023

I can easily run LLaMA 13B on my 6GB VRAM/16GB RAM laptop using llama.cpp (specifically Kobold.cpp as the frontend).

I can barely run 33B, but anything more than 800 context and I oom. But it would run very comfortably on a bigger GPU or a 24GB+ laptop.

Theoretically some phones can comfortably handle 13B on mlc-llm though in practice its not really implemented yet.

int_19h · on June 29, 2023

llama-30B (which is actually 33B) and derivatives generally run fine with 4-bit quantization on a single RTX 3090 or 4090, although depending on group size used for quantization you may need to slightly dial down the context size.

jahewson · on June 29, 2023

These are constraints but not a use case.

verwkljdslfklj · on June 29, 2023

Isn't it obvious they're referring to the use case of using a model given those constraints?

roughly · on June 29, 2023

Yes, but I think the responder is wondering if there are useable use cases for that - like, what can you actually Do with that model. I’m in the same boat - I don’t want to ship my data to openai, I do want to run local, so I’d love to hear what other folks are Doing with models of that size.

sidibe · on June 29, 2023

"Use case of using" :)