I have 2x 3090 do you know if it's feasible to use that 48GB total for running t...

eurekin · on July 25, 2023

Yes, it runs totally fine. I ran it in Oobabooga/text generation web ui. Nice thing about it is that it autodownloads all necessary gpu binaries on it's own and creates a isolated conda env. I asked same questions on the official 70b demo and got same answers. I even got better answers with ooba, since the demo cuts text early

Ooobabooga: https://github.com/oobabooga/text-generation-webui

Model: TheBloke_Llama-2-70B-chat-GPTQ from https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ

ExLlama_HF loader gpu split 20,22, context size 2048

on the Chat Settings tab, choose Instruction template tab and pick Llama-v2 from the instruction template dropdown

Demo: https://huggingface.co/blog/llama2#demo

zakki · on July 25, 2023

Is there any specific settings to make 2x3090 work together?

eurekin · on July 26, 2023

Not really? I just got those cards in separate PCI slots and the Exllama_hf handles spreading the load internally. No NVLink bridge in particular. I use the "20,22" memory split so that the display card has some room for the framebuffer to handle display

vid · on July 26, 2023

Do you mean you don't use NVLink or just use one that works? I am under the impression it is being phased out ("PCIe 5 is fast enough") and some kits don't use it.

eurekin · on July 27, 2023

I don't use NVLink

kwerk · on July 26, 2023

Interested in this too

olavfosse · on July 26, 2023

I'm very curious what your other components are and how you managed to fit 2 3090s in one PC.