
I have 2x 3090. Do you know if it's feasible to use the 48GB total for running this?


Yes, it runs totally fine. I ran it in Oobabooga/text-generation-webui. The nice thing about it is that it auto-downloads all the necessary GPU binaries on its own and creates an isolated conda env. I asked the same questions on the official 70B demo and got the same answers. I even got better answers with ooba, since the demo cuts text off early.
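For anyone sizing this up: a 70B model quantized to 4-bit GPTQ is roughly 35 GB of weights, so it won't fit on one 24 GB card but fits across two with room left for the KV cache. A quick sanity check that both cards are visible (a minimal PyTorch sketch, assuming CUDA is set up):

    import torch

    # Confirm both 3090s are visible and combined VRAM is ~48 GiB.
    # 70B params at 4 bits is roughly 35 GiB of weights, plus KV cache.
    total_gib = 0.0
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gib = props.total_memory / 1024**3
        total_gib += gib
        print(f"GPU {i}: {props.name}, {gib:.1f} GiB")
    print(f"Total VRAM: {total_gib:.1f} GiB")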

Oobabooga: https://github.com/oobabooga/text-generation-webui

Model: TheBloke_Llama-2-70B-chat-GPTQ from https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ
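If you'd rather pull the weights yourself instead of going through the UI's downloader, a minimal sketch with huggingface_hub (the files total roughly 35 GB):

    from huggingface_hub import snapshot_download

    # Fetches all model files into the local Hugging Face cache
    # and returns the local directory path.
    path = snapshot_download("TheBloke/Llama-2-70B-chat-GPTQ")
    print(path)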

ExLlama_HF loader, GPU split 20,22, context size 2048
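The "20,22" split caps how many GiB of weights land on each card. That's handled inside the ExLlama loader, but the same idea expressed in plain transformers/accelerate looks like this (illustrative only, not what ooba runs; loading GPTQ weights this way also needs the optimum/auto-gptq integration):

    from transformers import AutoModelForCausalLM

    # device_map="auto" shards layers across the GPUs while
    # respecting the per-device caps below, analogous to the
    # UI's "20,22" memory split.
    model = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-70B-chat-GPTQ",
        device_map="auto",
        max_memory={0: "20GiB", 1: "22GiB"},
    )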

On the Chat Settings tab, choose the Instruction template sub-tab and pick Llama-v2 from the instruction template dropdown
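The template matters because Llama-2-chat was trained against a specific prompt wrapper; picking Llama-v2 makes the UI emit prompts shaped roughly like this (a sketch of the format, not the UI's actual code):

    # Rough shape of Meta's Llama-2 chat prompt format.
    def llama2_prompt(system: str, user: str) -> str:
        return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

    print(llama2_prompt("You are a helpful assistant.", "Hello!"))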

Demo: https://huggingface.co/blog/llama2#demo


Are there any specific settings needed to make 2x 3090s work together?


Not really? I just have the cards in separate PCIe slots, and ExLlama_HF handles spreading the load internally. No NVLink bridge in particular. I use the "20,22" memory split so that the display card has some room left over for the framebuffer.
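The reason NVLink isn't critical for this kind of split: the weights stay put on each card, and only the hidden-state activations (a few MB per step) cross the PCIe bus between them. A toy two-GPU pipeline showing the idea (assumes two CUDA devices):

    import torch

    # Half the layers live on each card; only the activation moves.
    layer0 = torch.nn.Linear(8192, 8192).to("cuda:0")
    layer1 = torch.nn.Linear(8192, 8192).to("cuda:1")

    x = torch.randn(1, 8192, device="cuda:0")
    h = layer0(x)        # computed on GPU 0
    h = h.to("cuda:1")   # only this small tensor crosses PCIe
    y = layer1(h)
    print(y.shape, y.device)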


Do you mean you don't use NVLink, or that you use one that works? I'm under the impression it's being phased out ("PCIe 5 is fast enough") and some kits don't include it.


I don't use NVLink


Interested in this too


I'm very curious what your other components are and how you managed to fit 2 3090s in one PC.



