
What does it take to actually run a model like this, hardware-wise? I've been toying around with a GPT-2 Discord bot (https://github.com/ScottPeterJohnson/gpt2-discord) using CPU-only inference, and it already takes up 2 GB of RAM (and is slow, obviously) on the 345M model. I might be able to get the 774M model running, but there's no way I can afford the full model, assuming RAM use scales linearly with model size. And that's just for CPU compute; I can't even begin to imagine how expensive GPU would be.
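To put rough numbers on the "linear RAM use" assumption, here is a minimal sketch that scales the one measurement quoted above (about 2 GB for the 345M model) by parameter count. The function name `estimate_ram_gb` is made up for illustration, and the 1558M figure for the full model is the commonly cited parameter count, not something measured in this thread. Real usage also depends on the framework, numeric precision, and sequence length, so treat this as a back-of-the-envelope estimate only.

```python
def estimate_ram_gb(params_millions, ref_params_millions=345.0, ref_ram_gb=2.0):
    """Linearly extrapolate RAM use from one reference measurement.

    The defaults encode the thread's data point: ~2 GB for the 345M model.
    This is an assumption-laden estimate, not a measurement.
    """
    return ref_ram_gb * params_millions / ref_params_millions


# 774M is named in the thread; 1558M is assumed to be the full model's size.
for size in (345, 774, 1558):
    print(f"{size}M parameters: ~{estimate_ram_gb(size):.1f} GB")
```

Under that assumption the 774M model would land around 4.5 GB and the full model around 9 GB of RAM for CPU inference.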



I use it for http://nametango.ai/ (a startup/product name generation assistant). There are more details on my page, but in a nutshell, it takes about 7 seconds to generate a result on CPU (I generate a batch of 40 results for every query and de-dup them), and it runs in about 300 milliseconds on a Titan V GPU.

That is, however, a very batch-friendly application, so the CPU may compare somewhat better for serial, one-at-a-time generation.

(I'm comparing to dual 16-core Xeon Gold CPUs.) When using the GPU to generate the GPT-2 results, the CPU is mostly bored.
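The "generate a batch and de-dup" step mentioned above could look roughly like the sketch below. This is not the site's actual code: the function name, the case-insensitive and whitespace-trimming comparison, and the sample names are all assumptions made for illustration.

```python
def dedupe_preserving_order(candidates):
    """Drop repeated names while keeping first-seen order.

    Names are compared case-insensitively after trimming whitespace;
    that normalization is an assumption, not the site's documented rule.
    """
    seen = set()
    unique = []
    for name in candidates:
        cleaned = name.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Hypothetical batch standing in for the 40 generated results per query.
batch = ["Nametango", "nametango ", "BrandForge", "Nametango", "brandforge", "Lexora"]
print(dedupe_preserving_order(batch))
```

For scale, the timings quoted above (about 7 seconds on CPU versus about 300 milliseconds on the Titan V) work out to roughly a 23x speedup for the batched case.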

(responding to a comment that someone deleted, yes, it does suggest a lot of existing names, I haven't yet added something to filter out stuff that already exists... noted, and suggestions appreciated, but this probably isn't the thread for it! Feel free to drop me a note)


Inference with this model works fine on Google Colab, which gives you a Tesla K80 GPU with 12 GB of GPU RAM. You can buy a used K80 for probably about $850, but it's not really ideal for putting in a home computer because of the cooling requirements.

[ deleted reference to 2070 Super ]


A used K80 can be had for $350 [1]. Not bad, actually (it's probably as fast as a 1080 Ti, and it has 24 GB of memory).

[1] https://www.ebay.com/itm/NVIDIA-Tesla-K80-GDDR5-24GB-CUDA-PC...


The K80 is two GPU chips with 12 GB each, so it's not always as good as one newer/larger GPU. Much more affordable though :)


If I remember correctly K80 memory is actually 24GB, not 2x12GB. This is a pretty important distinction in this context (training GPT-2).

Also, you can get at least 6 K80s for the price of a single RTX Titan (also 24 GB). So it would be faster (I don't think the RTX Titan is 6x faster than a K80) and give you 6x more memory for the same price. It's a very good deal.
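As a quick check on that comparison, here is a small sketch that totals cost and memory for six cards, using only figures quoted in this thread ($350 per used K80, 24 GB per card). The helper name `fleet_totals` is invented for illustration. The aggregate memory is spread across separate cards (and, per the comment above, possibly across two chips per card), so it is not one contiguous pool.

```python
K80_PRICE_USD = 350   # used price quoted in the thread
K80_MEMORY_GB = 24    # per-card total quoted in the thread
NUM_CARDS = 6         # "at least 6 K80s" from the comment above


def fleet_totals(num_cards, price_usd, memory_gb):
    """Return (total cost in USD, aggregate memory in GB) for identical cards."""
    return num_cards * price_usd, num_cards * memory_gb


total_cost, total_memory = fleet_totals(NUM_CARDS, K80_PRICE_USD, K80_MEMORY_GB)
print(f"{NUM_CARDS} cards: ${total_cost}, {total_memory} GB aggregate memory")
```

That comes to $2,100 and 144 GB in aggregate; whether a training job can actually use memory split across devices depends on the framework and the parallelism strategy.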


300w with passive cooling? o-O

How does that work?


You would cool the server.


Since when does the RTX 2070 ship with 14 GB of GPU RAM? The max memory for the RTX 2070 Super is 8 GB.


Oops, you are correct. I misread the spec sheet.



