What's it take to actually run a model like this, hardware-wise? I've been toying around with a GPT-2 Discord bot (https://github.com/ScottPeterJohnson/gpt2-discord) using CPU-only inference, and it already takes up 2 GB of RAM (and is slow, obviously) on the 345M model. I might be able to get the 774M model running, but there's no way I can afford the full model, assuming RAM use scales linearly with parameter count. And that's just for CPU compute; I can't even begin to imagine how expensive GPU would be.
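For a rough sanity check on that linear-RAM assumption: 345M parameters at 4 bytes each is about 1.4 GB of weights alone, so the full 1.5B model would need roughly 6 GB before counting activations. Here's a minimal sketch of measuring the footprint yourself, assuming the Hugging Face transformers port of GPT-2 ("gpt2-medium" is the 345M model; this is not necessarily what the bot above uses):

```python
import os
import psutil
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the 345M model on CPU and report resident memory afterwards.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.eval()

rss = psutil.Process(os.getpid()).memory_info().rss
print(f"Resident memory after load: {rss / 1024**3:.2f} GiB")
```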
I use it for http://nametango.ai/ (a startup/product name generation assistant). There are more details on my page, but in a nutshell: I generate a batch of 40 results for every query and de-dup them, which takes about 7 seconds on CPU and about 300 milliseconds on a Titan V GPU. (A rough sketch of that batch-and-dedup flow is below.)
That is, however, a very batch-friendly application, so serial generation may fare somewhat better on CPU alone. (I'm comparing against dual 16-core Xeon Gold CPUs.) When the GPU is generating the GPT-2 results, the CPU is mostly idle.
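For the curious, here's roughly what that batch-of-40-then-de-dup flow looks like. This is a hedged sketch using the Hugging Face transformers API, not the actual nametango code; the prompt is made up. Batched sampling like this is exactly where a GPU pulls ahead, whereas generating the 40 candidates one at a time is where the CPU gap narrows:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

prompt = "Startup name ideas for a coffee subscription:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=8,
        num_return_sequences=40,   # one batch of 40 candidates
        pad_token_id=tokenizer.eos_token_id,
    )

texts = [tokenizer.decode(o, skip_special_tokens=True) for o in out]
unique = sorted(set(texts))       # crude de-dup within the batch
print(f"{len(unique)} unique results out of {len(texts)}")
```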
(Responding to a comment that someone deleted: yes, it does suggest a lot of existing names. I haven't yet added anything to filter out names that already exist... noted, and suggestions appreciated, but this probably isn't the thread for it! Feel free to drop me a note.)
Inference on this model works fine on Google Colab, which provides a Tesla K80 GPU with access to 12 GB of GPU RAM. You can buy a used K80 for around $850, but it's not really ideal for a home computer because of its cooling requirements.
If I remember correctly, the K80 actually has 24 GB total, but it's split 2x12 GB across the card's two on-board GPUs, so a single process only sees 12 GB unless you use both. That's a pretty important distinction in this context (training GPT-2).
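This is easy to check from inside Colab (or on any box with the card): enumerate the visible CUDA devices and their memory. On Colab you typically see a single GK210, i.e. one half of a K80, with ~12 GB; with the physical card installed locally you'd see two 12 GB devices. Sketch assumes PyTorch is installed:

```python
import torch

# List every visible CUDA device with its total memory.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"{i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```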
Also, you can get at least six K80s for the price of a single Titan RTX (also 24 GB). Six K80s would be faster in aggregate (I doubt a Titan RTX is 6x faster than a K80) and give you 6x the memory for the same price. That's a very good deal.
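One caveat on treating six K80s as one big GPU: each GK210 still only has 12 GB, so the model has to fit per-device. The usual cheap way to exploit the extra cards is naive data parallelism, where every GPU holds a full copy of the model and each batch is split across them. A minimal PyTorch sketch (torch.nn.DataParallel; assumes the model copy fits in 12 GB):

```python
import torch
from transformers import GPT2LMHeadModel

# Replicate the model on every visible GPU and split batches across
# them. Each K80 half (GK210) must hold a full copy in its 12 GB.
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model = model.to("cuda")
```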