
The big one is 175 billion parameters. With your hardware's usual 32-bit floats, that's a 700GB model. You won't be using the big one for a while.
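Quick back-of-envelope in Python (assuming 4 bytes per parameter and counting nothing but the raw weights):

    params = 175e9          # 175 billion parameters
    bytes_per_param = 4     # FP32
    print(params * bytes_per_param / 1e9)   # 700.0 GB, weights only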



This one uses FP16, so you just need a server with >350GB of RAM. 512GB of DDR4 would set you back around two grand. The total cost of a server for this would probably be under $5k. Comparable to a good gaming rig.
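Same arithmetic at 2 bytes per parameter (a sketch; the 512GB figure is only there to show the headroom):

    params = 175e9
    fp16_gb = params * 2 / 1e9      # FP16: 2 bytes per parameter
    print(fp16_gb)                  # 350.0 GB of weights
    print(512 - fp16_gb)            # ~162 GB left for activations and the OS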


A TPU host can allocate about 300GB on its CPU without OOMing. That's tantalizingly close to 350GB. And 300GB of host RAM + 8 cores * 8GB of HBM each = 364GB.
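Rough tally (assuming TPUv2's 8GB of HBM per core):

    host_ram_gb = 300                # what the host CPU can allocate without OOMing
    cores, hbm_per_core_gb = 8, 8
    print(host_ram_gb + cores * hbm_per_core_gb)   # 364 GB to play with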

It'll take some work, but I think I can come up with something clever to dump samples on a TPUv2-8, i.e. the free one that comes with Colab.

Realistically, I don't think OpenAI will release the model. Why would they? And I'm not sure they'd dare use "it might be dangerous" as an excuse.


Have you (or anyone) tried running GPT-2 inference in INT8 precision? Perhaps worth looking at one of these efforts: https://www.google.com/search?q=running+transformer+in+int8&...
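For anyone curious what that looks like in practice, here's a minimal PyTorch dynamic-quantization sketch (my own toy example, not from those links; the 1600-wide layers are just GPT-2-XL-ish stand-ins, and note that HF's GPT-2 implements most projections as a custom Conv1D rather than nn.Linear, so the module set would need adjusting there):

    import torch
    import torch.nn as nn

    # stand-in for a transformer block's MLP projections (hidden size 1600)
    model = nn.Sequential(
        nn.Linear(1600, 6400),
        nn.GELU(),
        nn.Linear(6400, 1600),
    )

    # weights get stored as int8; activations are quantized on the fly at runtime
    qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 1600)
    print(qmodel(x).shape)   # torch.Size([1, 1600])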


It uses FP16, yes, but the question was about average people running it on their PCs. I don't think most PCs have native FP16 support, so you'd have to do it in FP32, doubling the size. It likely wouldn't be fast on a CPU either at that size, especially in FP32.


The ALUs (FPUs) in most CPUs are 64-bit (even wider internally), but that doesn't matter, because we don't care how many bits our floats take up inside the CPU; we care how much space they take in the server's RAM. From our point of view, we supply weights and inputs to the CPU (both in FP16), the CPU multiplies them (using its 64-bit multipliers), and spits out a result that is cast back to FP16, and that's what gets stored in memory.
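A minimal numpy illustration of that point (a sketch; a real inference loop would upcast per tile or per layer so only a small working set ever sits at the wider precision):

    import numpy as np

    # weights and activations live in RAM as float16 (2 bytes each)
    w = np.random.randn(4096, 4096).astype(np.float16)   # 32 MiB resident
    x = np.random.randn(4096).astype(np.float16)

    # the multiply-accumulate runs at a wider precision inside the CPU,
    # and the result is cast back down before it is stored
    y = (x.astype(np.float32) @ w.astype(np.float32)).astype(np.float16)

    print(w.nbytes / 2**20)   # 32.0 MiB for this one matrix
    print(y.dtype)            # float16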



