This one uses FP16, so you just need a server with >350GB of RAM. 512GB of DDR4 would set you back around two grand. The total cost of a server for this would probably be under $5k, comparable to a good gaming rig.
It uses FP16, yes, but the question was about average people running them on their PCs. I don't think most PCs have FP16 support, so you'd have to do it in FP32, doubling the size. It's also unlikely to be fast on a CPU at that size, especially in FP32.
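For anyone who wants to check the numbers being argued about, here is a rough sketch of the arithmetic. The parameter count of 175 billion is an assumption, inferred from the ">350GB at FP16" figure rather than stated anywhere in this thread, and the estimate covers the weights only (no activations or runtime overhead).

```python
# Rough weight-memory estimate. N_PARAMS is an assumption inferred from
# the ">350GB at FP16" figure; it is not stated anywhere in the thread.
N_PARAMS = 175_000_000_000

BYTES_PER_WEIGHT = {"fp16": 2, "fp32": 4}

def weight_gb(n_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_WEIGHT[dtype] / 1e9

for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype}: {weight_gb(N_PARAMS, dtype):.0f} GB")
# fp16: 350 GB
# fp32: 700 GB
```

Under that assumption, storing the weights in FP32 would need about 700GB, which no longer fits in a 512GB server.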
The ALUs (FPUs) in most CPUs are 64-bit (even wider than that internally), but this does not matter: we don't care how many bits our floats take inside the CPU, we care about how much space they take in our server's RAM. From our point of view, we supply weights and inputs to the CPU (both in FP16), the CPU multiplies them (using 64-bit multipliers) and spits out the result, which is cast to FP16, and that's what gets stored in memory.
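A toy sketch of that store-narrow, compute-wide pattern, using only Python's standard `struct` module (its `e` format code is IEEE 754 half precision). The helper names are made up for illustration, and the wide type here is a 64-bit double simply because that is what a Python float is. Real inference code would do this with vectorized kernels, but the point about storage is the same.

```python
import struct

def to_fp16_bytes(values):
    """Pack Python floats into IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)

def from_fp16_bytes(buf):
    """Unpack half-precision bytes into Python floats (64-bit doubles)."""
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))

# What sits in RAM: FP16 weights and inputs, 2 bytes per value.
weights = to_fp16_bytes([0.5, -1.25, 2.0])
inputs = to_fp16_bytes([4.0, 2.0, 0.125])

# What the CPU does: widen, multiply at higher precision, narrow the result.
products = [w * x for w, x in zip(from_fp16_bytes(weights), from_fp16_bytes(inputs))]
stored = to_fp16_bytes(products)

print(len(weights), len(stored))  # 6 6  (3 values, 2 bytes each, before and after)
print(from_fp16_bytes(stored))    # [2.0, -2.5, 0.25]
```

The multiplication runs on 64-bit values, yet nothing wider than 2 bytes per number is ever kept in the buffers.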