That's not really a C implementation of GPT-2 since it cannot be used to do the thing everyone cares about: self-supervised learning from text. In fact, it doesn't even use the weights in the same way GPT-2 does, so it's not clear how close it is to GPT-2's inference mode. The source isn't even on the page.
This is very cool, thanks for sharing! From the readme (https://bellard.org/nncp/readme-gpt2tc.txt), the program benchmarks very comparably to CMIX, which is the top algorithm on the Large Text Compression Benchmark
(http://mattmahoney.net/dc/text.html). I'm guessing any GPT implementation would be ineligible for the benchmark because of its file size, but it's impressive nonetheless.
I don't disagree that hardware acceleration is key to enabling these models, but I still find it interesting how simple the core techniques are.
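For a sense of what "simple" means here: the whole trick is that a language model's next-symbol probabilities can drive an arithmetic coder, so better prediction directly means smaller output. Below is a minimal Python sketch of that idea; it is not Bellard's gpt2tc code, and a toy adaptive order-0 byte model stands in for GPT-2, but the structure is the same: the ideal code length is just the sum of -log2 p(next symbol).

    import math
    from collections import Counter

    def ideal_code_length_bits(data, predict):
        # Sum of -log2 p(symbol | preceding context): the size in bits an
        # ideal arithmetic coder would reach under this probability model.
        total = 0.0
        for i, ch in enumerate(data):
            p = predict(data[:i], ch)
            total += -math.log2(p)
        return total

    def order0_predict(context, symbol):
        # Toy adaptive order-0 model (NOT GPT-2): probability proportional to
        # how often the symbol has appeared so far, with +1 smoothing over a
        # 256-symbol byte alphabet.
        counts = Counter(context)
        return (counts[symbol] + 1) / (len(context) + 256)

    text = "the quick brown fox jumps over the lazy dog " * 20
    bits = ideal_code_length_bits(text, order0_predict)
    print(f"{len(text)} input bytes -> about {bits / 8:.0f} bytes under the toy model")

Swap the toy model for GPT-2's conditional probabilities and add a real arithmetic coder for the bit-level I/O, and you essentially have a language-model-based compressor; the modeling is where all the hardware goes, not the coding step.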