
> Too many googleable questions

To be fair (as someone who hasn't paid much attention to these developments in the last few months/years), the sheer number of individual facts that are apparently encoded in a fixed neural-network architecture is what amazes me the most here.

I get that it has 175B parameters, but it's not like you can use them in a structured way to store, say, a knowledge base...



I see this "175B parameters" figure tossed around a lot: what does that mean? Does the model literally consist of 175 billion 32-bit floats?

(Sorry for the very basic question.)


You would be correct.

It's ~650GiB of data. The entire English version of Wikipedia is about 20GiB and the Encyclopædia Britannica is only about 300MiB (measured in ASCII characters) [1].

[1] https://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
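
For anyone who wants to sanity-check that figure, here is a rough back-of-the-envelope sketch in Python. It assumes every parameter is stored as a 32-bit (4-byte) float, which is what the ~650 GiB estimate above implies; released checkpoints may use a different precision.

    # Rough storage estimate for 175B parameters,
    # assuming each one is a 32-bit (4-byte) float.
    params = 175e9          # 175 billion parameters
    bytes_per_param = 4     # float32

    total_bytes = params * bytes_per_param
    print(f"{total_bytes / 1e9:.0f} GB")     # decimal gigabytes: ~700 GB
    print(f"{total_bytes / 2**30:.0f} GiB")  # binary gibibytes:  ~652 GiB
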


The English Wikipedia text (no images) is ~20 GB compressed, but ~60 GB uncompressed.


I only use the sources available to me at the time.

Another value published by Wikipedia is 30GiB [1] as of 2020, which includes punctuation and markup.

I explicitly gave the measurement unit as ASCII characters. If you have a better source for your figure (remember: ASCII characters for article text only, no markup), feel free to post it.

[1] https://en.wikipedia.org/wiki/Wikipedia:Size_in_volumes



