Is there some sort of "LLM-on-Wikipedia" competition?
i.e.: given "just Wikipedia", what's the best score people can get on whatever benchmarks these models are evaluated against?
I know that all the commercial ventures ingest training data voraciously, but it seems like there's room for a dictionary.llm + wikipedia.llm + linux-kernel.llm and some sort of judging / bake-off of their different performance capabilities.
Or does the training truly _NEED_ every book ever written + the entire internet + all knowledge ever known by mankind to have an effective outcome?
Lossy and lossless are more interchangeable in computer science than people give credit, so I wouldn't dwell on that too much. You can optimally convert one into the other with arithmetic coding. In fact, the best-in-class algorithms that have won the Hutter Prize are all lossy behind the scenes: they predict the next piece of data using a model (often AI-based), which is a lossy process, and then use arithmetic coding to losslessly encode that data with a number of bits proportional to how correct the prediction was. The reason the Hutter Prize targets lossless compression is exactly that converting lossy to lossless with arithmetic coding is a way to score how correct a lossy prediction is.
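A minimal sketch of that idea, not any actual Hutter Prize entry: an arithmetic coder spends about -log2(p) bits on a symbol the model assigned probability p, so a sharper predictor yields a smaller lossless file. The two toy "models" below are made up for illustration.

    import math

    def ideal_code_length(text, model):
        # Bits an arithmetic coder would need (to within a couple of bits total)
        # when driven by model(prefix) -> {next_char: probability}.
        bits = 0.0
        for i, ch in enumerate(text):
            p = model(text[:i]).get(ch, 1e-9)  # probability given to the true next char
            bits += -math.log2(p)              # coding cost of that symbol
        return bits

    def uniform_model(prefix):
        return {"a": 0.5, "b": 0.5}            # knows nothing about the data

    def better_model(prefix):
        if prefix.endswith("b"):
            return {"a": 0.9, "b": 0.1}        # has "learned" that 'a' follows 'b'
        return {"a": 0.5, "b": 0.5}

    data = "ba" * 50
    print(ideal_code_length(data, uniform_model))  # ~100 bits
    print(ideal_code_length(data, better_model))   # ~58 bits

Same data, same lossless guarantee; the only difference is prediction quality, which is exactly what the compressed size ends up scoring.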
>> Or does the training truly _NEED_ every book ever written + the entire internet + all knowledge ever known by mankind to have an effective outcome?
I have the same question.
Peter Norvig’s GOFAI Shakespeare generator example[1] (which is not an LLM) gets impressive results with little input data to go on. Does the leap to LLM preclude that kind of small input approach?
[1] A link should be here; I assumed as I wrote the above that I would just turn it up with a quick google. Alas, 'twas not to be. Take my word for it: somewhere on t'internet is an excellent write-up by Peter Norvig on LLM vs GOFAI (good old fashioned artificial intelligence).
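To make the "small input" point concrete without claiming this is Norvig's code: demos in that GOFAI spirit are often just character-level n-gram models trained on a single text. A rough sketch (the corpus filename is hypothetical):

    import random
    from collections import defaultdict

    def train_ngrams(text, n=4):
        # Map each n-character context to the characters observed to follow it.
        counts = defaultdict(list)
        for i in range(len(text) - n):
            counts[text[i:i+n]].append(text[i+n])
        return counts

    def generate(counts, n=4, length=300):
        state = random.choice(list(counts))
        out = list(state)
        for _ in range(length):
            nxt = random.choice(counts.get(state, [" "]))  # sample a continuation
            out.append(nxt)
            state = "".join(out[-n:])
        return "".join(out)

    # corpus = open("shakespeare.txt").read()   # any modest plain-text corpus
    # print(generate(train_ngrams(corpus)))

A few megabytes of text is enough for output that "sounds like" the source, which is the contrast with LLM-scale training data the parent comment is asking about.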