It will generate a correct next token 42% of the time when prompted with a 50-token quote.

Not 42% of the book.

It's a pretty big distinction.



The next _50_ tokens, 42% of the time.

Not just the next token.

This is like: give it a random sentence from the book, and it will give you the next sentence 42% of the time.
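To put rough numbers on the difference: under the simplifying assumption that every token is reproduced correctly with the same independent probability p, a 50-token run succeeds with probability p^50. The helper names below are hypothetical, and real per-token probabilities vary from token to token, so this is only a back-of-the-envelope sketch.

```python
def sequence_prob(per_token_p: float, n_tokens: int) -> float:
    """Probability that all n_tokens come out right, assuming each token is
    correct independently with the same probability per_token_p."""
    return per_token_p ** n_tokens


def required_per_token_p(target: float, n_tokens: int) -> float:
    """Per-token probability needed for an n_tokens run to succeed with
    probability `target`, under the same independence assumption."""
    return target ** (1.0 / n_tokens)


if __name__ == "__main__":
    # 42% per token: a full 50-token run is astronomically unlikely (~1e-19).
    print(f"{sequence_prob(0.42, 50):.3g}")
    # 42% per 50-token run: each token has to be right about 98.3% of the time.
    print(f"{required_per_token_p(0.42, 50):.4f}")
```

So "42% per next token" and "42% per 50-token run" describe wildly different levels of recall.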


A... massive distinction.


That said, I bet that if you could lower the inference temperature, those chances would improve a lot.
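A sketch of why a lower temperature could help: sampling usually divides the logits by a temperature T before the softmax, so a smaller T concentrates probability on the top-ranked token. The logits below are made up, and `p_top_token_run` assumes (hypothetically) identical logits at every step with the memorized token ranked first, which is a big simplification.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (T -> 0 approaches greedy argmax)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def p_top_token_run(logits, temperature, n_tokens=50):
    """Chance of sampling the top-ranked token n_tokens times in a row,
    assuming the same logits at every step (a toy simplification)."""
    probs = softmax_with_temperature(logits, temperature)
    return max(probs) ** n_tokens


if __name__ == "__main__":
    logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical logits, memorized token first
    for t in (1.0, 0.7, 0.3):
        print(t, round(p_top_token_run(logits, t), 4))
```

The flip side: if the memorized token is not the model's top choice at some step, lowering T makes reproducing that exact text less likely, not more.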


“… well enough to reproduce 50-token excerpts at least half the time”
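One plausible reading of "reproduce 50-token excerpts at least half the time": for each excerpt, multiply the model's per-token probabilities for the exact 50-token continuation, and count the excerpt if that product is at least 0.5. Assuming that reading, the share of such excerpts could be computed as below; the function names and the sample log-probs are hypothetical placeholders, not data from the study.

```python
import math
from typing import Sequence


def continuation_prob(token_logprobs: Sequence[float]) -> float:
    """Probability of generating the exact continuation: the product of the
    per-token conditional probabilities, computed as exp(sum of log-probs)
    so long runs don't underflow."""
    return math.exp(sum(token_logprobs))


def fraction_extractable(excerpts: Sequence[Sequence[float]], threshold: float = 0.5) -> float:
    """Share of excerpts whose exact continuation has probability >= threshold."""
    if not excerpts:
        return 0.0
    hits = sum(1 for lp in excerpts if continuation_prob(lp) >= threshold)
    return hits / len(excerpts)


if __name__ == "__main__":
    # Hypothetical log-probs for three 50-token continuations (not real data).
    confident = [math.log(0.995)] * 50                 # whole run ~0.78
    shaky = [math.log(0.97)] * 50                      # whole run ~0.22
    mixed = [math.log(0.999)] * 49 + [math.log(0.3)]   # one weak token sinks it
    print(fraction_extractable([confident, shaky, mixed]))
```

Note that a single low-probability token is enough to push an otherwise near-certain excerpt below the threshold.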


This means that if we start with 50% of the book, then there is a 42% chance that we can recreate the remaining 50%.

What is the distinction between understanding and memorization? What is the chance that understanding results in memorization (maybe in the case of humans)?


It stores how often characters come next, based on how often they appear in copyrighted material. It can reproduce parts because those values are a fingerprint.

It should count as breaking copyright law as currently written, but there's too much money involved.
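The comment above describes next-character frequency counting. A toy character-level bigram model that does literally that is sketched below to show how counts alone can replay training text. It is an analogy only: LLMs predict subword tokens with a learned neural network rather than storing a raw count table, and the corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """For each character, count how often each other character follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def greedy_continue(counts: dict, seed: str, length: int) -> str:
    """Extend `seed` by repeatedly appending the most frequent next character."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-1])  # .get avoids creating empty entries
        if not followers:
            break  # the last character was never followed by anything
        out += followers.most_common(1)[0][0]
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat"  # toy stand-in for training text
    model = train_bigram(corpus)
    print(greedy_continue(model, "th", 6))
```

With a tiny corpus the counts are dominated by the exact sequences seen, so greedy decoding tends to echo fragments of the source, which is the "fingerprint" idea in miniature.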




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: