It will generate a correct next token 42% of the time when prompted with a 50 to...

j16sdiz · 2025-06-16T03:27:49 1750044469

Next _50_ tokens, 42% of the time.

Not just the next token.

It's like this: give it a random sentence from the book, and 42% of the time it will give you the next sentence.
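Here is a rough way to see why the 50-token framing matters. Suppose the model assigns probability p_i to each correct token in turn. Then the chance that plain sampling reproduces the whole excerpt is the product of those 50 probabilities. The sketch below assumes that simple product model, and the per-token values are made up for illustration.

```python
import math

def sequence_probability(token_probs):
    """Chance of sampling an exact continuation, given the model's
    conditional probability for each target token in order."""
    # Sum logs instead of multiplying directly to avoid underflow on long sequences.
    return math.exp(sum(math.log(p) for p in token_probs))

def per_token_threshold(target, length):
    """Uniform per-token probability needed for the whole sequence to reach `target`."""
    return target ** (1.0 / length)

# Hypothetical numbers: each of the 50 suffix tokens gets probability 0.99.
print(round(sequence_probability([0.99] * 50), 3))  # → 0.605
print(round(per_token_threshold(0.5, 50), 4))       # → 0.9862
```

Even a model that picks each correct token 99% of the time reproduces a full 50-token excerpt only about 60% of the time. Clearing the 50% bar needs roughly 98.6% per token (as a geometric mean).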

deviation · 2025-06-15T13:13:13 1749993193

A... massive distinction.

amlib · 2025-06-16T21:26:36 1750109196

That said, I bet that if you lowered the inference temperature, those chances would improve a lot.
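Here is a minimal sketch of that intuition, assuming the standard temperature-scaled softmax used for sampling. Dividing the logits by a temperature below 1 sharpens the distribution, so a token that is already the model's top choice gets a higher probability. The product over 50 such tokens then grows sharply. The logits here are invented for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits where the memorized token (index 0) is already the favorite.
logits = [5.0, 3.0, 1.0]
for t in (1.0, 0.5, 0.1):
    p_top = softmax_with_temperature(logits, t)[0]
    # Per-token probability, and the same value compounded over a 50-token excerpt.
    print(t, round(p_top, 4), round(p_top ** 50, 4))
```

The same math also shows the limit of the idea. Lowering the temperature never changes which token ranks first, so it only helps on excerpts where the memorized token is already the favorite at every step.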

asplake · 2025-06-15T13:34:59 1749994499

“… well enough to reproduce 50-token excerpts at least half the time”

chiph2o · 2025-06-15T18:12:36 1750011156

This means that if we start with 50% of the book, there is a 42% chance that we can recreate the remaining 50%.

What is the distinction between understanding and memorization? What is the chance that understanding results in memorization (maybe in the case of humans)?

ipaddr · 2025-06-16T05:50:11 1750053011

It stores how likely each character is to come next, based on how often it appears in copyrighted material. It can reproduce parts because those values act as a fingerprint.

It should count as breaking copyright law as currently written, but there's too much money involved.
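The "how often characters come next" mechanism from this comment can be illustrated with a toy model that does literally that. It is a character-level n-gram counter. This is a deliberate simplification: real LLMs predict subword tokens with a neural network rather than storing a lookup table of counts. The training string and the context length `order=3` are arbitrary choices for the sketch.

```python
from collections import Counter, defaultdict

def build_char_model(text, order=3):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model

def greedy_continue(model, prompt, length, order=3):
    """Repeatedly append the most frequent next character for the current context."""
    out = prompt
    for _ in range(length):
        counts = model.get(out[-order:])  # .get does not create empty entries
        if not counts:
            break  # context never seen in the training text
        out += counts.most_common(1)[0][0]
    return out

training_text = "the quick brown fox jumps over the lazy dog. "
model = build_char_model(training_text)
print(greedy_continue(model, "the q", 20))  # → the quick brown fox jumps
```

When each context is rare enough to have a single observed successor, the counts act like the "fingerprint" described above. A short prompt from the training text then regenerates that text verbatim.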