Models are not word for word copies of large sections of text. They are capable ...

int_19h · 2025-06-24T23:05:20 1750806320

> When the encoding acts as a bulk archive, does the responsibility shift to those who choose what to extract from the archive?

If you take many gigabytes of, say, public domain music, and stick them on a flash drive with just one audio file that is an unlicensed copy of a copyrighted song, distributing that drive would constitute copyright infringement, quite obviously so. I don't see why it'd matter what else the model can produce, if it can produce that one thing verbatim by itself.

(If you could only prompt the model to regurgitate the original text with a framing of, say, critical analysis of said text around it, and not in any other context, then I think there would be a stronger fair use argument here.)

Retric · 2025-06-16T15:11:49 1750086709

> Is the encoding itself an infringement

Barring a fair use exception, yes.

From what I’ve read, MP3s get the same treatment as cassette tapes, which were also lossy. It’s 1:1 digital copies that represented some novelty, but that rarely matters.

I’m hesitant to comment on the rest of that. The ultimate question isn’t whether some difference exists but why that difference matters.