The burden of proof is completely uncharted territory when it comes to LLMs. Burden of proof is assigned by court precedent, not by the Copyright Act itself (in US law). This means a court looking at a case like this could (and, I'd argue, should) treat the use of an LLM trained on the copyrighted work as a distinguishing factor that shifts the burden to the defense. As a matter of public policy, it's not great if infringers can use the poor accountability properties of LLMs to hide from the consequences of illegally redistributing copyrighted works.
1. Initially, when you claim that someone has violated your copyright, the burden is on you to make a convincing case that the work is a copy or derivative of your work.
2. If the work doesn't obviously resemble your original, which is the case here, then the burden is still on you to prove either
(a) that it is actually very similar in some fundamental way that makes it a derived work, such as being a translation or a summary of your work,
or (b) that it was produced by some kind of mechanical process and is not a result of the original human creativity of its authors.
Now, regarding item 2b, there are two possible uses of LLMs that are fundamentally different.
One is actually very clear-cut: if I give an LLM a prompt consisting of the original work plus a request to create a new work, then the new work is quite clearly a derived work of the original, just as much as a zip file of a work is a derived work.
The other is very much not yet settled: if I give an LLM a prompt asking it to produce a piece of code that achieves the same goal as the original work, and the LLM had the original work in its training set, is the output of the LLM a derived work of the original (and possibly of other parts of the training set)? Of course, we'll only consider the case where the output doesn't resemble the original in any obvious way (i.e. the LLM is not producing a verbatim copy from memory). This question is novel, and I believe it is currently being tested in court in some cases, such as the NYT's case against OpenAI.
On the other hand, as a matter of public policy, nobody should be able to claim copyright protection for the process of detecting whether a string is correctly formed unicode using code that in no material way resembles the original. This is not rocket science.
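To make that concrete, here's a hypothetical Python sketch of two implementations of the same check (whether a byte string is well-formed UTF-8): one delegates to the standard library, the other is a hand-rolled (and deliberately simplified) state machine. They achieve the same goal while sharing no expression, which is exactly the situation where a resemblance-based infringement claim has nothing to point at.

```python
# Two materially different ways to check whether a byte string is
# well-formed UTF-8. Same functionality, no shared expression.

def is_valid_utf8_stdlib(data: bytes) -> bool:
    """Delegate entirely to the standard library's decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def is_valid_utf8_manual(data: bytes) -> bool:
    """Hand-rolled check counting expected continuation bytes.
    Simplified for illustration: it does not reject surrogate
    code points or every overlong form, unlike a full validator."""
    remaining = 0  # continuation bytes still expected
    for b in data:
        if remaining:
            if b & 0xC0 != 0x80:   # continuation must be 10xxxxxx
                return False
            remaining -= 1
        elif b & 0x80 == 0:        # single-byte ASCII
            continue
        elif b & 0xE0 == 0xC0:     # 2-byte lead 110xxxxx
            if b < 0xC2:           # 0xC0/0xC1 are always overlong
                return False
            remaining = 1
        elif b & 0xF0 == 0xE0:     # 3-byte lead 1110xxxx
            remaining = 2
        elif b & 0xF8 == 0xF0:     # 4-byte lead 11110xxx
            if b > 0xF4:           # beyond U+10FFFF
                return False
            remaining = 3
        else:
            return False
    return remaining == 0          # truncated sequence is invalid

# The two agree on these samples despite having nothing in common.
for sample in (b"hello", "héllo".encode(), b"\xff\xfe", b"\xc3"):
    assert is_valid_utf8_stdlib(sample) == is_valid_utf8_manual(sample)
```

Neither function is "derived" from the other in any textual sense, even though both encode the same Unicode rules.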
It's gotten OK now. Just spent a day with Claude for the first time in a while. I demanded strict TDD and implemented one test at a time. It might have been faster; hard to say for sure. The result was good.
It's plausible that a programmer who learns Vim well could be, say, 2x more productive in Vim than in, say, Nano.
Yet programmers who have used Nano were not (at least not significantly) scoffed at or ridiculed. It was their choice of tool, and they were getting work done.
It seems unclear how much more productive AI coding tools can make a programmer; some people claim 10x, some claim they actually make you slower. But let us suppose it is on average the same 2x productivity increase as Vim.
Why then was using Vim not heralded from every rooftop the same as using AI?
It's funny, because this decision by Joel in 2006 prefigures TypeScript six years later. VBA was a terrible bet for a target language and Joel was crazy to think his little company could sustain a language ecosystem, but Microsoft had the same idea and nailed it.