Hacker News

It would be very bad for tech progress in the West if these types of lawsuits succeed while China can keep doing this. The legislature would have to act to explicitly legalize it, which could take years, and by then it could be too late. The US is already falling behind China in AI progress.


The argument of "the west" vs "china" is a bad one.

"US company sued for underpaying employees" -> "It would be bad if labor laws are enforced here, china has lower minimum wage and exploits workers more, we have to be competitive"

I know those situations aren't exactly the same, but you're using that same justification: that we should think about our laws in terms of a competition with China, in terms of AI progress, not in terms of the arts or workers or happiness.


What you have stated doesn’t discredit the argument at all, it merely points out that there are additional things possibly worth considering, which is basically always true.


The only thing it would mean is that companies would have to pay for their training datasets like they already do for many applications of ML, instead of freeloading off of the work of millions of developers.

Hell, I am sure there are plenty of developers who wouldn't give a shit if Microsoft trained Copilot on their code for free as long as Microsoft abided by the terms of their licenses.

This is like arguing against ending forced prison labor because China will still use it anyway even if we stop. Why do anything if the looming spectre of China is enough to capitulate to their status quo?


Show me one example of Copilot output that you would like protected under a non-open source license owned by Microsoft.

Do you not understand that if a court rules that little snippets of utilitarian code are somehow considered artistic expression, it will basically be impossible to write any software that isn't infringing on something owned by a large corporation with in-house legal?


That's already the case, hence why clean room reverse engineering exists.

It's not illegal to come up with code that is exactly the same as an existing piece of copyrighted code.

It is illegal to take copyrighted code and reproduce and distribute it against its license, however.

It's the difference between original writing and plagiarism. You might come up with a substantially similar, to exact, point that other people have made. That's not illegal. Copying those points from a book verbatim is a copyright violation, however.

Copilot is reproducing copyrighted code from its training set. That's no different than you reading the leaked Windows source code and then copying it into ReactOS or WINE. But if you came up with the same solution Windows developers happened to come up with, and it's just a coincidence, that's just fine.


> It's not illegal to come up with code that is exactly the same as an existing piece of copyrighted code.

That's not why it isn't copyright infringement to independently come up with code that is exactly the same as some existing copyrighted code.

"In computer programs, concerns for efficiency may limit the possible ways to achieve a particular function, making a particular expression necessary to achieving the idea. In this case, the expression is not protected by copyright."

https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...

It's because the code in question is most likely to only be written in one way.

If you happen to come up with the same melody and lyrics as a pop song in a "clean room" do you think you are in the clear? We're talking about copyright! It is meant to cover artistic expression! Not utilitarian inventions.


So the things that are covered by copyright in software are the creative choices: the overall structure of the code, the specific classes and interfaces, etc. There are a zillion ways to organize code (certainly some better than others!), and this is where clean-room design comes into play. It is really unlikely that in a decent-sized codebase the classes, or types, or whatever are going to be substantially similar.


You're both right, but only one of you is seated in reality; the other describes the reality we ought to be in, but are not.


While I know I am right about the current interpretation of how copyright applies to software, I am also of the opinion that this is how things ought to be.

I have yet to hear a persuasive argument otherwise.

Is it because I both write and publish open source code and write and publish music that I am able to clearly delineate idea from expression?

For one, I have never been satisfied with the non-utilitarian aspects of Copilot’s output. When I am writing software the real art has always been in how code is organized. I gain absolutely no aesthetic value from autocompleting unit tests or boilerplate.

You may think that the outputs of Copilot and Stable Diffusion are artistic in nature but all I see is a rhyming dictionary.

I look at attempts to have copyright cover the utilitarian aspect of software as an attempt to claim ownership over chord progressions, scales or time signatures.


I don't follow. Aside from ignoring all the ethical and legal concerns built up in multiple societies to this point: if Microsoft loses this type of lawsuit, why would AI not continue to be researched? Don't they just have to build attribution into the tool, and everything's legal again?


The terms of many licenses require more than just attribution: they also require including the copyright notices and license text verbatim. MIT, BSD, MPL, GPL, LGPL, AGPL, etc. all require that at minimum.

Then there is the application of the licenses themselves. GPL-licensed code stipulates that its inclusion in a new work means the new work must be licensed under the GPL as well. AGPL and LGPL have similar terms, but with different stipulations.


Indeed, not to mention that you cannot just mix licenses. The vast majority of licenses are not compatible. It just so happens that some popular ones are, but in general the output is either GPL-like, MIT-like, or incompatible.
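As a rough illustration of that point (not legal advice, names invented): combining two codebases requires terms that satisfy both licenses, and for many pairs no such terms exist. The table below encodes a few well-known cases from the FSF's compatibility list.

```python
# Illustrative sketch: a tiny lookup of well-known compatibility outcomes.
# Only a handful of pairs are encoded; real license analysis is far subtler.
PAIRWISE_COMBINED = {
    frozenset({"MIT", "GPL-3.0"}): "GPL-3.0",         # permissive folds into GPL
    frozenset({"Apache-2.0", "GPL-3.0"}): "GPL-3.0",  # one-way compatible
    frozenset({"Apache-2.0", "GPL-2.0-only"}): None,  # famously incompatible
}

def combined_license(a: str, b: str):
    """License of the combined work, or None if the mix is not permitted."""
    if a == b:
        return a
    return PAIRWISE_COMBINED.get(frozenset({a, b}))

print(combined_license("Apache-2.0", "GPL-2.0-only"))  # None
```

Note the asymmetry: permissive code can flow into a GPL work (the result is GPL), but not the other way around, which is what "the output is either GPL-like, MIT-like, or incompatible" amounts to.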


AFAIK Copilot suggestions come from a model that is trained on many sources. Presumably when it offers a solution (an autocomplete suggestion), it is drawing on data taken from dozens, maybe hundreds of examples. How does a person even provide attribution in this case?

It's like saying, I know 'polo' comes after Marco not only because I was at your pool party where we all signed NDAs, but also because I've been to dozens of other pool parties. How do I know where to attribute this knowledge? In part that knowledge is due to the prevalence of examples and not just any single example.


I'm not in ML, can the program not know where its suggestions are drawn from?


It depends; the answer could be no if Copilot identifies solutions by looking for consensus matches.

To illustrate: Google Translate suddenly got better some time ago when it started searching literary texts for phrase matches and then cross-referencing known translations to produce a translation.

Copilot is perhaps doing something similar. As an analogy, it could sometimes be finding exactly one sample that it suggests, in which case there could be attribution.

But it is unlikely to offer any solution without a consensus. To further the example: how can you be confident in that translation? What if there is not just one literary match but many thousands, and 99.9% agree?

Without that high match percentage, it would be difficult to know whether the result is accurate; and with a small set of disagreeing matches, again it would be hard to give an accurate answer. I don't know for sure, but seemingly Copilot would have some threshold of agreement across multiple matches before it suggests a solution.
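The consensus idea described above can be sketched in a few lines. Everything here is hypothetical: the function names and the 99% threshold are invented for illustration, and nothing is claimed about how Copilot actually works.

```python
from collections import Counter

def consensus_suggestion(candidates, threshold=0.99):
    """Surface a completion only if enough retrieved examples agree on it.

    candidates: completions retrieved from many matched sources.
    Returns the majority completion when agreement is overwhelming,
    otherwise None (no confident suggestion; attribution is ambiguous).
    """
    if not candidates:
        return None
    completion, count = Counter(candidates).most_common(1)[0]
    return completion if count / len(candidates) >= threshold else None

# 99.9% of 1000 matched sources agree -> suggest; which one do you cite?
matches = ["return a + b"] * 999 + ["return a - b"]
print(consensus_suggestion(matches))          # return a + b
print(consensus_suggestion(["x", "y", "z"]))  # None
```

Under this scheme, the suggestion is backed by hundreds of sources at once, which is exactly why single-source attribution would be ill-defined.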


If a court determines that their behavior is illegal, this argument sounds a lot like: "Let us break the law, because someone else is doing much worse."

If it is necessary to reform how we handle copyright and IP licensing to remain competitive, we should find ways to do that.

The law should apply equally to everyone, whether they are competing with Chinese companies or not.


Should the West dismantle copyright and let everyone use leaked proprietary source code too? Or allow breaking the license of source-available software like Unreal Engine? Or does this only apply to open source software?

After all, in China it happens all the time.


Unironically, yes.


Microsoft can't use their own code to train the model? Researchers can't use OSS to research (but not monetize)?

This isn't progress anyway. It's fancy autocomplete/plagiarism/stealing dressed up as AI.



