The applicability of various open source licenses relies on the fact that, without accepting the license, distributing the software would violate copyright law: distribution is an exclusive right of the copyright holder and requires their permission (i.e. a license). Anyone can refuse the conditions of the GPL (just as with any other contract); it's just that without accepting the GPL they aren't allowed to redistribute their version of the software, and you can sue them for copyright infringement.
"Machines learning from the code" is not like that. It is not an exclusive right awarded to authors (quite the opposite; quoting copyright law, "In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.") and it does not require the permission of the copyright owner. If I have a legitimately obtained copy of some copyrighted work, no matter whether it's a book, an audio recording or code, and I am not under any contractual restrictions, I'm free to train an ML model on it. And if an open source license like the one you propose were to create such contractual restrictions, I wouldn't need to enter that contract, because I don't need a license for this.
The BSD 2-Clause License appears to attach conditions to both "redistribution" and "use". Quote:
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
Is that a mistake in the BSD license? I don't think so. I think you are mistaken.
See the discussion here: "Can a software license impose restrictions on the place where the software is to be used, so that a court would enforce those restrictions?"
There are numerous examples there of licenses restricting use: Apple's licenses, the Unreal Engine, etc.
License Restrictions Sample Clauses:
License Restrictions. Licensor reserves all rights not expressly granted to You. The Software is licensed for Your internal use only. Except as this Agreement expressly allows, You may not (1) copy (except for back-up purposes), modify, alter, create derivative works, reverse engineer, decompile, or disassemble the Software except and only to the extent expressly permitted by applicable law; (2) transfer, assign, pledge, rent, timeshare, host or lease the Software, or sublicense any of Your license grants or rights under this Agreement, in whole or in part, without prior written permission of Licensor; (3) remove any patent, trademark, copyright, trade secret or other proprietary notices or labels on the Software or its documentation; or (4) disclose the results of any performance, functional or other evaluation or benchmarking of the Software to any third party without the prior written permission of Licensor.
Hosting Restrictions. In the event that You desire to have a third party manage, host (either remotely or virtually) or use the Software on Your behalf, You shall (1) first enter into a valid and binding agreement with such third party that contains terms and conditions to protect Licensor’s rights in the Software that are no less prohibitive and/or restrictive than those contained in this Agreement, including, without limitation, the Verification section below; (2) prohibit use by such third party except for the sole benefit of You; and (3) be solely responsible to Licensor for any and all breaches of the above terms and conditions by such third party.
If a license can prohibit decompilation or copying, we can obviously prohibit language model training.
And that's what we need to do, for the reasons stated here:
It's not uncommon for licenses to try to assert overbroad conditions, both as a discouragement and as a way to ensure that even if they are not valid in some jurisdictions, they stick elsewhere; the license you quote is a good example of that. A license or a contract saying something does not make it true (especially in civil law jurisdictions; I've seen contracts and terms & conditions where the majority of clauses are absolutely void because they contradict the relevant law), and of course the validity of the contract itself also matters (e.g. while in the USA shrink-wrap licenses may be considered valid contracts, in much of the world they are not binding).
It does not prohibit decompilation, although it tries to. All it says is that the licensor does not grant me the right to decompile or disassemble the Software. It does give a nod to "except and only to the extent expressly permitted by applicable law", which is the key part (and would apply even if they did not say it), because the applicable law (at least for me) does grant me the right to decompile and disassemble the software for various purposes without the permission of the copyright owner; i.e. this license does not actually prohibit decompilation, no matter what it says.
It's a broad clause which relies on the fact that some types of computer software "use" may require the permission of the copyright owners, depending on jurisdiction. In essence, I'd say the validity of this restriction depends on how the specific law treats the incidental copies of the software created as it is being installed, executed, etc.; this (unlike most copyright principles harmonized in international conventions) is not universal globally.
My position is based mostly not on code but on text: in natural language processing there is a similar but much older situation of models being trained on copyright-protected works. The interests of researchers and publishers obviously differ, and at least currently (laws do change) the legal position is that publishers' requirements can be (and are) ignored: models can be trained on these texts without their permission and even after their explicit objections or cease-and-desist requests. And a BSD or GPL or some other license can't do anything more restrictive than a book's "license" of "all rights reserved, we don't grant you any permissions".
Please refer to the material I provided. It contains clear examples of use being restricted (decompilation, disassembly, copying, publication of performance measurements, etc.).
A license can restrict use, and we need to do exactly that (restrict use in training models and in inference) to address this new kind of threat to intellectual property rights.
I suspect those materials were read carefully. The gist of the rebuttal is that an author cannot reserve rights they do not have, and that licenses "can" assert restrictions that they are not actually able to enforce (depending on local law).
"machines learning from the code" is not like that - this is not an exclusive right awarded to the authors (quite the opposite, quoting copyright law, "In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.") and it does not require the permission of the copyright owner. If I have a legitimately obtained copy of some copyrighted work, no matter if it's a book, audio recording or code, and I don't have any contractual restrictions, I'm free to train a ML model on it. And if an open source license like you propose would create such contractual restrictions, I don't need to enter that contract, because I don't need a license for this.