No, they weren't able to generate the same existing code, both because that code is not included anywhere in the model, and because Copilot (not "Codepilot") has safeguards against this kind of situation, should it arise in the highly unlikely event that a snippet is repeated thousands of times across thousands of repositories.
I've gotta let you know that people copy code snippets from all sorts of codebases with little regard for licenses anyway, because they're toothless in 99% of cases, AI or not. It's a nice illusion that anyone respects licenses, but it's just not true.
I've spent hours looking over code before delivering to FAANG. Our company had put a clause into the contract that our code was free of any GPL'd code. It happened before and it was discovered. The whole thing was a very expensive exercise. I'm aware that many small startups, 90% of which go bust anyways, just ignore licenses, but that doesn't work when you play with the big boys.
Have to say I wasn’t expecting “antisemitism is fine” as a counter argument to downvotes. Suffice to say this comment isn’t the slam dunk you think it is.
Yes it is, and I don't care about how popular my comments are. In many Slavic countries antisemitism is the default and normal opinion. Semitic people are not the sacred cows they are in American culture.
Which Slavic countries, specifically? I come from one, and even though there are (borderline) antisemitic jokes (an X, Y and Jew go into a bar, or Holocaust-related jokes which I personally choose to interpret as just a specific brand of dark humour), antisemitism is not a "default and normal opinion". It's skinhead/football hooligan fringes.
So, which Slavic country has a majority of antisemites, today? The horrors of WW2 have taught all of us better, one would think (idiots such as yourself being the exception, not the norm; clarification: yes, you have to be an idiot to be antisemitic or racist or any other similar description).
A simple search on GitHub reveals that those functions were reposted verbatim thousands of times. Most people just copy and paste snippets of code they find useful, ignoring licenses. This highlights how all the power a license promises to hold is completely fictional. Any "in the style of Tim Davis" modifier only shows some kind of unwarranted self-importance complex on the part of the guy, thinking his style is widely known and distinctive (it's not). It's not the job of Copilot, the team that builds it, or the programmers that use it, to determine where the functions that were reposted thousands of times under all kinds of licenses originated.
This is the same case as with copyrighted photos in newspapers: a paper prints a photo somebody allowed them to use, but then it turns out that person did not have the right to use it in the first place. That did not stop newspapers from printing photos at all.
It does not show an unwarranted sense of self-importance on the part of Tim Davis.
Whether or not his style is widely known, his code is VERY widely used. Just look up SuiteSparse and try to find all of the downstream uses of it. It is one of the most---if not the most---ubiquitously used set of sparse linear algebra libraries. If you do anything with numerical linear algebra, there's a good chance you at least know what SuiteSparse is, and possibly also know who Tim Davis is.
The bigger issue here is the effect this has on research. Tim Davis not only programmed this library, he did the basic research leading to many of the algorithms in SuiteSparse. He went ahead and released SuiteSparse open source, probably thinking that it would be a good deal for him, provided that its use was properly attributed. Provide a public service in exchange for attribution. This is a reasonable way to get support as an academic. Clearly he has had a large number of industrial collaborations which likely have provided him with a significant amount of funding over the years.
Speaking for myself, if Microsoft has no compunction against behaving this way, I can no longer see the point in publicly releasing research code that I develop using an open source model. Microsoft is clearly telegraphing that they don't give a f** about licensing, although whether that holds if they are litigated against remains to be seen. I think there's an excellent chance many other researchers feel the same way. If you think openness and reproducibility in science is important, this is a problem.
If you think most people pay any attention to licenses or respect them, you'd better think again. Snippets get copied verbatim with no regard to their source all the time. Licenses have no power and are routinely ignored.
Mmm maybe rephrase that as “depending upon which entity’s copyright was violated”
Surely I don’t need to recite the last 50 years of tech legal precedent and case history for you to see that such a blanket generalization cannot be left unaddressed.
It prints this code because you have it open in another editor tab. Wish people who don't know at all how it works stopped acting all outraged when they're laughably wrong.
> OpenAI Codex was trained on publicly available source code and natural language, so it works for both programming and human languages. The GitHub Copilot extension sends your comments and code to the GitHub Copilot service, and it relies on context, as described in Privacy below - i.e., file content both in the file you are editing, as well as neighboring or related files within a project. It may also collect the URLs of repositories or file paths to identify relevant context. The comments and code along with context are then used by OpenAI Codex to synthesize and suggest individual lines and whole functions.
> Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as “code snippets”: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and files paths.
And - what's the point? Websites that want to track you will track you anyway: there's a whole array of technologies that will let them, and they won't display anything to you if you don't run JS. Security objections are pointless. It's not 2001, JS is not anything new, and it's not at all a security risk; browsers are sandboxed and well isolated.
Purists who refuse to load websites with JS are such a small percentage of visitors they can be safely ignored. It's an uphill battle that they're losing very fast. I don't think you can use more than 5% of websites without JS in any capacity.
The whole point of websites is lost if they can't track visitors, see what they're paying attention to, and how to manipulate their behavior. Everyone's doing it, and if you don't, you're at a disadvantage. There are very few websites whose purpose is not to influence you to spend money on something. Most of the articles and posts you read are AI generated, or written by "content writers" never intended to be read by actual people, they're there for the google bot to keep some activity going.
> Purists who refuse to load websites with JS are such a small percentage of visitors they can be safely ignored.
Except they are not[0]. Besides microbrowsers, there are screen readers that can't even use sites that customize a bunch of div elements with no accessibility tags.
For most commercial purposes screen readers can be safely ignored, they are not used by big spenders. Social share cards can be previewed readily and easily prepared without impacting the main content of the website.
Microbrowsers do not run JavaScript. That means if all the routing is handled by JavaScript, or the only HTML served is a script tag, a microbrowser will show nothing. More and more JavaScript monstrosities do not properly show up in share sheets because they offer no meaningful content to a microbrowser.
As a practising JavaScript-decliner for a few years: your “only 5% work” figure is wildly, extremely wrong. It depends a little on what types of sites you’re dealing with, but for contenty sites rather than appy sites, I’d put it past 95%, with a few notable major site exceptions (so that by content it may well be below 95%) and certain subcategories that are more commonly broken.
And I’d say the battle has actually gained some ground in the last eight years, as server-side rendering of JavaScript content stacks has made headway.
The point is that you're not supposed to receive the text by itself, the text is only there to entice you to visit the site, but you're supposed to click links, subscribe to the newsletter, get tracking scripts served to you, and buy whatever product/service they're luring you towards with those blog posts.
Nowadays every marketing person will strongly insist on producing tons of semantically meaningless "blog posts" just to generate some new "content" on the site so that google ranks it higher. The blog very rarely exists for you to just read it, most of the time it's a marketing tool supposed to steer you towards something else, and it's not in their interest to allow you to just read the text without the accompanying cruft.
>spraying an aerosol like sulfur dioxide into the stratosphere
What happened to it being a toxic gas? And is this why they were pushing so hard to completely discredit the notion of "chemtrails" so that all criticism of this idea can be killed easily?