There is logic to ensure that Copilot does not emit exact duplicates of code in the training set... but that logic is significantly newer than that tweet.
Link? I couldn't find anything "significantly newer" than 7/2/21 (though I'm sure GitHub is doing a lot here). They published this blog post on 6/30/21 about their efforts to avoid reproducing raw code: https://github.blog/2021-06-30-github-copilot-research-recit.... They concluded:
> We will both continue to work on decreasing rates of recitation, as well as making its detection more precise.
Was that decision informed by legal or product? Because derivative works are still derivative works even if you don't replicate the original verbatim.