I think copyright is a problem for GPL-like licenses. They should have restricted the training data to code under MIT/BSD-like licenses.
Anyway, there is another problem, patents, and it is much bigger. I think the Apache license has a provision about patents, but code under most other licenses may be covered by patents, and if the AI generates something similar, that output may fall under the patent.