What you are talking about is overfitting. It only happens with images that appear far too many times, in too many forms, in the training set, usually the most iconic images of all time, such as the Mona Lisa. And, naturally, hyper-iconic images are the first things that come to mind for humans when they test for the issue, because those images are seared into our brains too.
And, much like with our brains, when it happens it doesn’t exactly reproduce parts of the source image. You have to actively pay attention to notice what happened. It makes an image that is overly similar conceptually, which to our brains feels the same. So that’s enough to convince someone at a glance that it is the same.
But if you look at an overfit result of “The Beatles Abbey Road album cover”, you’ll see things like: band members are crossing the road, but they are all variations of Ringo. Vehicles from that era are in the background, but they are in a different arrangement and none of them are taken directly from the source. The band members are wearing suits, but in the wrong style and color. The road has the wrong number of stripes. It’s not the same as a highly skilled human drawing an iconic image from memory, but it sure is darn similar.
And, besides all that, everyone working in the tech considers the overfitting of iconic images to be a failure case that is being actively addressed. It won’t be long before it stops happening entirely.
In the meantime, I’d challenge anyone to try to make an overfit result that significantly reproduces a specific work by every prompter’s favorite, Greg Rutkowski, using DALL-E, Midjourney, or the Stable Diffusion models released directly by Stability AI. Greg’s pixels aren’t in the model file to be copied. Only a conceptual impression of his style.
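For anyone who wants to take that challenge up, here’s roughly what the test looks like. This is a minimal sketch assuming the diffusers library and the publicly released SD 1.5 checkpoint; the model id, prompt, and seed are just illustrative, not anything special:

```python
# Rough sketch of the challenge: generate from a Rutkowski-style prompt with an
# off-the-shelf Stable Diffusion checkpoint and compare against his real work.
# Assumes the `diffusers` library and the public SD 1.5 weights.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "castle on a cliff at sunset, fantasy art by Greg Rutkowski"
generator = torch.Generator("cuda").manual_seed(42)  # fixed seed so the run is repeatable

image = pipe(prompt, num_inference_steps=50, generator=generator).images[0]
image.save("rutkowski_style_test.png")

# Put the output next to his actual catalog: you get his general style
# (lighting, palette, brushwork), not any specific painting.
```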
Not really, though that is another legitimate issue.
I was talking about 1) the fundamental training and inference process, which remembers pixels, not concepts or techniques. Today’s AI learns to create imagery in a fundamentally different way than people do. And 2) image generation AI based on text prompts, like Stable Diffusion, can easily be asked to reproduce training data with a prompt that is narrow and specific enough. This is not overfitting; it’s a function of the fact that some inputs are quite unique, and you can use the prompt to focus on that uniqueness.
The training process looks at pixels and gets an impression of the relationships between words and the curves in images. But to say it “remembers pixels” is pretty loaded language that implies copying pixels into the model file.
I’d like to see examples of using SD to copy some specific piece of art that hasn’t been plastered millions of times across the internet. Sure, you can get a decent Mona Lisa knock-off. Maybe even a strong impression of the Bloodborne cover-art marketing material. But reproducing a specific painting by Rutkowski would be quite a surprise to me.
Yes, the training process looks at pixels, because that’s all it has. That’s the point. Humans don’t look at pixels; they learn ideas. It’s not in the least bit surprising that AI models shown a bunch of examples sometimes replicate their example inputs: examples are all they have, and they are built specifically to reproduce images similar to what they see. I’m not sure why you consider that idea “loaded”.
Again, naming Rutkowski invokes an impression of his style, but copies none of his paintings.
Read the paper. What I found is that a random sampling of the database naturally turned up a small subset of images that are highly duplicated in the training set. The researchers were able to derive methods that produce results giving strong impressions of images such as a map of the United States, Van Gogh's Starry Night, and the cover of Bloodborne :P with some models, and not at all with others. The researchers caution against extrapolating from their results.
> We speculate that replication behavior in Stable Diffusion arises from a complex interaction of factors, which include that it is text (rather than class) conditioned, it has a highly skewed distribution of image repetitions in the training set, and the number of gradient updates during training is large enough to overfit on a subset of the data.
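If you want to poke at this yourself, the basic ingredient of that kind of study is an image-similarity check between a generation and a suspected source. Below is a rough sketch of that check, not the paper’s actual pipeline (they use purpose-built copy-detection features), using CLIP image embeddings as a stand-in:

```python
# Crude at-home proxy for a replication check: embed a generated image and a
# suspected source image, then compare cosine similarity.
# Assumes the `transformers` library and the openai/clip-vit-base-patch32
# checkpoint; this is only an approximation of what the paper does.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)  # L2-normalize

generated = embed("generated.png")            # output from the model
suspected_source = embed("source_image.png")  # training image you suspect it copied

similarity = (generated @ suspected_source.T).item()  # cosine similarity, max 1.0
print(f"cosine similarity: {similarity:.3f}")
# Scores near 1.0 point at near-duplication; mere stylistic resemblance
# lands well below that.
```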