A pixel (in the context of an image) could be “wrong” in the sense that its assigned value could lead to an image that just looks like a bunch of noise. For instance, suppose we set every pixel in an image to some random value. The resulting would look like total noise and we humans wouldn’t recognized it as a sensible image. By providing a corpus of accepted images, we can train a model to generate images (arrays of pixels) which look like images and not, say, random noise. Now it could still generate an image of some place or person that doesn’t actually exist, so in that sense the pixels are collectively lying to you.