Many claims don't stand up to scrutiny, and some look suspiciously like training to the test.
The Apple study was clear about this. LLMs and their related multimodal models lack the ability to abstract information from noisy text inputs.
This is really obvious if you play with any of the art generators. For example - the understanding of basic prepositions just isn't there. You can't say "Put this thing behind/over/in front of this other thing" and get the result you want with any consistency.
If you create a composition you like and ask for it in a different colour, you get a different image.
There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.
Text has exactly the same problem, but it's less obvious because the grammar is usually - not always - perfect and the output has been tuned to sound authoritative.
There is not enough information in text as a medium to handle more than a small subset of problems with any consistency.
> There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.
It has been observed in LLMs that the distance between embeddings for colors follows the same similarity patterns that humans experience - colors that appear similar to humans, like red and orange, are closer together in the embedding space than colors that appear very different, like red and blue.
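As a quick illustration of that claim, here's a minimal sketch (the model name and color list are arbitrary choices of mine, not from any particular study) using the sentence-transformers library to compare cosine similarities between embeddings of color words:

```python
# Sketch: do embedding distances between color words track human similarity?
# Assumes the sentence-transformers package; the model choice is arbitrary.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

colors = ["red", "orange", "blue"]
emb = model.encode(colors)  # shape: (3, embedding_dim)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("red vs orange:", cosine(emb[0], emb[1]))  # expected: higher
print("red vs blue:  ", cosine(emb[0], emb[2]))  # expected: lower
```

If the claim holds, red/orange should score noticeably higher than red/blue; exact numbers will vary by model.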
While some argue these models 'just extract statistics,' if the end result matches how we use concepts, what's the difference?
Part of this is that the art generators tend to use CLIP, which is not a particularly good text model - often only slightly better than a bag of words - so many interactions and relationships are difficult to represent. Some of the newer ones have better text frontends, which improves the situation.
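You can probe this directly. Here's a hedged sketch using the HuggingFace transformers CLIP wrappers (the prompt pair is my own illustrative choice): it checks how similarly CLIP's text encoder scores two prompts that contain the same words but reverse the spatial relation.

```python
# Sketch: probe CLIP's text encoder for word-order / relation sensitivity.
# Uses the transformers CLIP API; prompt pair is an illustrative choice.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a red cube on top of a blue sphere",
    "a blue sphere on top of a red cube",  # same words, relation reversed
]
inputs = processor(text=prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    feats = model.get_text_features(**inputs)

feats = feats / feats.norm(dim=-1, keepdim=True)
sim = (feats[0] @ feats[1]).item()
print(f"cosine similarity: {sim:.3f}")  # close to 1.0 => near bag-of-words
```

A similarity close to 1.0 would be consistent with the bag-of-words criticism: the encoder barely distinguishes which object is on top.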
I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch, with a new random seed each time. Even if the seed is fixed, the early stages of generation, where things like the rough image composition form, tend to be quite chaotic and so sensitive to small changes in the prompt. There are tools that can make far more controlled adjustments to an image, but they tend to be a bit less user-friendly.
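Here's a sketch of what I mean, using the diffusers library (the checkpoint and prompts are arbitrary choices): even with a pinned seed, so both runs start from identical noise, a one-word prompt change can shift the whole composition.

```python
# Sketch: fixed seed, small prompt change -> often a different composition.
# Assumes the diffusers package and an arbitrary Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate(prompt, seed=42):
    # Re-create the generator each call so both runs begin from the same noise.
    g = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt, generator=g).images[0]

generate("a cat wearing red boots").save("red.png")
generate("a cat wearing brown boots").save("brown.png")
# Despite identical initial noise, the two outputs often differ in layout,
# not just boot color - the early denoising steps are chaotic.
```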
> I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch
It’s unlikely that the models have been trained on “similarity”. Ask it to swap red boots for brown boots and it will happily generate an entirely different image because it was never trained on the concept of images being similar.
That doesn’t mean it’s impossible to train an LLM on the concept of similarity.
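In fact, models trained on paired before/after edits already exist. A minimal sketch with InstructPix2Pix via diffusers (the checkpoint is real; the input image path is a placeholder of mine):

```python
# Sketch: an edit-trained model applies a targeted change while keeping
# the rest of the image similar. The input image path is a placeholder.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("cat_with_red_boots.png").convert("RGB")
edited = pipe(
    "swap the red boots for brown boots",
    image=image,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
).images[0]
edited.save("cat_with_brown_boots.png")
```

Because it conditions on the input image rather than regenerating from pure noise, it was effectively trained on exactly the "similar images" concept in question.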
I just asked Midjourney to do precisely that, and it swapped the boots with no issue, although it didn't seem to quite understand what it meant for a cat to _wear_ boots.