That makes sense, but that would imply that there's a limit, right? Once the model outputs the optimal, pixel-perfect image, what does increasing the model size do? And who decides, and how: "yes, this one looks more like a Picasso than that one", or "this one indeed looks more energetic", or "this image does make me sadder than that one"? How do you benchmark this?
Yes, you're on the right track. Once you get really close to a perfect score on your benchmark you can no longer improve, so you need to develop a better benchmark with more headroom. And you have the right idea of how benchmarking subjective quality works: a bunch of humans score model outputs, producing output-score pairs, and the model is judged against those. To train an AI you need a measurable goal, and in this case the measure is "humans like it."
If you are noticing that this seems to fundamentally limit model performance on certain tasks to aggregate human capability, you are noticing correctly.
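To make that concrete, here's a minimal sketch of how "humans like it" becomes a number. A common setup is pairwise preference: raters compare two images generated from the same prompt and pick the one they prefer, and the benchmark score is each model's win rate. All the names and data below are hypothetical, just to show the shape of it:

```python
from collections import Counter

# Hypothetical human preference data: each record is one rater's pick
# between "model_a" and "model_b" on the same prompt.
human_votes = [
    ("a corgi wearing a top hat", "model_b"),
    ("a corgi wearing a top hat", "model_b"),
    ("an oil painting of a storm at sea", "model_a"),
    ("an oil painting of a storm at sea", "model_b"),
]

def win_rate(votes, model):
    """Fraction of head-to-head comparisons this model won."""
    wins = Counter(preferred for _, preferred in votes)
    return wins[model] / len(votes)

print(f"model_b win rate: {win_rate(human_votes, 'model_b'):.0%}")  # 75%
```

Notice the ceiling is baked in: the score can never exceed what the pool of raters collectively prefers, which is exactly the aggregate-human-capability limit.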
To give you some idea of what these benchmarks look like, here's the prompt list from DrawBench, which Google created as part of evaluating their Imagen model.
Also, after a point the differences will depend more on the specific individual viewing the image than on what the AI can generate, so the AI would have to optimize its output per individual, which would require a deep understanding of each viewer.