The thing people love about Gen AI is not needing to understand the dozens, if not hundreds, of deliberate and unconscious artistic decisions an artist makes when creating a piece by hand. It's great to be able to think of a core idea, refine the overall aesthetic, and then work out some details. It's freeing, fun, and nearly useless for making high-end media.
Thousands of deliberate artistic decisions go into making a TV show, let alone a Hollywood movie. Think about everything from the subtlest cuts in every character's tailored costume, to how each part of each hairstyle should look in different scenarios, to how all of it is lit: what shade of almost-black you want for the matte, whether the rim light needs to be a different color to make the shot work... and all of that for every shot. That's the precision required to make even generic high-end media, and the need to manipulate those things with perfect accuracy doesn't go away when you're using Gen AI to make it. If anything, people will probably be more critical of Gen AI output than of traditional media.
I know of a big, moneyed studio that tried to replace its concept artists with a group of prompt engineers, and promptly fired them two months later after the art directors just couldn't take it anymore. The directors wanted someone who could nail exactly what they asked for in two attempts over three hours, not someone who could generate 100 polished versions in five minutes that then had to be reviewed, yet still took six hours to land one version that actually met their needs, because each revision was imprecise and introduced other undesirable changes even with ControlNets, inpainting, LoRAs, and all that. Beyond that, since the output was a flat raster, it was literally useless for anything else. It's not like Gen AI has no role in that workflow-- a traditional artist might use something like that for ideation and reference-- but modifying the flat raster output of Stable Diffusion et al. would in many cases be harder than roughing it out from scratch, and would yield an inferior product.
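For anyone who hasn't worked with these tools, here's roughly what one inpainting revision looks like with Hugging Face's diffusers (a minimal sketch; the checkpoint and file names are assumptions, not anyone's real pipeline). The thing to notice is that the "edit" is a text prompt plus a mask, and everything under the mask gets re-sampled, which is exactly why each revision drags other details along with it:

    # Minimal inpainting sketch with diffusers -- illustrative only, not a
    # studio pipeline. Checkpoint and file names are assumptions.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = Image.open("concept_v1.png").convert("RGB")  # hypothetical input
    mask = Image.open("jacket_mask.png").convert("L")    # white = region to redo

    # "Make the jacket navy" becomes prompt + mask: the masked region is
    # re-sampled from the model, so cut, folds, and lighting under the mask
    # can all shift along with the one thing you actually asked to change.
    result = pipe(
        prompt="navy blue tailored jacket, soft studio lighting",
        image=image,
        mask_image=mask,
    ).images[0]
    result.save("concept_v2.png")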
When it came down to it, having people who knew how to precisely execute an artistic vision is what produced the output studios know will make them money. And that's concept art, not the stuff that gets put on screen, which has to be a whole lot more precise: at 24 fps, a 90-minute movie needs 129,600 perfect, polished still images, and those will come from a pool of at least that many more which editors can compose into a piece. Not having the footage in LOG, not having separated AOVs for precise grading, color correction, compositing, etc.-- these are all huge impediments.
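The frame-count arithmetic, for the skeptical, is just runtime times frame rate (24 fps being the theatrical standard; the 2x pool below is my own back-of-envelope reading of "at least that many more", not a sourced figure):

    # Back-of-envelope frame math for a 90-minute feature at 24 fps.
    RUNTIME_MIN = 90
    FPS = 24  # theatrical standard; TV/streaming deliverables vary

    frames_on_screen = RUNTIME_MIN * 60 * FPS  # 90 * 60 * 24 = 129,600
    # "A pool of at least that many more" implies a selection ratio of at
    # least 2:1 -- the 2 is an assumption for illustration; real shooting
    # ratios are usually far higher.
    frames_in_pool = frames_on_screen * 2      # >= 259,200 candidates

    print(f"{frames_on_screen:,} on screen, {frames_in_pool:,} to wrangle")

And every one of those frames has to survive grading and compositing, which is where the missing LOG and AOVs bite.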
It's no different than handing a completed mp3 of a song to a talented music producer and expecting them to turn that audio into a hit song (not cut it up into samples using traditional techniques to make something new, not use it as inspiration to re-record the song, but take the audio in that mp3 and use that audio as-is) in a fraction of the time it would take them to do it from scratch. They'd just laugh at the suggestion.