In image gen, ComfyUI provides a node-based workflow that leaves a lot of room for 'creative' control: mixing and mathematically combining masks, filters, and prompts (and starting images / noise at any node in that process).
I would expect the same interface for audio to emerge for 'power users'.
Disagree that curation and prompts add artistry (dense intent reflected in the output) to AI generations.
"Curation" in AI can only surface the curator's local maxima among the tiny and arbitrary grab-bag of seed integers they checked, out of a space of 2^64 options; it's statistically skewed 99% towards the model's whims rather than anyone's unique intent or taste.
Prompt crafting is likewise terribly low fidelity since it's a constant battle with the model's idiosyncratic interpretation of the text, plus arbitrary perturbations that aren't actually correlated with the writer's supposed intent. And lord spare me the "high quality high resolution ultra detailed photorealistic trending on artstation" type prompts that amount to a zero-intent plea for "more gooder". And when pursuing artistry, using artist names / LoRAs is a meta-abandonment of personal direction, abdicating artistic control and responsibility to a model's idea of another artist's idea of what should be done.
Fancier workflows generally only multiply this prompt-and-curate process across regions/iterations, so they can't add much: they're multiplying a tiny fraction by a fixed factor.
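To put rough numbers on the "tiny fraction times a fixed factor" point, here is a back-of-the-envelope sketch. The 2^64 figure is the one quoted above; the 1,000 seeds checked and the 100x workflow factor are made-up illustrative values, not measurements.

```python
SEED_SPACE = 2 ** 64  # size of the seed space quoted above


def fraction_explored(seeds_checked: int, workflow_factor: int = 1) -> float:
    """Share of the seed space covered when `seeds_checked` seeds are tried,
    multiplied across `workflow_factor` regions/iterations (hypothetical knob)."""
    return seeds_checked * workflow_factor / SEED_SPACE


# Illustrative values only: 1,000 seeds, then the same with a 100x workflow.
print(f"{fraction_explored(1_000):.3e}")       # 5.421e-17
print(f"{fraction_explored(1_000, 100):.3e}")  # 5.421e-15
```

Even with the 100x multiplier, the explored share stays vanishingly small, which is the shape of the argument being made.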
I agree with you that prompts and seeds leave much to be desired. That's why I think more sophisticated steering is necessary.
The models' latent space is extremely powerful, but you get hamstrung by the text encoder's whims when you do things through a prompt interface. In particular, you've hit exactly an issue I have with current LLMs in general: they are locked into words and concepts that others have defined (labelings of points in the latent space).
Wishy washy thinking: It'd be nice if there were some sort of Turing-complete, lambda-calculus sort of way to prompt these models instead. Where you can define new terms, create expressions, and use loops and recursion or something.
It would sort of be like how SVGs are "intent complete" and undeniably art, but instead of vector graphics, it is an SVG-like model prompt.
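A toy sketch of what "define new terms, create expressions, recursion" could look like, purely as an illustration. Everything here is hypothetical: the 3-number vectors stand in for real encoder output, and `TERMS`, `mix`, `define`, and `repeat` are invented names, not any model's or library's API.

```python
from typing import Callable, Dict, List

Vec = List[float]

# Hypothetical 3-dimensional "latent" points standing in for encoder output.
TERMS: Dict[str, Vec] = {
    "warm": [1.0, 0.0, 0.0],
    "sparse": [0.0, 1.0, 0.0],
}


def mix(a: Vec, b: Vec, t: float) -> Vec:
    """Linear interpolation between two points: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def define(name: str, value: Vec) -> None:
    """Bind a new term to a point, so later expressions can reuse it."""
    TERMS[name] = value


def repeat(f: Callable[[Vec], Vec], n: int, v: Vec) -> Vec:
    """Apply f to v n times, written recursively."""
    return v if n == 0 else repeat(f, n - 1, f(v))


# A new term defined by an expression rather than by someone else's word.
define("dusk", mix(TERMS["warm"], TERMS["sparse"], 0.25))

# A derived term built by recursion: nudge "dusk" halfway toward "sparse", twice.
toward_sparse = lambda v: mix(v, TERMS["sparse"], 0.5)
define("late_dusk", repeat(toward_sparse, 2, TERMS["dusk"]))

print(TERMS["dusk"])       # [0.75, 0.25, 0.0]
print(TERMS["late_dusk"])  # [0.1875, 0.8125, 0.0]
```

The point of the sketch is only that the author, not the text encoder, decides what "dusk" means. How such points would actually be fed into a real model is left open.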
Curation, refinement and taste are practically worthless on their own. The technical difficulty of art is the investment that makes it worth considering.
So if you are right, then art will pretty much be worthless in the future. You can just iterate over the search space defined by "good taste" and produce an infinite amount of good art for no work.
Curation is not worthless. It's the exact opposite: amid the abundance of stuff, it's extremely valuable.
Search is not free, and it can never be free. What happens when search gets easier and easier is that your demands for quality and curation will get higher until all time saved in search efficiency is spent on search breadth.
How dare you! Next you will say my current pornhub prompts refined over years aren't art either! Or that prompt results are purely for personal gratification and no one else cares about them.
This is a stupid argument, because even with the most automatic camera you have to point it at something and make a decision about what to frame in. AI music is more like buying a bunch of old unlabelled records from a bargain bin and then praising yourself whenever one of them turns out to be worth listening to.
Of course it's a stupid argument, but it's exactly what the ancestors of today's AI naysayers said when photography became practical.
Then again, it's possible for an art form to exhaust its own possibilities. To the extent that "prompt engineering" is sufficient to generate any music or artwork we have in mind, that seems like an indication that we've reached that point. To the extent that it's not sufficient, that seems like an indication that there's still interesting stuff left to do.
Either way, if you are hoping that things will stay the same, then I'm afraid that neither art nor technology is a good career choice.
This is dead wrong. People were open to using it as a tool then, as they are now, but not all offerings are of equal value. I know a lot of musicians who would be into an AI 'session buddy' who could play along with them or serve as a tutor for advanced concepts. The existing offerings in the music space are at the level of DeepDream when it made everything look like a dog on acid.
As I've written before, proponents of AI music as an infinity jukebox completely miss the point of how music works. In a social context people want a jukebox to provide favorite (or at least famous) sounds that everyone can vibe along to. People listening on their own either replay their nostalgic favorites or iterate on them if they have a strong genre preference. AI could in theory replace a DJ at a nightclub where not everyone needs to know every song if the vibe is good and the beats are tight, but they will still want someone to focus on. Beyond an intimate small group of people, parties where loud music is blasting off a playlist (or DAT or CD changer) with nobody actively DJing don't work because people quickly feel that if the music is changing independently of the dancefloor then the collective connection is broken.