Plenty of file formats support this, but they're generally for DAWs and not for ...

colmmacc · on Dec 6, 2022

AAC is already interleaved multi-channel. Two-channel L/R is obviously very common for music, but spatial audio and Dolby Atmos come with more channels (up to 128, which can be arranged and binned into "beds"). 5.1 (6 channels) and 7.1 (8 channels) are also common for video. Having a dedicated/isolated vocal track as a channel is trivial in the format. I'd expect this is how it's done, and not ML, because Apple have already been driving mixing and end-to-end workflow with studios for Spatial Audio and Mastered for iTunes.
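To make the interleaving point concrete, here is a minimal Python sketch. It works on already-decoded PCM samples and assumes a hypothetical three-channel layout (left bed, right bed, isolated vocals). That layout and the function names are illustrative assumptions only, not something the AAC format or Apple is confirmed to use.

```python
def deinterleave(samples, num_channels):
    """Split interleaved samples [c0, c1, ..., c0, c1, ...] into per-channel lists."""
    if len(samples) % num_channels != 0:
        raise ValueError("sample count must be a multiple of num_channels")
    return [samples[c::num_channels] for c in range(num_channels)]


def downmix_with_vocal_gain(samples, vocal_gain):
    """Fold a hypothetical [left, right, vocals] stream down to interleaved stereo.

    vocal_gain = 1.0 keeps the vocals at full level, 0.0 removes them.
    """
    left, right, vocals = deinterleave(samples, 3)
    stereo = []
    for l, r, v in zip(left, right, vocals):
        stereo.append(l + vocal_gain * v)  # left output
        stereo.append(r + vocal_gain * v)  # right output
    return stereo


# Two frames of the assumed 3-channel layout.
frames = [0.1, 0.2, 0.5,
          0.3, 0.4, -0.5]
karaoke = downmix_with_vocal_gain(frames, vocal_gain=0.0)
full = downmix_with_vocal_gain(frames, vocal_gain=1.0)
```

With the vocal gain at zero the output is just the left/right bed, which is the "turn the vocals down" case being discussed.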

nkozyra · on Dec 6, 2022

Channel and spatial/binaural tracks I'd expect, but separated tracks for instruments/vocals are a lot more time-consuming and the kind of thing that studios/producers/masterers would balk at enough that I wouldn't expect it to come close to giving Apple the volume they'd likely want.

It absolutely could be done, I just think that Apple would want very good coverage and studios would be very slow to provide this format.

colmmacc · on Dec 6, 2022

Apple need the labels either way. They can't just go creating new derived works without a license, and the artists and producers would far prefer to avoid the artifacts of something that is overly automated.

It's already become pretty common for studios and labels to make stems available (stems are full multi-track files that can be used with a DAW) to industry insiders and even the public sometimes. There's a community of remixers, samplers, even a small cottage industry of YouTubers who work with these regularly. It wouldn't take them more than a few minutes per track to annotate which channel is vocals.

nkozyra · on Dec 7, 2022

Stems are usually lossless files intended for remixing.

Do you stop at instruments and vocals or are we talking 100+ tracks? It strikes me as nontrivial work for a very limited purpose. Apple can turn your vocals down so now your exports have to include extra tracks? I don't think that's a good enough sell. The labels don't care about this use case.

munificent · on Dec 6, 2022

It makes a lot of sense for an audio file format to support multiple tracks when each track is intended to target a different output device (left and right speakers, etc.).

It makes less sense when the intent is to mix those tracks together and send them to a single output device. Producers and audio engineers want full control over that mixing process because it's almost never just a simple sum. They are doing dynamic range compression (sometimes multi-band), dynamic EQ, saturation, limiting, etc. They wouldn't want to give that up, because it's an essential part of making a good-sounding recording.
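A small Python sketch of why the mix is not "just a simple sum": once any nonlinear processing sits on the mix bus, processing the summed signal gives a different result from summing separately processed tracks. The `saturate` function here is a hypothetical tanh soft clipper standing in for real mastering processing, and the sample values are made up for illustration.

```python
import math


def saturate(x, drive=2.0):
    """Hypothetical soft-clip stage: tanh curve normalised so an input of 1.0 maps to 1.0."""
    return math.tanh(drive * x) / math.tanh(drive)


def mix_then_process(vocals, instruments):
    """Sum the tracks first, then run the bus processing on the sum."""
    return [saturate(v + i) for v, i in zip(vocals, instruments)]


def process_then_mix(vocals, instruments):
    """Process each track on its own, then sum the results."""
    return [saturate(v) + saturate(i) for v, i in zip(vocals, instruments)]


vocals = [0.5, -0.2]
instruments = [0.5, 0.6]

bus_processed = mix_then_process(vocals, instruments)
stem_processed = process_then_mix(vocals, instruments)

# The per-sample differences are non-zero because saturation is nonlinear.
differences = [abs(a - b) for a, b in zip(bus_processed, stem_processed)]
```

Because the two orderings disagree, a player that changes the vocal level after the fact cannot reproduce exactly what the engineer heard on the mix bus.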