I'd at least think that quadrature samples would be preferred, as they offer instantaneous phase information. I don't think there is anything to be gained by forcing the model to derive this information from the time series data when the computation is so straightforward. Instead of a 48 kHz stream of real samples you feed it a 24 kHz stream of I&Q samples; nothing to it.
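To make the "nothing to it" part concrete, here is a rough sketch of one way to do that conversion. It is only an illustration, not the only route: it assumes the fs/4 mixing approach (shift the spectrum down by 12 kHz, low-pass, keep every second sample), and the names `lowpass_taps` and `real_to_iq`, the 63-tap Hamming filter, and the 13 kHz test tone are all made up for the example. A Hilbert-transform-based analytic signal would be another way to get there.

```python
import cmath
import math

def lowpass_taps(num_taps=63, cutoff=0.25):
    """Windowed-sinc FIR low-pass; cutoff is in cycles/sample (0.25 = fs/4)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hamming)
    gain = sum(taps)  # normalize so the DC gain is 1
    return [t / gain for t in taps]

def real_to_iq(samples, taps=None):
    """Real stream at rate fs -> complex I/Q stream at fs/2, centered on fs/4."""
    taps = taps or lowpass_taps()
    # Mix down by fs/4: exp(-j*pi*n/2) just cycles through 1, -j, -1, j.
    rot = (1, -1j, -1, 1j)
    mixed = [s * rot[n % 4] for n, s in enumerate(samples)]
    # Low-pass away the other spectral half and keep every second sample.
    out = []
    for n in range(len(taps) - 1, len(mixed), 2):
        acc = 0j
        for k, t in enumerate(taps):
            acc += t * mixed[n - k]
        out.append(2 * acc)  # factor 2 restores the amplitude of the kept half
    return out

# Hypothetical input: a 13 kHz tone sampled at 48 kHz.
fs = 48000
tone = [math.cos(2 * math.pi * 13000 * n / fs) for n in range(480)]
iq = real_to_iq(tone)

# Instantaneous amplitude and phase now fall out of each sample directly.
amplitude = abs(iq[10])
phase = cmath.phase(iq[10])
```

After the shift, the 13 kHz tone sits at 1 kHz in the 24 kHz complex stream, so its amplitude stays near 1 and its phase advances by a steady amount per sample, which is exactly the information the model would otherwise have to work out on its own.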
I would draw an analogy here between NeRF and Gaussian Splatting -- like, OK, it's great that we can get there with a NN, but there's no reason to do that once you have figured out how to optimally compute the result you were after.
I also believe that granular synthesis is a deep well to draw from in this area of research.