There is a more recent approach to auto-regressive image generation.
Rather than predicting the image patch by patch at the target resolution, it predicts the next resolution: first the whole image at a small resolution, then the image at a higher resolution, and so on up to the target.
https://arxiv.org/abs/2404.02905
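
To make the coarse-to-fine idea concrete, here is a minimal sketch of the generation loop in Python with NumPy. It is not the paper's implementation: `predict_scale` is a hypothetical stand-in for the learned model (it only adds random detail), the scale schedule `(1, 2, 4, 8)` is an arbitrary choice for illustration, and the sketch works on raw arrays rather than whatever representation the actual method uses. It assumes each resolution is produced in a single step, conditioned on the upsampled result of the previous one.

```python
import numpy as np


def upsample_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a square (H, H) array to (size, size)."""
    idx = np.arange(size) * image.shape[0] // size
    return image[np.ix_(idx, idx)]


def predict_scale(context: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for the learned model.

    It refines the upsampled context by adding small random detail. A real
    model would predict the whole image at this scale from the coarser ones.
    """
    return context + 0.1 * rng.standard_normal(context.shape)


def generate(scales=(1, 2, 4, 8), seed=0):
    """Generate one image per scale, coarse to fine (illustrative schedule)."""
    rng = np.random.default_rng(seed)
    image = np.zeros((1, 1))  # empty starting context
    outputs = []
    for size in scales:
        context = upsample_nearest(image, size)  # carry coarse result upward
        image = predict_scale(context, rng)      # one step per resolution
        outputs.append(image)
    return outputs


pyramid = generate()
shapes = [img.shape for img in pyramid]

# Sequential steps needed under each scheme for the final 8x8 grid.
steps_next_scale = len(pyramid)     # one step per resolution
steps_next_patch = pyramid[-1].size  # one step per position
```

With this toy schedule the loop runs once per resolution, whereas a position-by-position scheme at the final resolution would need one step for every position in the grid.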