I think it can be optimized quite a lot by dropping the stock PNG decoder library: all the images are quite simple and could be generated either from smaller, unscaled sprites (many images are stored pre-scaled by 2x, and that scaling could be done in postprocessing instead) or by simple procedural code.
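
To illustrate the idea, here is a minimal sketch of both approaches, assuming a sprite is just a list of rows of palette indices. The names `upscale_2x` and `make_checker` are hypothetical and not taken from the project; the real image format and storage may well differ.

```python
def upscale_2x(sprite):
    """Nearest-neighbour 2x upscale: every pixel becomes a 2x2 block."""
    out = []
    for row in sprite:
        # Duplicate each pixel horizontally.
        wide = [px for px in row for _ in range(2)]
        # Duplicate the widened row vertically (second copy is independent).
        out.append(wide)
        out.append(list(wide))
    return out


def make_checker(size):
    """Hypothetical procedural sprite: a size x size checkerboard of 0/1."""
    return [[(x + y) % 2 for x in range(size)] for y in range(size)]


# Store (or generate) only the small sprite, then scale it at load time.
small = make_checker(2)
big = upscale_2x(small)
```

With this approach only the small sprite (or just the generator code) has to be shipped, so no general-purpose PNG decoder is needed for such images.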