How would x86-64 be as efficient with the same transistor & power budget when they have to run an extra decoder and ring within that budget? Seems physically impossible.
All else being equal, they can't. But the difference isn't as big as some people like to think. For a current high end core, probably low single digit %. And x86-64 has had a lot more effort going into software optimization.
As I understand it, the actual processing part of most chips nowadays is fairly bespoke, with a decoder sitting on top. I doubt decode makes up that large a portion of a chip's power consumption (probably negligible next to the rest of the chip?), so other improvements can make up for the difference.
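To make the "negligible" claim concrete, here's a toy back-of-the-envelope model. All figures (core power, decode share, decoder cost delta) are illustrative assumptions, not measured data — the point is just that even a much more expensive decoder moves whole-core power by only a couple of percent when decode is a small slice of the budget.

```python
# Toy model: how much a heavier decoder moves total core power.
# All numbers below are illustrative assumptions, not measurements.

def total_power(decode_w, rest_w):
    """Total core power given decode and non-decode components (watts)."""
    return decode_w + rest_w

rest = 4.8          # execution units, caches, etc. (assumed ~5 W core)
decode_arm = 0.2    # assumed ARM-style decode power (~4% of core)
decode_x86 = 0.3    # assumed x86-style decode power (50% more than ARM's)

overhead = total_power(decode_x86, rest) / total_power(decode_arm, rest) - 1
print(f"whole-core power overhead: {overhead:.1%}")  # low single digits
```

So a decoder that costs 50% more power still only shows up as ~2% at the whole-core level under these assumptions, which lines up with the "low single digit %" estimate above.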
The latest ARM Cortex CPUs (models X2, A715 and A510) drop 32-bit support. Qualcomm actually includes two older Cortex-A710 cores in the Snapdragon 8 gen 2 for 32-bit support. Don't know much about Apple Silicon but didn't they drop 32-bit a couple of years back?
Google has purged 32-bit apps from the official Android app store, but as I understand it the Chinese OEMs that ship un-Googled AOSP ROMs with their own app stores haven't been as aggressive about moving to 64-bit.
Because the more complex decoder is traded in this case for a denser instruction set: the same code fits in fewer bytes, so they can get away with less instruction cache (and caches are more power hungry than decode).
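A toy illustration of that density trade-off: AArch64 instructions are a fixed 4 bytes, while x86-64's variable-length encoding averages somewhere around 3–4 bytes depending on the workload. The average length below is an assumed figure for illustration only.

```python
# Toy comparison of instruction-cache footprint for a fixed-length
# encoding vs a denser variable-length one. Byte counts are assumptions.

N_INSNS = 10_000          # instructions in a hot code region (assumed)
FIXED_LEN = 4             # AArch64: every instruction is 4 bytes
AVG_VAR_LEN = 3.2         # assumed average x86-64 instruction length

fixed_bytes = N_INSNS * FIXED_LEN
var_bytes = int(N_INSNS * AVG_VAR_LEN)

print(f"fixed-length footprint:    {fixed_bytes} bytes")
print(f"variable-length footprint: {var_bytes} bytes")
print(f"icache savings: {1 - var_bytes / fixed_bytes:.0%}")
```

Under these assumed numbers the denser encoding fits the same code in ~20% less cache, which is the budget the more complex decoder is traded against.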