Actually, I think it is simpler than that. There was a time when the programmability of the phone was so much better than what you could do in a cost-effective embedded system that it was the right choice. Now you can put a 32-bit ARM Cortex-M on a device for $3, so spending on the plastic is more feasible.
You are probably right. I think the Cortex-M4 parts have DSP instructions, which could be leveraged for the voice portions. I'd imagine they also have a TI CC3000-like module with the whole TCP/IP stack to handle the comms. Not necessarily the CC3000 itself, since TI offers the same functionality in larger modules at a lower cost. I'd also guess TI because of their SimpleLink system, which lets you connect to Wi-Fi without entering the SSID/password on the device itself. If I understand it correctly, you'd just need to download the Amazon app to handle the initial connection.