And the mic is not all that great either. I have a couple of them, but they just weren't reliably picking up my voice, and when they did hear me I couldn't hear the reply. I figured it would be easy to add a speaker to them, but that sent me down a rabbit hole, so I gave up and put them in a drawer. I'll definitely buy this, though, because when the ESP32 box thing worked, it worked really well, and I loved being able to swap out parts of the assist pipeline.
To be fair, the issue with the Box-3 is HA's implementation. I used it with heywillow.io and it was incredible: I could speak to it from another room and it would pick me up perfectly.
The audio out is terrible, so I wrote a shim server that captures heywillow's request to the TTS server and sends it to a speaker I built myself (MPD running on a Pi with a nice DAC), which plays the responses instead of the Box-3's tiny speaker.
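For anyone wanting to try something similar, here's a rough sketch of how a shim like that could work. It's not my actual code, just one possible approach using only the Python stdlib. It assumes the box makes a plain HTTP GET for each TTS clip, that MPD has its curl input plugin so it can fetch `http://` URLs itself, and that the box copes with an empty 204 reply. The host names, port 8081 and `REAL_TTS_BASE` are hypothetical placeholders; the MPD side uses the standard line-based protocol on its default port 6600 (`clear`, `add`, `play`, with `OK`/`ACK` replies).

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical addresses: adjust to your own network.
REAL_TTS_BASE = "http://tts-server.local:8080"  # where the TTS requests would normally go
MPD_HOST, MPD_PORT = "mpd-pi.local", 6600       # 6600 is MPD's default port


def mpd_quote(arg: str) -> str:
    """Quote one MPD protocol argument: escape backslashes and double quotes, then wrap in quotes."""
    return '"' + arg.replace("\\", "\\\\").replace('"', '\\"') + '"'


def upstream_url(request_path: str, base: str = REAL_TTS_BASE) -> str:
    """Re-target the intercepted request (path + query string) at the real TTS server."""
    parts = urlsplit(request_path)
    query = f"?{parts.query}" if parts.query else ""
    return base.rstrip("/") + parts.path + query


def play_commands(audio_url: str) -> list[str]:
    """Replace the MPD queue with a single clip and start playing it."""
    return ["clear", f"add {mpd_quote(audio_url)}", "play"]


def send_to_mpd(commands: list[str], host: str = MPD_HOST, port: int = MPD_PORT) -> None:
    """Send commands over MPD's line-based TCP protocol, checking each reply."""
    with socket.create_connection((host, port), timeout=5) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
        greeting = stream.readline()
        if not greeting.startswith("OK MPD"):
            raise RuntimeError(f"unexpected MPD greeting: {greeting!r}")
        for cmd in commands:
            stream.write(cmd + "\n")
            stream.flush()
            reply = stream.readline()
            if not reply.startswith("OK"):  # errors come back as "ACK [...] ..."
                raise RuntimeError(f"MPD rejected {cmd!r}: {reply.strip()}")


class TTSShim(BaseHTTPRequestHandler):
    """Point the box's TTS URL at this server instead of the real TTS server."""

    def do_GET(self) -> None:
        # MPD fetches the clip from the real TTS server itself and plays it on the Pi.
        send_to_mpd(play_commands(upstream_url(self.path)))
        # Assumption: the box tolerates an empty reply and just plays nothing locally.
        self.send_response(204)
        self.end_headers()


def main(port: int = 8081) -> None:
    """Run the shim until interrupted (port 8081 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), TTSShim).serve_forever()
```

Calling `main()` on the Pi (or any machine MPD can reach) starts it. The nice part of handing MPD a URL rather than streaming audio through the shim is that the shim stays tiny and the Pi does all the fetching and decoding.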
I don't expect the audio-out on this to be much better with its tiny speaker, but at least it has a 3.5mm jack.
I'm going to look into what that Grove port can do too and perhaps build a new speaker "module" that the Voice PE can sit on top of to make it a proper music device.