I'm hoping to see some smaller MoE models released this year, trained with more recent recipes (higher-quality data, much longer pretraining). Mixtral 8x7B was impressive when it came out, but the exact same architecture could be a lot more powerful today, and it would run quite fast on Apple Silicon: only about 13B of its ~47B parameters are active per token, so bandwidth-bound decoding stays quick even though all the weights have to sit in unified memory.
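
A rough back-of-envelope sketch of that claim (the bandwidth, quantization, and active-parameter numbers below are assumptions for illustration, not benchmarks):

    # Back-of-envelope decode-speed estimate for an MoE model on Apple Silicon.
    # Decoding is roughly memory-bandwidth bound: each generated token reads the
    # active weights once, so tokens/s ~ bandwidth / (active params * bytes/param).
    def est_tokens_per_sec(active_params_b, bytes_per_param, bandwidth_gbps):
        bytes_read_per_token = active_params_b * 1e9 * bytes_per_param
        return bandwidth_gbps * 1e9 / bytes_read_per_token

    # Mixtral 8x7B: ~47B total params, ~13B active per token (2 of 8 experts).
    # Assume 4-bit quantization (~0.5 bytes/param) and ~400 GB/s unified-memory
    # bandwidth (roughly an M-series Max chip).
    print(f"MoE:   {est_tokens_per_sec(13, 0.5, 400):.0f} tok/s upper bound")  # ~60 tok/s
    # A dense 47B model reads every parameter for every token, for comparison.
    print(f"Dense: {est_tokens_per_sec(47, 0.5, 400):.0f} tok/s upper bound")  # ~17 tok/s

Real throughput lands below these ceilings, but the ratio is the point: the MoE pays the memory cost of the full model while only paying the bandwidth cost of the active experts.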