It's astonishing to see Apple settle for DALL-E 3 (or worse?! these remind me of the Bing samples) for the image generator part. Hasn't the incredible extent of mode-collapse and the horrible DALL-E 3 style become universally known and disliked yet?
No, it's worse than DALL-E 3: it's an on-device model that can only reproduce placid, soulless images. The ones I put in the article have been heavily cherry-picked. The worst ones get far worse. DALL-E 3 can at least do text.
Oh, I assumed the image gen at least was running on the fancy remote GPU server.
But it being on-device doesn't prove it's not a DALL-E 3. Just like GPT-3 & GPT-4 & Llamas come in many sizes, some of which are (probably) feasible to run on-device...
Still, I suppose that probably does imply that it's some half-hearted Apple image-gen model (trained on a lot of DALL-E 3-contaminated data?), since I'm not sure OA would want to risk the weights of a tiny, terrible, hyper-optimized-for-size DALL-E 3 leaking, even years from now.
Image Playground uses a tiny on-device model. It certainly isn't DALL-E 3. In no universe are on-device models intended to compete with massive cloud models.