I haven't read much about how these systems work yet, so this is probably a novice question, but I'd be interested to hear more about how the algorithm handles text input and feeds it into the generator. Does the training process include a ton of tagged images or something, and does the model then learn to generate images that correspond reasonably to those tags?
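Roughly, yes: the standard recipe is to train on a large dataset of (image, caption) pairs, so the model learns a mapping from a text embedding to image pixels. DALL-E 2 specifically pairs a CLIP text encoder with a diffusion decoder, but here's a deliberately tiny sketch of the underlying idea (every name and size below is invented for illustration, and the simple MSE objective stands in for the real diffusion loss):

```python
import torch
import torch.nn as nn

# Toy sizes, all made up: vocab size, text embedding dim, flattened 8x8 image.
VOCAB, EMB, IMG = 100, 32, 8 * 8

class TinyTextToImage(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)   # "text encoder": token embeddings
        self.decode = nn.Sequential(            # "generator" conditioned on the text
            nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, IMG)
        )

    def forward(self, tokens):
        text_vec = self.embed(tokens).mean(dim=1)  # one vector summarizing the caption
        return self.decode(text_vec)               # predict pixels from that vector

model = TinyTextToImage()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake "captioned image" dataset: random token ids paired with random pixels.
captions = torch.randint(0, VOCAB, (64, 5))  # 64 captions, 5 tokens each
images = torch.rand(64, IMG)                 # 64 paired images

for step in range(100):
    pred = model(captions)
    # Push the generated pixels toward the image paired with each caption;
    # this is how the text-image correspondence gets learned.
    loss = nn.functional.mse_loss(pred, images)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A real system swaps the mean-pooled embedding for a transformer text encoder and the MLP for a diffusion model, but the training signal is the same: captions paired with images.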
We are using existing models based on open-source research. We have other models that OpenAI does not offer (e.g. super resolution), but they're not in the app.
The reality, though, is that DALL-E 2 is still not open for access; open-source models are.
If you know how to get API access (besides the waitlist, of course), please let me know!