But they are already multi-modal. The Google one can do live streaming video understanding with a conversational voice prompt in and out. You can literally walk around with your camera and just chat about the world. No text to be seen (perhaps under the covers it is translating everything to text, but the point is that the user sees no text).
Fair, but OpenAI was doing that half a year ago (though with limited access; I myself only got it maybe a month ago), and I haven't yet seen it translate into anything in practice, so I feel like it (and multimodality in general) must be at a GPT-3 level of ability at this point.
But I do expect the next qualitative change to come from this area. It feels like exactly what is needed, but somehow it isn't quite there yet.