It's hella useful: I use Cursor several times a week (and I'm not even working as a dev full time rn), and ChatGPT is my daily driver.
Yet, it's weird to me that we're 3 years into this "revolution" and I can't get a decent slideshow from an LLM without having to practically build a framework for doing so.
It is a focus, data, and benchmarking problem. If someone comes up with good benchmarks, which means having a good dataset, and gets some publicity around them, they can attract the frontier labs' attention and get training and optimization effort focused on making the models better at that benchmark. This is how most of the capabilities we have today became useful. Maybe there's some emergent initial detection of utility, but the refinement comes from labs beating each other on the benchmarks. So we need a slideshow benchmark, and I think we'd see rapid improvement.

LLMs are actually OK at building HTML decks. Not great, but OK. Enough so that if there were some good objective criteria to tune toward, I think the last-mile kinks would get worked out (formats, object/text overlaps). The raw content is mainly a function of the core intelligence of the model, so that wouldn't be impacted: if you can get it to build a good bullet-point markdown of your presentation today, it would be just as good as a prezo, though maybe not as visually compelling as you'd like. This might also need to be an agentic benchmark, to allow for both text and image creation and other considerations like data sourcing, which is why everyone doing this ends up building their own mini framework.
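For concreteness, here's a minimal sketch of what one such objective criterion could look like: render a generated slide headlessly and count overlapping element bounding boxes. It assumes Playwright is installed, and the CSS selectors are hypothetical placeholders for whatever the deck framework actually emits.

```python
# Sketch of one automated scoring criterion for an HTML-deck benchmark:
# render a slide headlessly and count overlapping text/image boxes.
from itertools import combinations
from playwright.sync_api import sync_playwright

def boxes_overlap(a, b):
    # Axis-aligned bounding-box intersection test.
    return not (a["x"] + a["width"] <= b["x"] or b["x"] + b["width"] <= a["x"]
                or a["y"] + a["height"] <= b["y"] or b["y"] + b["height"] <= a["y"])

def count_overlaps(slide_url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(slide_url)
        # ".slide img, .slide p, .slide h1" is a placeholder selector
        # for visible slide content; adapt it to the deck's markup.
        elements = page.query_selector_all(".slide img, .slide p, .slide h1")
        boxes = [el.bounding_box() for el in elements]
        boxes = [b for b in boxes if b]  # bounding_box() is None for hidden elements
        browser.close()
    return sum(boxes_overlap(a, b) for a, b in combinations(boxes, 2))
```

Run something like this over a corpus of generated decks and the overlap count becomes a number labs can grind down, which is exactly the kind of objective signal I mean.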
A ton of the reinforcement-type training work is really just aligning the vague commands a user would give with the capability the model would produce from a much more fleshed-out prompt.