thank you - we're a team of 2 and former full-time frontend engineers, so that means a lot to us! You're spot on with the guard rails: we do a lot of post-processing, i.e., after the LLM spits something out, we parse the AST, strip out hallucinated imports, and add imports that are missing. And yes! We also do a bit of pre-processing (expanding the prompt, feeding relevant examples via a RAG-based approach).
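To make the import-fixing idea concrete, here's a minimal sketch of that kind of AST post-processing in Python (the product presumably does this on frontend code with different tooling; `KNOWN_MODULES` is a made-up stand-in for a real allowlist, not anything from the product):

```python
# Hypothetical sketch: post-process LLM-generated code by parsing its AST,
# dropping imports of modules we don't recognize ("hallucinated"), and
# prepending imports for known modules that are used but never imported.
import ast

KNOWN_MODULES = {"json", "math", "re"}  # illustrative allowlist

def fix_imports(source: str) -> str:
    tree = ast.parse(source)
    kept = []
    for node in tree.body:
        if isinstance(node, ast.Import):
            # Keep only aliases whose top-level module is recognized.
            node.names = [a for a in node.names
                          if a.name.split(".")[0] in KNOWN_MODULES]
            if not node.names:
                continue  # every alias was hallucinated; drop the statement
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in KNOWN_MODULES:
                continue  # hallucinated source module
        kept.append(node)

    # Collect what is imported vs. what is actually referenced by name.
    imported = {a.name.split(".")[0] for n in kept
                if isinstance(n, ast.Import) for a in n.names}
    imported |= {(n.module or "").split(".")[0] for n in kept
                 if isinstance(n, ast.ImportFrom)}
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    # Add any known module that's used but missing an import.
    for mod in sorted((used & KNOWN_MODULES) - imported):
        kept.insert(0, ast.Import(names=[ast.alias(name=mod)]))

    tree.body = kept
    return ast.unparse(tree)

print(fix_imports("import json\nimport made_up_pkg\nprint(json.dumps([1]))"))
```

The same shape works for JS/TS output with a parser like Babel or the TypeScript compiler API: parse, filter the import declarations against what's resolvable, then re-print.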
It feels like everyone is building in the AI space these days, but I gotta say: it's quite fun tweaking it. The non-deterministic nature is simultaneously the worst and best thing.
> The non-deterministic nature is simultaneously the worst and best thing.
This is what always has me asking "how can you trust it?" when LLMs are used for some pretty complex tasks, so I gotta know: what kinds of tricks have you employed to identify hallucinations in otherwise good-looking output? How do you separate valid output (valid meaning it works but isn't necessarily what's desired) from desired output?
Additionally, how do you identify the most performant ways of doing things? Have you found that you've recreated portions of websites more efficiently than the real sites you're mimicking?
The nice thing about building a design tool is that correct output is defined as "did it render." We do things like strip out hallucinated imports and parse the AST for errors to help with that.
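A "did it render" gate can be sketched cheaply, assuming the output is HTML. This is just an illustration, not the product's actual check (which would more plausibly load the page in a headless browser): here we only verify that the markup parses and every opened tag gets closed.

```python
# Hypothetical sketch of a minimal "did it render" check: feed the
# generated HTML through a parser and track tag balance on a stack.
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # no closing tag

class RenderCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # A close tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.ok = False

def did_it_render(html: str) -> bool:
    checker = RenderCheck()
    checker.feed(html)
    return checker.ok and not checker.stack  # nothing left unclosed

print(did_it_render("<div><p>hi</p></div>"))  # True
print(did_it_render("<div><p>hi</div>"))      # False: <p> never closed
```

Browsers will happily recover from unbalanced markup, so a real pipeline would render in something like headless Chromium and check for runtime errors instead; this stack check just catches the grossest failures early.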
“Did it output what I desired?” - we have the classic AI app stuff like thumbs up/down, and many of the features we’ve built are an attempt to make that answer a solid yes. For example, we have a “referencing” feature (if you are within a “project” - our version of an infinite canvas) that lets you reference existing designs when prompting for a brand new design. This helps the LLM keep the output consistent.
Regarding performance: you can download the raw HTML so I suppose that is more performant than a version that loads all the JS, hah. But our product focus is more on the design rather than the generated code, as we’ve seen with our customers that engineers will almost always touch it up.
To answer your last question: when users are “finished” and it’s deployed (and they share it with me), people don’t really mimic portions of websites. By the time they’re done, it’s fully their own; they’ve iterated on it so much. What they imported from the extension ended up being just a base to get the LLM in the right initial context.