A real user will be worse … but that’s kinda the point.
The most valuable thing you learn in usability research is not whether your experience works, but how it will be misinterpreted, abused, and bent to do things it was never designed for.
You seem to disagree. Here's an interesting study in which researchers used an OpenAI-LLM-based tool to grade student papers; grading the same papers 10 times in a row produced vastly different results:
Quote: "The results reveal significant shortcomings: The tool’s numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated."
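The study's repeated-grading check is simple to replicate in spirit: score the same paper several times and look at the spread. Here's a minimal sketch of that measurement; `grade_paper` is a hypothetical stand-in (a real check would call the actual grading tool), so only the variance bookkeeping is meant literally:

```python
import random
import statistics

def grade_paper(text: str, seed: int) -> float:
    """Hypothetical stand-in for one call to an LLM grading tool.

    Simulates a noisy grader so the variability measurement itself
    is runnable; a real check would query the tool instead.
    """
    rng = random.Random(hash(text) ^ seed)
    return round(rng.uniform(1.0, 6.0), 1)  # assume a 1-6 grade scale

def grade_variability(text: str, runs: int = 10) -> tuple[float, float]:
    """Grade the same paper `runs` times; report mean and run-to-run stdev."""
    grades = [grade_paper(text, seed=i) for i in range(runs)]
    return statistics.mean(grades), statistics.stdev(grades)

mean, spread = grade_variability("An essay on usability testing.")
print(f"mean grade {mean:.2f}, run-to-run stdev {spread:.2f}")
```

If the stdev is a meaningful fraction of the grade scale, as the study found, the individual grades carry little signal.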