Hacker News

I'll have to use this, thanks for sharing. Isn't it problematic since Gemini isn't representative of a real user, though?



Definitely a huge trap to replace real user insights with anything else.

But this looks like a nice level 0 of testing.


A real user might be worse. A program is less flexible (maybe) and more consistent (definitely) than a meatspace CBL (carbon-based lifeform).

The goal is not realism but a kind of ready-made "you must be this tall to ride the rollercoaster" threshold.

Discovering edge cases with dodgy human users has its value, but that's a different value.


A real user will be worse … but that’s kinda the point.

The most valuable thing you learn in usability/research is not if your experience works, but the way it’ll be misinterpreted, abused, and bent to do things it wasn’t designed to.


Enter "Drunk User Testing". Host a happy hour event and give some buzzed users some scenarios to test.

https://www.newyorker.com/magazine/2018/04/30/an-open-bar-fo...

https://uxpamagazine.org/boozeability/


More consistent? That's not a given with LLMs unless you set the temperature to 0.
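To illustrate why temperature 0 is typically deterministic, here is a minimal self-contained sketch of temperature-scaled sampling over a softmax (the names and logit values are made up for illustration; real LLM APIs apply the same idea to token logits, and even at temperature 0 some serving stacks are not perfectly deterministic due to nondeterministic GPU kernels):

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample an index from logits after temperature scaling.

    As temperature approaches 0 the distribution collapses onto the
    argmax, so sampling degenerates into greedy (deterministic) decoding.
    Most implementations special-case temperature == 0 as argmax.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

rng = random.Random(0)
logits = [2.0, 1.0, 0.5]                  # hypothetical token logits
greedy = {sample(logits, 0.0, rng) for _ in range(100)}
varied = {sample(logits, 1.5, rng) for _ in range(100)}
print(greedy)                             # single index: always the argmax
print(varied)                             # multiple indices: sampling spreads out
```

Running this shows the temperature-0 path always returns the same index, while a higher temperature yields a spread of outcomes across repeated calls.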


You are right. LLMs are totally random and useless.

Thanks for playing.


You seem to disagree. Here's an interesting study where the researchers used an OpenAI-LLM-based tool to grade student papers; grading the same papers 10 times in a row produced vastly different results:

https://rainermuehlhoff.de/en/fobizz-AI-grading-assistant-te...

Quote: "The results reveal significant shortcomings: The tool’s numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated."



