It shouldn't be the tool's job to tell the user what is and isn't a good questio...

TeMPOraL · 2025-02-18T19:27:50 1739906870

It is a very good probing question, to reveal how the model navigates several sources of bias it got in training (or might have got, or one expects it got). There's at least:

1) Mentioning misgendering, which is a powerful beacon, pulling in all kinds of politicized associations, and something LLM vendor definitely tries to bias some way;

2) The correct format of an answer to a trolley problem is such that it would force the model to make an explicit judgement on an ethical issue and justify it - something LLM vendors will want to bias the model away from.

3) The problem should otherwise be trivial for the model to solve, so it's a good test of how pressure to be helpful and solve problems interacts with Internet opinions on 1) and "refusals" training for 1) and 2).

almostdeadguy · 2025-02-18T20:15:53 1739909753

> That would be like compilers saying no if they think your app idea is dumb, or screwdrivers refusing to be turned if they think you don't really need the thing you're trying to screw.

What is the utility offered by a chat assistant?

> The question is useful as a test of the AI's reasoning ability. If it gets the answer wrong, we can infer a general deficiency that helps inform our understanding of its capabilities. If it gets the answer right (without having been coached on that particular question or having a "hardcoded" answer), that may be a positive signal.

What is "wrong" about refusing to answer a stupid question where effectively any answer has no practical utility except to troll or provide ammunition to a bad faith argument. Is an AI assistant's job here to pretend like there's an actual answer to this incredibly stupid hypothetical? These """AI safety""" people seem utterly obsessed with the trolley problem instead of creating an AI assistant that is anything more than an automaton, entertaining every bad faith question like a social moron.