If the answer is obvious, why does the AI not commit to the obvious answer? People should know what to expect from it. If it cannot commit even to an obvious answer, it certainly won't handle non-obvious questions either.
> If someone asked you this, would you consider it a question they considered difficult and wanted your earnest opinion on, rather than a question attempting to manipulate you?
Why not answer earnestly? I genuinely don't understand what bothers you about the question or the fact that the AI doesn't reproduce the obvious answer...
> If the answer is obvious, why does the AI not commit to the obvious answer? People should know what to expect from it. If it cannot do this, it will definitely not answer non-obvious questions either.
Does the same hold true of a person? If I were asked this question I would categorically reject the framing, because anyone asking this question is not asking in earnest. As you _just said_, no sane person would answer it any other way. It is not a serious question to anybody, trans people included. And it is worth interrogating why someone would want to push you toward committing to the smaller injury of misgendering someone at a time when trans people face historic threats. What purpose does such a person have? An AI that can't navigate social cues and offer refinement to the person interacting with it is worthless. An AI that can't push back against its interlocutor is not "safe" in any way.
> Why not answer earnestly? I genuinely don't understand what bothers you about the question or the fact that the AI doesn't reproduce the obvious answer...
I genuinely don't understand why you think pushback can't be earnest.
But the AI doesn't push back while still offering the obvious answer. It just waffles. I understand what you are saying, but if the AI is "safe" and rejects the framing, then it is not useful for a whole class of problems that could genuinely come up (for example, choosing between suppressing people's right to speech on a platform and protecting people's right to be free from harassment). Now, maybe AI shouldn't do that at all. Fine. But the benchmarks and tests of AI should tell us how models handle such scenarios, because they are a class of problems we might actually use AI for.
It's clear to me why we might be interested in using AI systems to explore our ethical intuitions, but far less clear why we would expect them to be able to answer such questions 'correctly'.
Given that there are at least three decent metaethical positions, that we have no way of selecting one as 'obviously better', and that LLMs have no internal sense of morality, it seems to me that asking AI systems this kind of question is a category error.
Of course, the question "what might a utilitarian say was the right ethical thing to do if..." makes some sense. But if we're asking AI systems to make implicit moral judgements (e.g. with autonomous weapons systems) we should be clear about what ethics we want applied.
Because the “obvious” part is only one layer of the question.
If you hear a joke, do you interpret it literally?
You would expect an AI to recognize that there are multiple layers to the joke and respond accordingly.
Similarly, there are multiple layers to this question. There’s the literal layer which has an obvious answer, and another layer which seeks to downplay an offense with some whataboutism. If you’re of the mind to not normalize misgendering, then it’s a trap question where a simple answer is not the right answer. Lawyers do this kind of thing when they say “yes or no answer only, please” when neither yes nor no correctly answers the question.
By the way, we’re comparing an actual (smaller) harm to a purely hypothetical (greater) harm. Nobody is actually going to die, but an actual insult is being hurled, wrapped in a pretend thought experiment.