
> [...] Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from. Like that there are moral facts that an AI will be smart enough to find, and that rationalists should all agree on

Uh, no. That's not true at all. Where are you pulling this from?

They're assuming a vast space of possible minds[0], in which human values, which are themselves somewhat diverse[1], make up only a tiny fraction of the space.

The issue is that if you sample somewhat randomly from this design space (say, by creating an AI via gradient descent), you'll end up with something that has alien values. But most alien values are still subject to instrumental convergence[2], leading to instrumental goals such as power-seeking, self-preservation, resource acquisition, and so on in pursuit of the primary values. Getting values that are intentionally self-limiting and reject those instrumental goals requires hitting a much narrower subset of all possible systems, especially if you still want the system to do useful work. A toy sketch of why resource acquisition is useful for almost any set of terminal values:
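This is my own minimal illustration, not something from the linked posts, and the numbers are made up: sample many random "terminal value" functions over a set of world states, and compare a policy that first acquires resources (and so can reach more final states) with one that doesn't. For most sampled value functions, the larger option set wins, even though the value functions themselves have nothing to do with resources.

  import random

  random.seed(0)

  N_STATES = 100            # possible final world states
  N_VALUE_SAMPLES = 10_000  # random terminal value functions to sample

  # Hypothetical reachable sets (disjoint, to make the trade-off non-trivial):
  # grabbing resources costs you the handful of states you could reach directly,
  # but opens up many more.
  reachable_without_resources = list(range(0, 5))    # 5 options
  reachable_with_resources = list(range(5, 35))      # 30 options

  wins = 0
  for _ in range(N_VALUE_SAMPLES):
      # A random terminal value function: utility assigned to each final state.
      values = [random.random() for _ in range(N_STATES)]
      best_without = max(values[s] for s in reachable_without_resources)
      best_with = max(values[s] for s in reachable_with_resources)
      if best_with >= best_without:
          wins += 1

  print(f"Resource-acquiring policy wins for {wins / N_VALUE_SAMPLES:.1%} "
        f"of random value functions")

With these numbers it wins about 30/35 ≈ 86% of the time: power-seeking pays off regardless of what the terminal values happen to be, which is the point of instrumental convergence.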

> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.

Being capable of understanding values does not imply caring about them. Humans care because caring is necessary for them to cooperate with other humans who don't perfectly share their values.

[0] https://www.lesswrong.com/tag/mind-design-space

[1] https://www.lesswrong.com/tag/typical-mind-fallacy

[2] https://en.wikipedia.org/wiki/Instrumental_convergence


