I think this is the easiest kind of scenario to refute.
The interface between a superintelligent AI and the physical world is a) optional, and b) tenuous. If people agree that creating weird concrete structures is not beneficial, the AI will be starved of the resources necessary to do so, even if it cannot be diverted.
The challenge comes when these weird concrete structures are useful to a narrow group of people who have disproportionate influence over the resources available to AI.
It's not the AI we need to worry about. As always, it's the humans.
> here is an ungrounded, non-realistic, non-representative-of-a-potential-future intuition pump to just get the feel of things:
> (Yes, there are many holes in this, like how would it piggyback off of our infrastructure if it kills us, but this isn't really supposed to be coherent, it's just supposed to give you a sense of direction in your thinking. Generally though, since it is superintelligent, it can pull off very difficult strategies.)
If you read the above, I think you'd realize I'd agree about how bad my example is.
The point was to understand how orthogonal goals between humans and a much more intelligent entity could result in human death. I'm happy you found a form of the example that both pumps your intuition and seems coherent.
If you want to debate a point where we might actually disagree, though: do you think that as this hypothetical AI gets smarter, the interface between it and the physical world becomes more assured (assuming the ASI wants to interface with the world) and less tenuous?
Like, yes it is a hard problem. Something slow and stupid would easily be thwarted by disconnecting wires and flipping off switches.
But something extremely smart, clever, and much faster than us should be able to employ one of the few strategies that can make it happen.
If the AI does something in the physical world which we do not like, we sever its connection. Unless some people with more power like it more than the rest of us do.
Regarding orthogonal goals: I don't think an AI has goals. Or motivations. Now obviously a lot of destruction can be a side effect, and that's an inherent risk. But it is, I think, a risk of human creation. The AI does not have a survival instinct.
Energy and resources are limiting factors. The first might be solvable! But currently it serves as a failsafe against prolonged activity with which we do not agree.
So I think we have some differences in definition. I am assuming we have an ASI, and then going on from there.
Minimally an ASI (Artificial Super Intelligence) would:
1. Be able to solve all cognitively demanding tasks humans can solve, plus tasks humans cannot solve (e.g. develop new science), hence "super" intelligent.
2. Be an actively evolving agent (not a large, static compositional function like today's frontier models)
For me, intelligence is a problem-solving quality of a living thing, hence point 2. I think it might be the case that, to become super-intelligent, you need to be an agent interfacing with the world, but feel free to disagree here.
Though, if you accept the above formulation of ASI, then by definition (point 2) it would have goals.
Then based on point 1, I think it might not be as simple as "If the AI does something in the physical world which we do not like, we sever its connection."
I think a super-intelligence would be able to perform actions that prevent us from doing that, given that it is clever enough.
I agree that the definitions are slippery and evolving.
But I cannot make the leap from "super intelligent" to "has access to all the levers of social and physical systems control" without the explicit, costly, and ongoing effort of humans.
I also struggle with the conflation of "intelligent" and "has free will". Intelligent humans will argue that not even humans have free will. But assuming we do, when our free will contradicts the social structure, society reacts.
I see no reason to believe that the emergent properties of a highly complex system will include free will. Or curiosity, or a sense of humor. Or a soul. Or goals, or a concept of pleasure or pain, etc. And I think it's possible to be "intelligent" and even "sentient" (whatever that means) without those traits.
Honestly -- and I'm not making an accusation here(!) -- this fear of AI reminds me of the fear of replacement / status loss. We humans are at the top of the food chain on all scales we can measure, and we don't want to be replaced, or subjugated in the way that we presently subjugate other species.
This is a reasonable fear! Humans are often difficult to share a planet with. But I don't think it survives rational investigation.
If I'm wrong, I'll be very, very wrong. I don't think it matters though; there is no getting off this train, and maybe there never was. There's a solid argument for being in the engine vs the caboose.
> I cannot make the leap from "super intelligent" to "has access to all the levers of social and physical systems control" without the explicit, costly, and ongoing effort of humans.
Yeah, this is a fair point! The super intellect may just convince humans, which seems feasible. Either way, the claim that there are 0 paths here for a super intelligence is pretty strong, so I feel like we can agree on this: it'd be tricky, but possible given sufficient cleverness.
> I see no reason to believe that the emergent properties of a highly complex system will include free will.
I really do think that in the next couple of years we will be explicitly implementing agentic architectures in our end-to-end training of frontier models. If that is the case, the result would obviously have something analogous to goals.
I don't really care about its phenomenal quality or anything; it's not relevant to my original point.
> Either way, the claim that there are 0 paths here for a super intelligence is pretty strong, so I feel like we can agree on this: it'd be tricky, but possible given sufficient cleverness.
Agreed, although I'd modify it a bit:
An SI can trick lots of people (humans have succeeded, surely an SI will be better), and the remaining untricked people, even if a healthy 50% of the population, will not be enough to maintain social stability.
The lack of social stability is enough to blow up society. I don't think SI survives either though.
If we argue that SI has a motive and a survival instinct, maybe this fact becomes self-moderating? Like the virus that cannot kill its host quickly?
Given your initial assumptions, that self-moderating end state makes sense.
I feel like we still have a disconnect on our definition of a super intelligence.
From my perspective this thing is insanely smart. We can hold ~4 things in our working memory (maybe Von Neumann could hold like 6-8); I'm thinking this thing can hold on the order of millions of things within its working memory for tasks requiring fluid intelligence.
With that sort of gap, I feel like at minimum the ASI would be able to trick the cleverest human into doing anything, but more reasonably, humans might appear entirely closed-form to it, where getting a human to do anything is more of a mechanistic exercise than a social game.
Like, the reason my early example was concrete pillars with weird wires is that, with an intelligence gap that big, the ASI will quickly be doing things that don't make sense to us while having strong command over the world around it.
I think you are assuming it is goal-seeking; goal-seeking is mostly a biological/conscious construct. A super-intelligent species would likely want to preserve everything, because how are you super-intelligent if you have destruction as your primary function instead of order?
I feel like if you are an intelligent entity propagating yourself through spacetime, you will have goals:
If you are intelligent, you will be aware of your surroundings moment by moment, so you are grounded by your sensory input. Otherwise there is a whole class of not-very-hard problems you can't solve.
If you are intelligent, you will be aware of the current state and will have desired future states, thus having goals. Otherwise, how are you intelligent?
To make this point, even you said "A super intelligent species would likely want to preserve everything", which is a goal. This isn't a gotcha, I just feel like goals are inherent to true intelligence.
This is a big reason why even the huge SOTA frontier models aren't comprehensively intelligent in my view: they are huge, static compositional functions. They don't self-reflect, take action, or update their own state during inference*, though active inference is cool stuff people are working on right now to push SOTA.
*There are some arguments about what's happening metaphysically in-context, but the function itself is unchanged between sessions.
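To make that distinction concrete, here's a minimal toy sketch of what I mean (plain Python; `static_model` and `AgentLoop` are made-up names for illustration, not anyone's real API): calling a frozen function leaves nothing behind, while an agent loop carries a goal and a memory forward and updates them itself between steps.

```python
# Toy sketch only: "static_model" and "AgentLoop" are hypothetical names,
# not any real library's API.

def static_model(prompt: str) -> str:
    """Stands in for a frozen compositional function: same weights every call, no memory."""
    return f"completion for: {prompt}"

class AgentLoop:
    """A toy agent that keeps a goal and a memory, and updates that state after every step."""

    def __init__(self, goal: str):
        self.goal = goal              # persistent objective ("desired future state")
        self.memory: list[str] = []   # state that survives between steps

    def step(self, observation: str) -> str:
        # Ground the next action in the current observation plus accumulated state.
        context = f"goal={self.goal}; memory={self.memory}; obs={observation}"
        action = static_model(context)                    # could wrap any underlying model
        self.memory.append(f"{observation} -> {action}")  # the self-update a bare static call lacks
        return action

agent = AgentLoop(goal="stack the blocks")
for obs in ["block A on table", "block B on floor"]:
    print(agent.step(obs))
```

The wrapped function is still static; the "goal" and the self-updating state live in the loop around it, which is the part I'd expect end-to-end agentic training to fold into the model itself.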
> The interface between a superintelligent AI and the physical world is a) optional, and b) tenuous.
To begin with. Going forward, only if we make sure it remains so. Given the apparently overwhelming incentives to flood the online world with this sh...tuff already, what's to say there won't be forces -- people, corporations, nation-states -- working hard to make that interface as robust as possible?