> JVM virtual threads can be efficient because the runtime has complete knowledge of the executing code and stack layouts, how the heap is laid out, how the GC works and it can control how code is compiled.
Forgive me for staying doubtful, but I recall hearing this same "the JVM can be very fast and efficient because its JIT has complete knowledge and control" spiel back in the 90s, and back then anyone could clearly see that the JVM was not as fast as pre-compiled native code, despite what was being promised.
> The kernel can't do any of these things - it has to assume a process is a black box that could do anything with its stacks, could be compiled by anything and so on.
The kernel has to assume nothing; it can dictate how userspace processes behave. As an example, a process which plays too many games with its stacks, without kernel cooperation, will quickly find out that signals share the same stack unless the kernel is told to use an alternate stack. A process which uses a register declared in the platform ABI as being for kernel use will find out that it can be unpredictably overwritten on a context switch. There are things like shadow stacks and segment register bases which can only be manipulated when the kernel allows it. And so on.
Of course, for compatibility reasons, the current ABI allows userspace processes to do a lot of unpredictable things, but nothing prevents a new "highly scalable threads" process ABI, with stricter rules, from being developed if necessary. Or it could be that only a few cooperative additions to the userspace-to-kernel ABI are necessary; we already have things like the many options to the clone() system call, the futex system call, restartable sequences, etc.
> the JVM was not as fast as pre-compiled native code, despite what was being promised
Well, head-to-head Java will still lose to C++ in many benchmarks, but that's not really due to compiled code quality; it's more about language semantics. Java is very fast for the sort of language it currently is. The big wins for C++ are that Java doesn't have value types or support for vector operations. Both are under development; in fact, vector ops are basically done, but the API is waiting on support for value types (see discussion elsewhere).
Also GCd languages trend towards a functional style without much in-place mutation, whereas C++ trends in the opposite direction, so C++ will sometimes use the CPU cache more effectively just due to prevailing habits amongst programmers.
> The kernel has to assume nothing; it can dictate how userspace processes behave.
Yes, in theory you could fuse the language VM with the kernel, and research operating systems like MSR Singularity did exactly that. But a normal kernel like NT, Linux or Darwin can't do this, and not only for backwards compatibility reasons. The JVM will do things like move a virtual thread's stack back and forth from the garbage-collected heap, on the fly. Unless the kernel contains a JIT compiler and a GC, and injects lots of runtime code into the app's process, it's going to find it tricky to do the same. And by the time you've done all that, you haven't implemented better kernel threads; you've made the JVM run in the kernel.
It's been a while since I read up on this, but my understanding is that each OS thread carries a full stack (1MB by default in Java), and switching between them means a trip through the kernel. That's expensive. Virtual thread "context switches" are much more lightweight: the JVM knows exactly what state needs to be associated with each virtual thread, so their stacks can stay small and grow on demand, and that's where the difference lies.