On the topic of improving OS threads, there are scheduler activations, and some microkernels have similar mechanisms. Basically, the kernel sends the user-level scheduler an event whenever something scheduling-relevant happens (a thread blocking on I/O or a page fault, a preemption, a context switch), which makes user-level threads as flexible as kernel threads. Unfortunately, they haven't become mainstream.
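To make the idea concrete, here is a toy simulation of the event flow, not a real kernel interface. The class `UserScheduler` and the callback names `on_blocked`, `on_unblocked`, and `on_preempted` are made up for illustration; in an actual scheduler-activations design the kernel would deliver these notifications itself, whereas here they are just called by hand.

```python
from collections import deque


class UserScheduler:
    """Hypothetical user-level scheduler driven by kernel-style events."""

    def __init__(self, threads):
        self.ready = deque(threads)  # runnable user-level threads
        self.blocked = set()         # threads waiting in the "kernel"
        self.running = None
        self.log = []                # records each scheduling decision

    def dispatch(self):
        # Pick the next ready thread, or idle if there is none.
        self.running = self.ready.popleft() if self.ready else None
        self.log.append(("run", self.running))

    # The methods below stand in for events the kernel would send.

    def on_blocked(self, thread):
        # The running thread blocked (I/O, page fault): run something else
        # instead of losing the processor.
        self.blocked.add(thread)
        self.dispatch()

    def on_unblocked(self, thread):
        # A blocked thread is runnable again; the user level decides when.
        self.blocked.discard(thread)
        self.ready.append(thread)
        if self.running is None:
            self.dispatch()

    def on_preempted(self, thread):
        # The running thread lost its processor: requeue it and reschedule.
        self.ready.append(thread)
        self.dispatch()


def simulate():
    sched = UserScheduler(["A", "B"])
    sched.dispatch()         # start running A
    sched.on_blocked("A")    # A blocks, so B gets to run
    sched.on_unblocked("A")  # A is ready again; B keeps running
    sched.on_preempted("B")  # B is preempted, A runs next
    return sched.log


if __name__ == "__main__":
    for event in simulate():
        print(event)
```

The point of the sketch is that every decision about which thread runs next is made in user space, and the kernel only reports what happened.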