I'll answer the polling question. The Tokio runtime (and async Rust in general) works a bit differently from other async runtimes like Node. With Node, you provide callbacks, and they are executed when an OS event is received. With Tokio, there are no callbacks. Instead, async logic is organized in terms of tasks (something like async green threads). When a task is blocked on external events, it goes into a waiting state. When the relevant OS event is received, the runtime schedules the task and eventually "polls" it. Because the poll happens in response to an OS event, most of the time the poll results in the task making progress, though there are sometimes false-positive polls.
This page goes into a bit more depth and shows an example of how one would implement a (very simple) runtime/executor: https://tokio.rs/tokio/tutorial/async
I read your link earlier today, and have been thinking about it; I particularly like the pedagogy of rebuilding it "in the small" with your MiniTokio example.
I don't know Rust. If I had to guess, this means that you've reused Rust's threads for tasks, such that they may not be done computing when a resource is available? In any event, I want to circle back to the OP, and note that runtime visualizations like this are awesome, and conversations like this are why. I personally don't think anyone spends enough time dwelling in their runtime(s), and certainly no async runtime has good visualizations, so it's pretty cool to me that tokio-console is taking the lead here. I've been bullish on Rust for 5 years; maybe it's time to try it out for reals.
The purpose of async is mostly to avoid OS threads, and Rust decided not to go down the route of implementing user-space threads.
Instead, for async, Rust implemented the ability to basically encode a function's stack frame and instruction pointer into a "normal" (but opaque) struct. What an async runtime like Tokio does is (through a few levels of useful indirection that I won't talk about) store a list of these structs and decide when it's a good idea to "call" one of them. When called, the struct either returns a final value, or returns a value saying "call me again later", in which case the runtime presumably puts it back into its list of structs and calls it again sometime later.
Figuring out when to call it is left up to the runtime, but the useful ones will do things like record which operation the task is waiting for and call it when that operation is ready.
> rust decided not to go down the route of implementing user space threads
Rust had (optional) user-space threads a long time ago, but they were removed in the pre-1.0 days: they added a lot of complexity and caused some unavoidable performance loss even when opting for native threads (they forced dynamic dispatch on anything related to threading or I/O). There was a lot of discussion, but eventually it was concluded that the OS thread scheduler was in fact perfectly capable of handling large numbers of threads, and that virtual memory mapping meant the stack space allocated for each thread wasn't a big deal, so green threads were removed.
I sometimes wonder what the fundamental distinction is between a callback-based API and this wakeup-based task API. I guess the main difference is that a callback generally receives the result as an argument, whereas with a wakeup-based API you just wake the task and it has to look for the result in some stored state somewhere.
But ultimately both of them take the "rest of the computation"/continuation and store it somewhere (i.e. on some sleeping task/callback list) to be awakened/invoked later.
It is fairly subtle and mostly an implementation detail. In Rust, the concept of "polling tasks" was very exposed before the async/await keywords were introduced, so the lingo kind of stuck. There is an argument that we should move away from that lingo now that it is mostly hidden as an implementation detail, but we haven't yet.
@tijsvd mentioned that the callback model usually requires more allocation, which Rust is eager to avoid. I'll add that the wakeup/polling model plays much more nicely with Rust's ownership and borrowing rules. Callbacks usually need to hold pointers to the objects that they capture. In a GC'd language, this usually isn't a big deal, other than sometimes causing some surprising leaks. But in Rust, where the compiler wants to keep track of how long pointers live and which objects are aliased, it gets real ugly real fast. The wakeup/polling model sidesteps this nicely, because no one besides the task itself holds any pointers to the objects the task owns.
A waker-based API can fall back to waking all paused tasks in a background process to recover from lost events (epoll overflow or such), while a callback-based API can't "just" do so without (allocation?) cost on the happy path.
Their inherent resilience to spurious wakeups is quite useful in that regard.
They also work with exotic FDs, as long as those can still be registered with epoll. For example, a pidfd can be polled for readability (despite any read(2) call failing with EINVAL), triggering when the corresponding process has terminated.
I guess the benefit is that at least on Linux pre-io_uring, the async syscall way of doing things was via poll/select/epoll to notice when an fd unblocked, followed by waking whatever coroutine/state machine was interested in that event. It composes quite well.
The callback is really hard to implement without allocating memory for each wakeup. The poll mechanism can simply leave the task in place. I suspect the poll thing is also easier to generate.