One interesting thing is that io_uring can operate in different modes. One of the modes enables kernel-side polling, where a kernel thread watches the submission ring: when you place a request in it, the kernel picks the request up and performs the IO on its own. That means that, from the application side, you can perform IO without any system calls.
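To make that concrete, here's a minimal sketch of setting up a ring in that mode with liburing. IORING_SETUP_SQPOLL and sq_thread_idle are the real flag/field names; the queue depth and idle timeout are arbitrary picks for illustration:

```c
#include <liburing.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct io_uring ring;
    struct io_uring_params params;

    memset(&params, 0, sizeof(params));
    params.flags = IORING_SETUP_SQPOLL;
    params.sq_thread_idle = 2000;   /* poller thread sleeps after ~2s of inactivity */

    /* Note: older kernels restrict SQPOLL to privileged processes. */
    int ret = io_uring_queue_init_params(8, &ring, &params);
    if (ret < 0) {
        fprintf(stderr, "queue_init: %s\n", strerror(-ret));
        return 1;
    }

    /* From here on, io_uring_submit() usually doesn't enter the kernel:
       the poller thread picks up new SQEs on its own. */
    io_uring_queue_exit(&ring);
    return 0;
}
```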
Our general take was also that it has a lot of potential, but it's low-level enough that most mainstream programmers aren't going to pay attention to it. Hence, it'll be a while before it permeates through various ecosystems.
For those of you who like to listen on the way to work, we cover io_uring on our podcast, The Technium:

https://www.youtube.com/watch?v=Ebpnd7rPpdI

https://open.spotify.com/episode/3MG2FmpE3NP7AK7zqQFArE?si=s...
> One interesting thing is that io_uring can operate in different modes. One of the modes enables kernel-side polling [...]
On a related note, I recently saw this presentation[1] where they show some benchmarks of the various modes.
One gotcha of sorts, though obvious when you think about it, is that the kernel-side polling mode requires a free CPU core for the polling thread. Meaning you'll get very poor performance if you're not leaving enough CPU for the kernel to do its polling.
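If you do have a core to spare, the setup params also let you pin the poller thread to it, so it doesn't compete with your worker threads. A sketch, assuming core 3 is the one you've set aside (IORING_SETUP_SQ_AFF and sq_thread_cpu come from the liburing headers):

```c
#include <liburing.h>
#include <string.h>

/* Returns 0 on success, -errno on failure, like liburing itself. */
static int init_pinned_ring(struct io_uring *ring)
{
    struct io_uring_params params;

    memset(&params, 0, sizeof(params));
    params.flags = IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF;
    params.sq_thread_cpu  = 3;     /* assumption: core 3 is reserved for polling */
    params.sq_thread_idle = 2000;  /* ms of idleness before the poller sleeps */

    return io_uring_queue_init_params(8, ring, &params);
}
```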
> Our general take was also that it has a lot of potential, but is relatively low level that most mainstream programmers aren't going to pay attention to it. Hence, it'll be a while before it permeates through various ecosystems.
But POSIX async I/O (the aio_* functions) on Linux is basically worthless performance-wise, AFAIU, because glibc implements it in userspace by spawning threads that do standard sync I/O. Linux also has its own non-POSIX async I/O (the io_* functions), but it’s very situational: it only works if you bypass the page cache (O_DIRECT), and it can still randomly block on metadata operations (so can Win32, to be fair).

There’s select/poll/epoll with O_NONBLOCK of course, which is what people normally use, but those do not really work with files on disk (neither do their WinSock equivalents). Hell, signal-driven I/O (O_ASYNC) exists; I’ve used it to make a single-threaded emulator (CPU-bound, unlike a network server) interact with the terminal.

But asynchronous I/O of normal, cached files is only possible on Linux through io_uring, as far as I’ve been able to figure out.
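For what it's worth, here's roughly what that last case looks like with liburing: an async read of a plain, page-cache-backed file, no O_DIRECT and no thread pool. A sketch assuming a reasonably recent kernel; /etc/hostname is just a stand-in for any regular file:

```c
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>

int main(void)
{
    struct io_uring ring;
    struct io_uring_cqe *cqe;
    char buf[4096];

    int fd = open("/etc/hostname", O_RDONLY);   /* any regular file, no O_DIRECT */
    if (fd < 0 || io_uring_queue_init(4, &ring, 0) < 0)
        return 1;

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* read from offset 0 */
    io_uring_submit(&ring);

    /* Block until the completion arrives; a real program would go do
       other work and reap completions later. */
    if (io_uring_wait_cqe(&ring, &cqe) == 0) {
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    return 0;
}
```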
That said, I’ve read people here saying[1] that overlapped I/O on Windows also works by scheduling operations on a thread pool, even referencing KB articles[2]. This doesn’t mesh with everything I’ve read about I/O in the NT kernel, which is supposed to be natively async to the point where the I/O request data structure (the IRP) has what’s essentially an emulated call stack inside it, in order to allow the I/O subsystem to juggle continuations. What am I missing? Does the Win32 subsystem need to dumb things down that much even inside its own implementation?
(Windows 8 also introduced a ringbuffer-based, no-syscalls thing called Registered I/O that looks very much like io_uring.)
It's a bit misleading. What they mean is that some operations can act as barriers for further operations. E.g. async calls to ReadFile won't run until the call to WriteFile finishes (if it's writing past the end of the file).
Per open(2) [1], you can’t really ask the kernel to not block on regular files:
> O_NONBLOCK [...] has no effect for regular files and will (briefly) block when device activity is required, regardless of whether O_NONBLOCK is set. [...] O_NONBLOCK semantics might eventually be implemented[.]
I’m actually not sure if the reported readiness for them is of any use, but the documentation for select(2) [2] doesn’t give me a lot of hope:
> A file descriptor is ready for writing if a write operation will not block. However, even if a file descriptor indicates as writable, a large write may still block.
This is for data operations; if you want open() itself to avoid spelunking through NFS or spinning up optical drives or whatnot, before io_uring you simply had no way to tell that to the kernel: you call open*() or perhaps creat(), which must hand you an fd, and thus must block until it can do so.
(As far as I’ve seen, tutorial documentation usually rounds this down to “you can’t do nonblocking I/O on disk files”.)
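With io_uring, the open itself can be queued like any other operation. A sketch using io_uring_prep_openat (which, if I recall correctly, needs kernel 5.6 or newer):

```c
#include <fcntl.h>
#include <liburing.h>

static void submit_async_open(struct io_uring *ring, const char *path)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    /* AT_FDCWD: resolve path relative to the current directory, just
       like plain open(). The resulting fd (or -errno) shows up later
       in cqe->res, so even NFS spelunking happens off your thread. */
    io_uring_prep_openat(sqe, AT_FDCWD, path, O_RDONLY, 0);
    io_uring_submit(ring);
}
```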
Sort of. In the strictest sense, yes: the CQ is a ring buffer (implemented with fancy atomics), so you have to check whether there's a completion on the queue before you read the entry. But that check doesn't need a syscall; you poll the shared ring directly, and if more completions come in while you're processing, they'll be there waiting for you.
There's also a syscall (io_uring_enter) that will put your thread to sleep and wake it when completions are available. (It's a complicated syscall with a lot of knobs and switches and levers; be ready for a LOT of information if you go read the man page.)
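In liburing terms, the two styles look roughly like this (a sketch; real code would check cqe->res and handle errors):

```c
#include <liburing.h>

static void reap(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;

    /* Busy-poll style: io_uring_peek_cqe() just reads the shared ring,
       no syscall; it returns -EAGAIN when the ring is empty. */
    while (io_uring_peek_cqe(ring, &cqe) == 0) {
        /* ... handle cqe->res / cqe->user_data ... */
        io_uring_cqe_seen(ring, cqe);
    }

    /* Sleep style: parks the thread in io_uring_enter() until at least
       one completion is posted, then hands it back. */
    if (io_uring_wait_cqe(ring, &cqe) == 0) {
        /* ... handle it ... */
        io_uring_cqe_seen(ring, cqe);
    }
}
```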