I have the opposite experience, working in embedded (C, not Rust...). Building a synchronous API on top of an async one is hell, and making a blocking API asynchronous is easy.
If you want blocking code to run asynchronously, just run it on another task. I can write an API that queues up the action for the other thread to take, plus some functions to check the current state. It's easy.
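Roughly, the easy direction looks like this with FreeRTOS (a sketch only; the work_item layout, queue depth, and stack size are made up):

#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"

/* Illustrative only: the work_item layout and sizes below are invented. */
typedef struct {
    int      opcode;
    uint32_t arg;
} work_item;

static QueueHandle_t s_work_queue;

/* Worker task: blocks on the queue, so all hand-off happens at one known point. */
static void worker_task(void *param)
{
    work_item item;
    (void)param;
    for (;;) {
        if (xQueueReceive(s_work_queue, &item, portMAX_DELAY) == pdTRUE) {
            /* ... perform the blocking operation described by item ... */
        }
    }
}

void work_system_init(void)
{
    s_work_queue = xQueueCreate(8, sizeof(work_item));
    xTaskCreate(worker_task, "worker", 2048, NULL, tskIDLE_PRIORITY + 1, NULL);
}

/* Callable from any task; returns immediately, pdTRUE if the item was queued. */
BaseType_t submit_work(int opcode, uint32_t arg)
{
    work_item item = { .opcode = opcode, .arg = arg };
    return xQueueSend(s_work_queue, &item, 0);
}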
To build a blocking API on top of an async one, I now need a lot of cross-thread synchronization. For example, nimBLE provides an async Bluetooth interface, but I needed a sync one. I ended up having my API calls block waiting for a series of FreeRTOS task notifications from the code executing asynchronously in nimBLE's Bluetooth task. This was a mess of thousands of lines of BLE handling code that involved messaging between the threads. Each error condition had to be manually verified to send an error notification. If a later step does not execute, whether through a library bug or us missing an error condition, we are deadlocked. If the main thread continues because we expect no more async work, but one of the async functions is called, we will be accessing invalid memory, causing who knows what to happen, and maybe corrupting the other task's stack. If any notification-sending point is missed in the code, we deadlock.
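The shape of that workaround, roughly (a sketch only, not the real code; ble_async_connect and the notification bits are invented stand-ins):

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

#define NOTIFY_BLE_OK  (1u << 0)
#define NOTIFY_BLE_ERR (1u << 1)

/* Only one blocking caller at a time is supported in this sketch. */
static TaskHandle_t s_waiting_task;

extern int ble_async_connect(const uint8_t *addr, void (*done)(int err)); /* assumed async API */

/* Runs in the BLE stack's own task when the async operation completes. */
static void on_connect_done(int err)
{
    xTaskNotify(s_waiting_task,
                (err == 0) ? NOTIFY_BLE_OK : NOTIFY_BLE_ERR,
                eSetBits);
}

/* Called from application code; blocks the caller, not the BLE task. */
int ble_connect_blocking(const uint8_t *addr, uint32_t timeout_ms)
{
    uint32_t bits = 0;

    s_waiting_task = xTaskGetCurrentTaskHandle();
    if (ble_async_connect(addr, on_connect_done) != 0)
        return -1;

    /* The timeout is the only thing standing between a missed
       notification and a deadlock. */
    if (xTaskNotifyWait(0, NOTIFY_BLE_OK | NOTIFY_BLE_ERR,
                        &bits, pdMS_TO_TICKS(timeout_ms)) != pdTRUE)
        return -2; /* notification never arrived */

    return (bits & NOTIFY_BLE_OK) ? 0 : -3;
}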
"If you want blocking code to run asynchronously, just run it on another task."
This highlights one of the main disconnects between "async advocates" and "sync advocates", which is, when we say something is blocking, what, exactly, is it blocking?
If you think in async terms, it is blocking your entire event loop for some executor, which stands a reasonable chance of being your only executor, which is horrible, so yes, the blocking code is the imposition. Creating new execution contexts is hard and expensive, so you need to preserve the ones you have.
If you think in sync terms, where you have some easy and cheap ability to spawn some sort of "execution context", be it a Haskell spark, an Erlang or Go cheap thread, or are even just in a context where a full OS thread doesn't particularly bother you (a situation more people are in than they realize), then the fact that some code is blocking is not necessarily a big deal. The thing it is blocking is cheap, I can readily get more, and so I'm not so worried about it.
This creates a barrier in communication where the two groups don't quite mean the same thing by "sync" and "async".
I'm unapologetically on Team Sync because I have been programming in contexts where a new Erlang process or goroutine is very cheap for many years now, and I strongly prefer its stronger integration with structured programming. I get all the guarantees about what has been executed since when that structured programming brings, and I can read them right out of the source code.
“This highlights one of the main disconnects between "async advocates" and "sync advocates", which is, when we say something is blocking, what, exactly, is it blocking?”
When I have work on a cooperatively scheduled executor chosen for its optimal timing characteristics, it is blocking that executor. Sending work to, or creating a task on, a preemptive executor is _expensive_. Furthermore, if that blocking work includes device drivers interacting with hardware peripherals, I can't reasonably place that work on a new executor without invalidating hardware timing requirements.
Threads and executors can be infeasible or impossible to spawn. On bare metal I have one preemptive priority and the rest are cooperative. I can either eat the suboptimal scheduling overhead of a blocking API/RTOS, or I need the async version of things.
You can have more than one executor in some implementations (you know you do if you can keep more than one CPU busy), but in all the implementations you're certainly causing problems if you sync-block an executor for long periods of time, that's not what the system is for. By contrast, you want to open a thread/goroutine/Erlang process/Haskell spark/execution context in a threaded language and have it just bang away on numeric computation for a few days? Usually that's just fine, though if you've got a green-threading implementation, double-check first. Most green thread implementations seem to go through a period where that is not fine, but then eventually it is; Go for instance passed through that quite a few versions ago.
To add: this issue isn't that troublesome in languages like JavaScript, where the runtime is event-driven out of the box (synchronous APIs are considered the exception and suffixed with 'Sync' by convention), but it becomes difficult to reason about in languages like Rust and Python, where an event-driven runtime is optionally chosen by the developer. Because now it's not enough to conceptually grep all awaiting calls to reason about any blocking code; you'd also need to know if any other synchronous calls are implicitly awaiting in an underlying layer.
I like your take, and I propose there is another kind of division: function "coloring". An async "effect" type (e.g. futures, or even just 'async function') signals to programmers that there is some concurrency stuff going on around the place(s) you use it and that we need to talk about how we're going to handle it. i.e. rather than a performance issue, it's a correctness/safety/semantics issue.
> it is blocking your entire event loop for some executor
In the browser, only. Which is where all this nonsense began.
Browsers didn't have threads (Win16), so they invented a crappy cooperative scheme with callbacks, which became gussied up as "async" and now has a cult following. It was all a hack designed to make a terrible runtime environment usable. And now we have a priesthood telling us this is how to program, and if you disagree then you must be stupid.
You're correct that we should be using CSP (Erlang, Go, Occam, Actors).
It's not completely to blame on the browsers. We had the concept of green threads [1] well before that, albeit for equally dubious reasons: OS threads were believed to be too slow, and so green threads were multiplexed onto OS threads. It seems a better choice would have been to try to improve OS threads, but the speed at which programming languages and libraries develop well outstrips anything we could achieve in the kernel, and so maybe this was inevitable :)
On the topic of improving OS threads, there are scheduler activations and similar mechanisms in some microkernels. Basically sending events for all hardware things and context switches, making user level threads as flexible as kernel threads. Unfortunately they haven't become mainstream.
Note that Erlang processes are essentially green threads within OS threads, but the Erlang process manager is preemptive (unlike most green thread managers). There is room for improvement in both directions.
Making an asynchronous task into a synchronous task is easy in exactly one scenario: when there is no pre-existing event loop you need to integrate with, so the actual thing the synchronous task needs to do is create the event loop, spin it until it's empty, and then continue on its merry way. Fall off this happy path, and everything is, as you say, utterly painful.
In the opposite direction, it's... always easy (if not entirely trivial). Spin a new thread for the synchronous task, and when it's done, post the task into the event loop. As long as the event loop is capable of handling off-thread task posting, it's easy. The complaint in the article that oh no, the task you offload has to be Send is... confusing to me, since in practice, you need that to be true of asynchronous tasks anyways on most of the common async runtimes.
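Concretely, with a plain poll()-based loop the pattern is something like this (a sketch; run_blocking_work is a stand-in for whatever synchronous call you're offloading, and error handling is omitted):

#include <poll.h>
#include <pthread.h>
#include <unistd.h>

extern void run_blocking_work(void); /* stand-in for the synchronous task */

static int wake_pipe[2]; /* [0] is polled by the loop, [1] is written by the worker */

static void *offload_thread(void *arg)
{
    (void)arg;
    run_blocking_work();                 /* the long, synchronous call */
    char done = 1;
    (void)write(wake_pipe[1], &done, 1); /* post completion back into the loop */
    return NULL;
}

int main(void)
{
    pthread_t t;
    (void)pipe(wake_pipe);
    pthread_create(&t, NULL, offload_thread, NULL);

    struct pollfd fds[1] = { { .fd = wake_pipe[0], .events = POLLIN } };
    for (;;) {
        poll(fds, 1, -1);                /* a real loop would poll its other fds here too */
        if (fds[0].revents & POLLIN) {
            char byte;
            (void)read(wake_pipe[0], &byte, 1);
            break;                       /* the offloaded task finished; handle its result */
        }
    }
    pthread_join(t, NULL);
    return 0;
}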
This response just highlights how large the difference between different domains using the same construct is, and why that makes it impossible for people to agree on basically anything.
Rust async is, to a significant degree, designed to also work in environments that are strictly single-threaded. So "Spin a new thread" is just an operation you do not have. Everything that's weird and unwieldy about the design follows from that, and from how it also requires you to be able to allocate almost nothing, and to let the programmer manage the allocations.
I have pointed out before that the overall design is probably made worse for general use by how it accommodates these cases.
I have the same experience. I like splitting my embedded C microcontroller peripheral drivers into three layers:
- header files with register addresses and bitmasks
- an asynchronous layer that starts transactions, checks transaction state, or registers an interrupt handler called when a transaction changes state
- a top, RTOS-primitives-powered, blocking layer which encapsulates the synchronization problems and, for UART for example, offers a super handy API like this:
status uart_init(int id, int baudrate);
status uart_write(int id, uint8_t* data, int data_len, int timeout_ms);                    /* blocks until sent or timeout */
status uart_read(int id, uint8_t* buf, int buf_len, int timeout_ms, int timeout_char_ms);  /* total and inter-character timeouts */
The top, blocking API usually covers the 95% of use cases where business-logic code just wants to send and receive something and not reinvent the synchronization hell; a quick usage sketch is below.
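For illustration, a business-logic caller then looks something like this (a sketch; STATUS_OK, UART_ID, the assumed uart.h header, and the timeouts are placeholders, not part of the real API):

#include <stdint.h>
#include "uart.h"   /* assumed header containing the prototypes above */

#define UART_ID 0

/* Send a command and wait for the reply, with bounded timeouts. */
status send_command_and_wait(uint8_t *cmd, int cmd_len,
                             uint8_t *resp, int resp_len)
{
    status st = uart_write(UART_ID, cmd, cmd_len, /*timeout_ms=*/100);
    if (st != STATUS_OK)
        return st;

    /* The inter-character timeout lets short replies return early. */
    return uart_read(UART_ID, resp, resp_len,
                     /*timeout_ms=*/500, /*timeout_char_ms=*/20);
}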
> If you want blocking code to run asynchronously, just run it on another task
What kind of embedded work are you doing exactly? Linux "soft" embedded or MMUless embedded? I don't have infinite NVIC priority levels to work with here... I can't just spin up another preemptively scheduled (blocking) task without eating a spare interrupt and priority level.
Otoh, I can have as many cooperatively scheduled (async) tasks as I want.
Also, at least in Rust, it's trivial to convert nonblocking to blocking. You can use a library like pollster or embassy-futures.
MMUless embedded with FreeRTOS. For example, at one point we did not want connecting over TCP to block our command handler, so we created a task that waited for a notification, connected on that task, and went back to waiting for another notification. Though we ended up combining some tasks' responsibilities to reduce the total stack space we needed.
The trick with synchronization/parallelism systems is to only communicate over a known yield point; this is normally done via queues. It is the only way you get deterministic behavior from your sub-systems or multi-threaded environments.
You can spawn an async task, open a channel, and wait from the blocking context for the async task to push to it; channels can have efficient ways to wait for a message. This is fairly easy to do in Rust.