The big one not mentioned is cancellation. It's very easy to cancel any future. ...

Matthias247 · on March 25, 2024

It's not easy to cancel any future. It's easy to *pretend* to cancel any future. E.g. if you cancel (drop) anything that uses spawn_blocking, it will just continue to run in the background without you being aware of it. If you cancel any async fs operation that is implemented in terms of a threadpool, it will also continue to run.

This all can lead to very hard to understand bugs - e.g. "why does my service fail because a file is still in use, while I'm sure nothing uses the file anymore"

pornel · on March 25, 2024

Yes, if you have a blocking thread running then you have to use the classic threaded methods for cancelling it, like periodically checking a boolean. This can compose nicely with Futures if they flip the boolean on Drop.

I’ve also used custom executors that can tolerate long-blocking code in async, and then an occasional yield.await can cancel compute-bound code.

aidenn0 · on March 25, 2024

If you implemented async futures, you could have also instead implemented cancelable threads. The problem is fairly isomorphic. System calls are hard, but if you make an identical system call in a thread or an async future, then you have exactly the same cancellation problem.

pornel · on March 25, 2024

I don't get your distinction. Async/await is just a syntax sugar on top of standard syscalls and design patterns, so of course it's possible to reimplement it without the syntax sugar.

But when you have a standard futures API and a run-time, you don't have to reinvent it yourself, plus you get a standard interface for composing tasks, instead of each project and library handling completion, cancellation, and timeouts in its own way.

Night_Thastus · on March 25, 2024

I don't follow how threads are hard to cancel.

Set some state (like a flag) that all threads have access to.

In their work loop, they check this flag. If it's false, they return instead and the thread is joined. Done.

ric2b · on March 26, 2024

So you make an HTTP request to some server and it takes 60 seconds to respond.

Externally you set that flag 1 second into the HTTP request. Your program has to wait for 59 seconds before it finally has a chance at cancelling, even though you added a bunch of boilerplate to supposedly make cancellation possible.

Night_Thastus · on March 29, 2024

If the server takes 60 seconds to respond, and you need responses on the order of 1 second, I'd say that is the problem - not threads.

jeffbee · on March 25, 2024

Cancellation is not worth worrying over in my experience. If an op is no longer useful, then it is good enough if that information eventually becomes visible to whatever function is invoked on behalf of the op, but don't bother checking for it unless you are about to do something very expensive, like start an RPC.

pornel · on March 25, 2024

It's been incredibly important for me in both high-traffic network back-ends, as well as in GUI apps.

When writing complex servers it's very problematic to have requests piling up waiting on something unresponsive (some other API, microservice, borked DNS, database having a bad time, etc.). Sometimes clients can be stuck waiting forever, eventually causing the server to run out of file descriptors or RAM. Everything needs timeouts and circuit breakers.

Poorly implemented cancellation that leaves some work running can create pathological situations that eat all CPU and RAM. If some data takes too long to retrieve, and you time out the request without stopping processing, the client will retry, asking for that huge slow thing again, piling up another and another and another huge task that doesn't get cancelled, making the problem worse with each retry.

Often threading is mixed with callbacks for returning results. The un-cancelled callbacks firing after the other part of the application aborted an operation can cause race conditions, by messing up some state or being misattributed to another operation.

jeffbee · on March 25, 2024

Right, this is compatible with what I said and meant. Timeouts that fire while the op is asleep, waiting on something: good, practical to implement. Cancellations that try to stop an op that's running on the CPU: hard, not useful.