The notion of fire-and-forget is itself the problem. Even with threads, you should have them join the main thread before the program exits. Which implies you should hold strong references to them until then. Most people don't go out of their way to do this even when they're able to, but that's what you're supposed to do.
I came here to write this comment. Also, you usually need some means of canceling the task -- otherwise you either have to wait for it to finish, or you leak stray tasks that keep running and mutating shared state.
Let's say I write a task that updates a progress bar as an infinite loop, and let it be gc'ed on program exit, without ever joining it. What's wrong with that design? I can, of course, modify the task to check a flag that indicates program completion, and exit when it's set. But does this extra complexity help the code quality in any way?
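For what it's worth, here's a minimal sketch of the "extra complexity" being debated - the infinite-loop version rewritten to check a flag so it can be joined (names are made up; `ticks` stands in for redrawing the bar):

```python
import threading
import time

stop = threading.Event()
ticks = []  # stands in for "redraw the progress bar"

def update_progress():
    # Loop until the main thread signals shutdown, instead of forever.
    while not stop.is_set():
        ticks.append(time.monotonic())  # pretend to redraw the bar
        stop.wait(0.01)                 # sleep, but wake early on stop

t = threading.Thread(target=update_progress)
t.start()
time.sleep(0.05)   # main program does its work here
stop.set()         # signal completion...
t.join()           # ...and join before exit
```

Whether those three extra lines (the event, the set, the join) are worth it is exactly the question at hand.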
Or suppose I spawn a task to warm up some cache (to reduce latency when it's first used). It would be nice if it completes before the cache is hit, but surely not at the cost of blocking the main program. I just fire-and-forget that task. If it executes only after the cache was already hit, it will notice that and become a no-op. Why would I want to join it at the end? Joining may not even be free (if the cache was never hit, why would I want to warm it up now that the main program is exiting?).
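For concreteness, a rough sketch of what such a warm-up task might look like (all names hypothetical); the join at the end is shown only to make the example deterministic, and note that by that point it's a no-op:

```python
import asyncio

cache = {}

async def warm_cache():
    # If the value was already computed by a real request, do nothing
    # (the "become a no-op" behavior described above).
    if "answer" in cache:
        return
    await asyncio.sleep(0)          # stand-in for slow precomputation
    cache.setdefault("answer", 42)

async def get_answer():
    # The real code path: computes on demand if the warm-up hasn't run yet.
    if "answer" not in cache:
        cache["answer"] = 42
    return cache["answer"]

async def main():
    task = asyncio.create_task(warm_cache())  # fire...
    value = await get_answer()                # ...cache may get hit first
    await task   # by now the warm-up sees the cache is populated and no-ops
    return value

result = asyncio.run(main())
```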
I might not have an answer you'll find convincing, since this is somewhat subjective, as the "should" here isn't based on a "need". The best analogy I can give here is that you don't "need" to, for example, avoid circular references in Python. Some believe it's better design to avoid them if you can. I find value in making things, say, more predictable and deterministic when possible.
Like what, for example? One very simple yet practical one: when you break into your program with a debugger, you want to minimize noise - any thread or object that's alive unnecessarily is, at the very minimum, extra noise for you to deal with, and at worst, extra surface for a bug to creep in. Conversely, the liveness of a thread/object can give you a vital bit of information that you otherwise wouldn't get. Another is that it lowers the number of obstacles you'll face if you ever want to do something unusual in the future - suspending a GC, snapshotting the program state, and so on. Yet another is that following the pattern consistently helps you and future maintainers avoid pitfalls that arise in similar abstractions, like the one here. I could go on, but all of these are basically concerns whose value lies mostly in the potential future, not the present.
There are many other more practically-minded folks who believe the presence of a GC exempts them from caring about such concerns, and see these as adding extra complexity. If you see it that way, I don't have a compelling rebuttal. But if "complexity" is your criterion, perhaps what I can offer is that you can also view it from the opposite standpoint: following the fork-join pattern (or avoiding circular references, etc.) itself avoids complexities that arise from not doing so [1], such as those in the previous paragraph. It's just that not every form of complexity or cost materializes immediately.
[1] Note that complexity is not just a measure of code size, but also the deviation of its behavior from expectation. You can make code more complex to reason about merely by deleting some lines, and that could include a thread.join() call.
You make very good arguments in favor of joining threads in most cases, and I completely agree with you. Perhaps the only disagreement we (may?) have is that I think these arguments may not apply in some cases.
In my first example, I would probably find not joining the thread cleaner than joining (since it would require extra code to rewrite the infinite loop into something joinable, and since the earliest time I can join is at the very end of the program anyway).
In my second example, your arguments are persuasive. It is very likely that there is a point in the program past which cache warming is no longer a good idea (for example, once real traffic has started hitting the cache, it's probably too late; in fact, warming up at that stage is probably a bug, since it may divert resources from serving actual user traffic). So yes, in my second example, I now think it's better to either join or cancel the task.
Thanks! Regarding your first example, I'm not entirely sure I understand it. If you have a task that updates a progress bar in an infinite loop... does that mean the task never finishes? What does "progress" even mean for something that runs infinitely long? What happens if the GUI is destroyed in the middle of that thread's lifetime (which it will be, by the main thread, if the secondary thread runs forever)? How many threads do/should you end up with if you later realize you want to run your program itself (i.e. your main() function) multiple times?
The main cases I can think of where joining might not make sense is when you simply don't have the capability to do so in a reasonable manner, like when the main thread is in third-party code that you have no control over. Otherwise, if I understand the example correctly, you absolutely need to join such a thread - and not merely at program exit, but sometime before the GUI is destroyed.
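A minimal sketch of the ordering being argued for, with a fake GUI class standing in for a real toolkit (all names hypothetical): stop and join the thread first, destroy the GUI second, so the thread can never touch a dead GUI.

```python
import threading

class FakeGui:
    # Toy stand-in for a real GUI object.
    def __init__(self):
        self.alive = True
    def redraw(self):
        assert self.alive, "thread touched a destroyed GUI"
    def destroy(self):
        self.alive = False

gui = FakeGui()
stop = threading.Event()

def progress_loop():
    while not stop.is_set():
        gui.redraw()      # safe only while the GUI still exists
        stop.wait(0.005)

t = threading.Thread(target=progress_loop)
t.start()
# ... main program runs ...
stop.set()
t.join()        # join first...
gui.destroy()   # ...then tear down the GUI
```

Reversing the last two lines is exactly the race the question above is pointing at.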
Conveniences like this library and other threading libraries make it easy for people to trivialize something (concurrent programming) that ought not be trivialized.
This. Even if you hold a reference to the task, your program very likely has a bug. At some point you should always await it to see if it failed or not.
It's easy to miss this if you observe completion via a side channel, for example an item being removed from a queue. But that's also a bad way to write tasks in the first place: have them return meaningful data rather than mutate shared objects. That way you're forced to await them, and your code becomes much more straightforward. It's counterintuitive at first if you think in threads, because there you're used to worker pools and such, whereas asyncio tasks can be written in a more linear style and don't rely on background workers to the same extent.
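A small sketch of that style, assuming plain asyncio (names hypothetical): each task returns its result rather than pushing it into a shared queue, and awaiting `gather()` is the single point that surfaces both values and failures.

```python
import asyncio

async def fetch_square(n):
    # Stand-in for real async I/O; returns data instead of mutating
    # some shared structure.
    await asyncio.sleep(0)
    return n * n

async def main():
    tasks = [asyncio.create_task(fetch_square(n)) for n in range(4)]
    # Awaiting here is unavoidable if you want the results, which is
    # precisely what forces exceptions to surface too.
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
```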
After having made this mistake several times, I've concluded one should almost never use a bare create_task. It's much better to place the coroutines into a top-level list of background tasks that is always awaited; that way they are both started automatically and always awaited, so errors surface appropriately.
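Something like the following sketch, if I understand the pattern correctly (names hypothetical): every coroutine goes into one top-level list, and a single `gather()` both starts them and reports each outcome, so no failure is silently dropped.

```python
import asyncio

errors = []

async def ok_worker():
    return "done"

async def failing_worker():
    raise RuntimeError("boom")

async def main():
    background = [ok_worker(), failing_worker()]  # the top-level list
    # gather() starts every coroutine and reports each outcome;
    # return_exceptions=True turns failures into inspectable values
    # instead of letting them vanish with an orphaned task.
    for outcome in await asyncio.gather(*background, return_exceptions=True):
        if isinstance(outcome, Exception):
            errors.append(outcome)

asyncio.run(main())
```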
> At some point you should always await it to see if it failed or not.
What you're saying is correct, but it doesn't quite imply what I'm saying. I'm saying everything that you spawn asynchronously (be they threads, tasks, whatever) needs to be joined - even if they're no-ops whose success or failure is irrelevant. This is similar to how you should always make sure to deallocate memory that you dynamically allocate whenever you can, as a matter of good practice and good hygiene. Sometimes you can get away with not doing so, but you shouldn't really skip it unless you don't have a choice, as it makes the program logic clearer and can make the program more robust too. (e.g., imagine running your main() in a loop where threads are spawned each time but never guaranteed to join.)
One of the reasons I was never "bitten" by this bug: whenever I use Python tasks, I save them in a collection so I can cancel all of them when the program is ready to quit.
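e.g. something along these lines (a toy sketch, names made up): every spawned task lands in a set, and at shutdown they're all cancelled and then awaited so the cancellations actually complete.

```python
import asyncio

tasks = set()
cancelled_count = 0

async def background_loop():
    global cancelled_count
    try:
        while True:
            await asyncio.sleep(3600)   # some long-running background work
    except asyncio.CancelledError:
        cancelled_count += 1            # cleanup would go here
        raise

async def main():
    for _ in range(3):
        tasks.add(asyncio.create_task(background_loop()))
    await asyncio.sleep(0)              # let the tasks start
    for t in tasks:                     # program is ready to quit:
        t.cancel()                      # cancel every saved task...
    # ...and await them so cancellation (and cleanup) finishes.
    await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())
```

Holding the references in the set also prevents the tasks from being garbage-collected mid-flight, which is the bug upthread.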