
I didn't say it was easy; I said the difficulty (nearly impossible, according to you) was overstated. I also said there are circumstances where there isn't really anything the program can reasonably do other than terminate. However, there are other things it can do in other circumstances.

And the OS behavior has everything to do with it; that's the whole context of this discussion: Linux lying to programs about memory and then dealing with the consequences in a really dumb way.

> But programs do tons of small allocations all the time

Mine don't, or at least they don't allocate a bunch of small blocks from the OS all the time[0]; that's slow. Again, I suspect learned helplessness here: people have been doing it a certain way for so long they don't even consider alternatives.

[0] Actually, come to think of it, I don't think most malloc implementations do it differently: they also grab large chunks from the OS and subdivide them.
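
To make the alternative concrete, here is a rough sketch of the pattern I mean (names are illustrative, not from any particular codebase): grab one large block from the OS up front and carve the small allocations out of it with a bump pointer.

    #include <stddef.h>
    #include <stdlib.h>

    /* Bump-pointer arena: one large OS-facing allocation up front;
       small allocations are carved out of it with pointer math. */
    typedef struct {
        char  *base;
        size_t used;
        size_t cap;
    } Arena;

    int arena_init(Arena *a, size_t cap) {
        a->base = malloc(cap);      /* the only big request */
        a->used = 0;
        a->cap  = cap;
        return a->base ? 0 : -1;    /* caller decides what failure means */
    }

    void *arena_alloc(Arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15; /* keep 16-byte alignment */
        if (a->cap - a->used < n)
            return NULL;            /* arena exhausted: no hidden OS call */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    void arena_free_all(Arena *a) { /* everything is released at once */
        free(a->base);
        a->base = NULL;
    }

Note that allocation failure is visible at exactly one place, which is what makes it handleable at all.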



I think you are missing the point of an OOM kill. The intention is to keep the SYSTEM healthy, not any one application. Let me assure you: system-wide, this policy is sane. Do you have any idea what happens when the kernel is not able to allocate when it needs to? Bad things.

This is different from whether applications handle this reality well (by and large, they do not). You are 100% correct about this. As is the poster who says fixing it is next to impossible (given the sheer amount of deployed, critical software). I think if you actually dive into this, you will find Windows does not handle it as gracefully as you want to believe it does.

Everyone says "turn off overcommit" as if it were a general solution. It works only when you have tight control of what runs on the system, and the system is tailored for a specific application or deployment that behaves well. And guess what? If this is you, then you already know how to disable overcommit on the custom system you are building for your product. I will leave it as an exercise for the reader to determine whether the policy of crashing the current process is a better strategy than what the OOM killer does (it favors reaping shorter-lived processes with a larger RSS).
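
For the curious, it is one sysctl away, no rebuild needed. A sketch of flipping it from C (needs root; mode 2 is strict accounting, and the commit limit then also depends on vm.overcommit_ratio, which defaults to 50):

    #include <stdio.h>

    /* Switch the kernel to strict commit accounting, i.e. the
       equivalent of: sysctl vm.overcommit_memory=2
       In mode 2 the commit limit is swap + overcommit_ratio% of
       RAM. Requires root. */
    int main(void) {
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
        if (!f) { perror("overcommit_memory"); return 1; }
        fputs("2\n", f);
        return fclose(f) ? 1 : 0;
    }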

Linux is not Windows. People get upset when they first realize this. They want different defaults (never mind that they can customize it however they want, and it isn't as though this is some esoteric kernel topic buried in the LKML that should surprise anyone).

If anything, I'd say that Linux needs easier ways to securely load a different set of code for running at OOM time.


> The intention is to keep the SYSTEM healthy, not any one application

Right, but it is only necessary because Linux lies to the application. If Linux instead reserved memory for itself so it could stay responsive, and properly let allocations fail when they would exceed the available resources, then the applications asking for unavailable memory would be the ones dealing with the consequences, rather than the OS killing processes based on some heuristic.
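
To illustrate what "dealing with the consequences" could look like at the call site (a toy sketch; shrinking the request is just one possible policy, and it only matters once the OS actually reports failure):

    #include <stdlib.h>

    /* One policy an application can choose for itself: retry with
       a smaller request and degrade (say, a smaller cache) instead
       of being shot by a heuristic it never sees. Only meaningful
       when allocation failure is actually reported, i.e. without
       overcommit. */
    void *alloc_with_fallback(size_t want, size_t floor, size_t *got) {
        if (floor == 0)
            floor = 1;              /* avoid malloc(0) ambiguity */
        while (want >= floor) {
            void *p = malloc(want);
            if (p) { *got = want; return p; }
            want /= 2;              /* degrade: ask for less */
        }
        *got = 0;
        return NULL;                /* caller picks the next policy */
    }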

> I think if you actually dive into this, you will find Windows does not handle this as gracefully as you want to believe it does.

Personally, when I've seen modern Windows systems grind to a halt, it has been because of disk IO saturation, not memory saturation. Maybe I'm just not in the right contexts to see it.

> You are 100% correct with this. As is the poster who says fixing it is next to impossible (given the wide amount of deployed, critical software).

OK, so we may be speaking at cross purposes then. I mean that at the level of an individual application this is a solvable, not-too-hard problem (usually). If they mean "given the current state of the ecosystem, it is globally nearly impossible," then I can see where they are coming from, but again I have to wonder what the point of all this software being FOSS is if we can't work towards a goal like this.

> Linux is not Windows.

Yes, and in many ways I find this unfortunate. Good ideas are good ideas regardless of which OS they come from; fanboyism is stupid.

> They want different defaults

Because defaults matter. As mentioned, because of the default, applications have been written a certain way for years and years, and now it is not a simple matter of disabling overcommit, given what happens to that software when you do. That's why I proposed an opt-in solution.


It is in fact not necessary merely because the OS will overcommit. Even without overcommit, you hit the same problem, except you have no say in how it is handled: the process requesting the allocation is the one that suffers.

I guess my entire point with respect to defaults is that anyone who knows what they are doing, and for whom the defaults do not work, already has a huge pile of systems design and architecture work to do, and next to that the mechanics of changing this OOM policy are trivial.

For people who do not know what they're doing and want the OS to do the hard stuff for them, there is no sane default. There are tradeoffs that will make large numbers of people unhappy. It is a no-win situation.


I missed one question.

I don't understand why you believe that forcing the process that is requesting the allocation to deal with it is generally the best thing. From a systems perspective, it is actually worse, because there is a stronger chance you're killing a nice, stable process that might be critical, simply because it lost the lottery.

The OOM killer's strategy is to try to reap short-lived processes that allocated a lot. From a systems perspective, this is clearly better... though, as I said, it might be better still if you could more easily modify the OOM heuristic.
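
There is already a per-process knob for steering it, for what it's worth. A sketch of a process volunteering itself (raising your own score needs no privilege; lowering it requires CAP_SYS_RESOURCE):

    #include <stdio.h>

    /* Bias the OOM killer's choice for the calling process.
       Valid range: -1000 (never kill) .. +1000 (kill me first). */
    int set_oom_score_adj(int adj) {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (!f) return -1;
        fprintf(f, "%d\n", adj);
        return fclose(f);
    }

    /* e.g. set_oom_score_adj(1000) in a cache process that is
       cheap to restart, so the stable critical one survives. */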


> I don't understand why you believe that forcing the process that is requesting the allocation to deal with it is the best thing generally.

Because it is in the best position to understand the consequences of failure and the options for dealing with it.

> From a systems perspective, it is actually worse because there's a stronger chance you're killing a nice, stable process that might be critical simply because it lost the lottery.

If the process is truly critical, it should be designed to stay stable in the event it can't allocate memory.
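
One way to do that is the old "ballast" trick: reserve memory at startup and give it back when allocations start failing, buying headroom to log, shed load, or shut down cleanly. A sketch (the size is arbitrary and application-specific):

    #include <stdlib.h>
    #include <string.h>

    static void *ballast;

    /* Reserve headroom at startup. Touching the pages matters:
       under overcommit, untouched pages are not really backed. */
    void ballast_init(size_t bytes) {
        ballast = malloc(bytes);
        if (ballast)
            memset(ballast, 1, bytes);
    }

    /* Call from the allocation-failure path. Returns 1 if
       headroom was released and one retry is worth attempting. */
    int ballast_release(void) {
        if (!ballast)
            return 0;
        free(ballast);
        ballast = NULL;
        return 1;
    }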



