I stopped using earlyoom because I got frustrated that it always seemed to kill exactly the wrong thing for a given moment. Now, though, OOM events seem to turn into situations where I have to force a reboot: kswapd goes crazy, uses all available CPU, and I can barely (if at all) register any keyboard/mouse input.
Seems like a bad reason to disable swap, but I can't find a way to do what I really want, which is simply to reserve some CPU for input. And then I suppose in order to do something useful with it, require that any process that wants it be allowed at least some small allocation. Maybe that's the hard part and why I haven't found a way? It doesn't happen often, it's just really annoying when it does.
In theory this should be possible with cgroups. The mechanism is there but I don’t know of a way to easily set up a policy that does what you want.
It should be possible to allocate, say, 95% of the system resources to the default cgroup, and then create a secondary cgroup called "recovery" with access to the last 5%. You could use that one to run commands such as “kill” or “top” to recover the system state.
Additionally, you could run a second ssh server in the recovery cgroup, which you could ssh into if the system locks up.
In reality it is probably easier to just reboot in most cases, or, if you are dealing with servers, to use IPMI.
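For what it's worth, here is a rough sketch of what the recovery cgroup idea might look like with cgroup v2 control files. The group name `recovery`, the path `/sys/fs/cgroup/recovery`, and the helper names are all assumptions for illustration. `memory.min`, `cpu.weight`, and `cgroup.procs` are real cgroup v2 files, but note that `cpu.weight` is a proportional share among sibling groups rather than a hard 5% reservation, so this only approximates the policy described above.

```python
from pathlib import Path


def recovery_settings(total_mem_bytes, reserve_percent=5, cpu_weight=100):
    """Control-file values for a hypothetical 'recovery' cgroup (cgroup v2).

    memory.min protects that many bytes from reclaim; cpu.weight is a
    proportional share among sibling cgroups, not a hard reservation.
    """
    reserve = total_mem_bytes * reserve_percent // 100
    return {"memory.min": str(reserve), "cpu.weight": str(cpu_weight)}


def apply_settings(cgroup_dir, settings):
    """Write each value into its control file under cgroup_dir.

    On a real system cgroup_dir would be something like
    /sys/fs/cgroup/recovery (assumed path). That needs root, and the
    parent group must have the cpu and memory controllers enabled.
    """
    cgroup_dir = Path(cgroup_dir)
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    for name, value in settings.items():
        (cgroup_dir / name).write_text(value + "\n")


def join_cgroup(cgroup_dir, pid):
    """Move a process (e.g. a second sshd) into the cgroup."""
    (Path(cgroup_dir) / "cgroup.procs").write_text(f"{pid}\n")
```

Pointing `apply_settings` at a scratch directory first is a cheap way to see what would be written before trying it on the real cgroup filesystem.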
I don't understand why this issue still persists on Linux. As far as I can tell, all earlyoom needs to do is kill the process that has eaten the most memory in the last minute or so. On Windows this issue is non-existent.
It's not that simple. Imagine something leaking memory while running in parallel with something bursty. For example, your browser leaks, but you run a big grep|sort in the background. Or you have some GC runtime which allocates in batches and has just decided it needs another chunk to manage.
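To make that concrete, here is a minimal sketch of the "kill whatever grew the most recently" heuristic. This is not how earlyoom actually works; the function names are made up, and the snapshot reader assumes a Linux `/proc` where the second field of `/proc/<pid>/statm` is the resident page count.

```python
import os


def read_rss_snapshot(proc_root="/proc"):
    """Map pid -> resident bytes, read from /proc/<pid>/statm (Linux only)."""
    page = os.sysconf("SC_PAGE_SIZE")
    snapshot = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "statm")) as f:
                resident_pages = int(f.read().split()[1])
        except (OSError, IndexError, ValueError):
            continue  # process exited or is unreadable
        snapshot[int(entry)] = resident_pages * page
    return snapshot


def fastest_grower(before, after):
    """Return (pid, growth) for the process whose RSS grew most, or None.

    A process missing from `before` counts its whole RSS as growth.
    """
    growth = {pid: rss - before.get(pid, 0) for pid, rss in after.items()}
    if not growth:
        return None
    pid = max(growth, key=growth.get)
    return pid, growth[pid]
```

Feed it a browser that leaked 50 MiB over the interval and a sort that just grabbed 2 GiB, and it picks the sort, even though the browser is the real long-term problem.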
On Windows you don't have an OOM killer at all: it trades that solution for swapping forever, until either you can't do anything or you manage to kill the right app yourself.
People have reported that their machines with a small amount of RAM are now fully usable, where previously the system became completely unresponsive when swapping started.
I know; I did; I just didn't find that the answer to 'what would I ideally like/not like killed' was the same on a per-binary basis - it varied, and it'd always, by Sod's law, be wrong. e.g. if Firefox was set not to be killed, it would be a tab misbehaving; if Slack was allowed to be killed, it would be while I was mid-message, and so on.
If it had some concept of 'in-use', for which you could define rules like 'has an active window' or 'is playing media', that might work better for me.
Could the window manager be configured to communicate with this mechanism? The window manager knows what windows you're manipulating at the moment. (I imagine that terminal processes would be somewhat more complicated to handle.)
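As a thought experiment, an 'in-use' rule could be sketched like this. Everything here is hypothetical: the `Proc` record, its `has_active_window` and `playing_media` flags, and the idea that a window manager or media session would fill them in. No existing tool is being described.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    rss_mib: int
    has_active_window: bool = False  # assumed to come from the window manager
    playing_media: bool = False      # assumed to come from the audio stack

    @property
    def in_use(self):
        return self.has_active_window or self.playing_media


def pick_victim(procs):
    """Prefer the largest process that is not in use; fall back to the largest."""
    if not procs:
        return None
    idle = [p for p in procs if not p.in_use]
    return max(idle or procs, key=lambda p: p.rss_mib)
```

With Slack focused at 900 MiB, Firefox in the background at 700 MiB, and a media player at 300 MiB, this would pick Firefox. The hard part is still getting trustworthy 'in-use' signals, especially for terminal processes.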