I always felt nervous about resizing a filesystem that has a swap file on it. Never had a kernel panic but would swapoff the file just in case.
If I create a swap partition in LVM I can pin it and move it around to the best location WRT multiple hard drives and a possible mix of HDD and SSD. With a file it's a little more abstract and I could emulate it with some pointless abstraction, but why bother. A typical "this century" example would be running my boot and swap on local disk and storing all real-world data on the NAS over iSCSI. Obviously now that "Everything" is virtualized this kind of system administration is moot unless you're admin on the cluster itself LOL.
The problem with swap is that memory seems to have gotten cheaper than storage, and swap of any form takes storage space. It's not the capex that's the problem, it's the opex: now you've got an extra 32GB that "has to" be backed up and "has to" be virus scanned or whatever security theatre, and that swap file "has to" be treated as the highest-security PII/HIPAA/PCI category because who knows what's been swapped out of memory onto it; it could be chock full of CC numbers, and how do you know for sure unless it's empty or not there? Paying for more memory doesn't increase your backup storage / security risk / virus scan times, and the latter costs more than the former. I don't think non-cloudy admins realize there are cloudy admins out there with forests of little systems with like 8 GB of RAM and 16 GB of storage, so the "old" rule from the 1980s about provisioning twice your RAM as swap would increase the disk required from 16 GB to 32 GB, doubling my costs for, essentially, nothing. Another problem with swap is that in the old days, if you needed more memory than you had physically, you had to buy more memory; then for a while you could emulate memory very slowly with disk, much faster than calling an IBM CE and buying more memory; now if you need more memory you spin up more instances or rebuild the instance on a larger flavor about as fast as you'd have added swap in the olden days, probably faster, and it can be done dynamically in some architectures.
> The problem with swap is that memory seems to have gotten cheaper than storage, and swap of any form takes storage space. It's not the capex that's the problem, it's the opex: now you've got an extra 32GB that "has to" be backed up and "has to" be virus scanned or whatever security theatre, and that swap file "has to" be treated as the highest-security PII/HIPAA/PCI category because who knows what's been swapped out of memory onto it,
Why on earth would you back up swap?
Also, encrypted swap is pretty easy to set up; then again, most distros don't offer that option on install.
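For reference, the random-key flavour is just a couple of lines (device name is illustrative; exact crypttab option names can vary slightly between distros):

    # /etc/crypttab -- re-key the swap device with a fresh random key every boot
    cryptswap  /dev/sda3  /dev/urandom  swap,cipher=aes-xts-plain64,size=512

    # /etc/fstab -- swap on the mapped device
    /dev/mapper/cryptswap  none  swap  sw  0  0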
We have tiny swap, like 1GB, working basically as an "early warning" of "hey, this server could probably use a few more GB of RAM". There is very little use for any more swap than that.
>I don't think non-cloudy admins realize there are cloudy admins out there with forests of little systems with like 8 GB of RAM and 16 GB of storage, so the "old" rule from the 1980s about provisioning twice your RAM as swap would increase the disk required from 16 GB to 32 GB, doubling my costs for, essentially, nothing.
The "old rules" for swap were never relevant. But dumb myths die hard.
They did make sense in some contexts. Specifically, for laptops expected to hibernate you need to preserve all of RAM, plus extra space for whatever your swap was already holding. For small-RAM systems, 2x the memory size was a good rule of thumb.
It stopped making sense when we stopped using swap and filling memory in normal situations.
Traditional *nix and BSD systems use swap partitions for crash dumps, so giant swap partitions made sense before those systems supported minidumps. Admins who grew up with those systems are used to the "2x RAM" rule of thumb.
I just yesterday had to re-partition a FreeBSD kernel test box with 256GB of RAM and a 2GB swap partition because I desperately needed a crashdump and the mini dump would have been 7GB.
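For anyone hitting the same thing, the crashdump target is just a setting (partition name is illustrative):

    # /etc/rc.conf -- point crash dumps at the (now big enough) swap partition
    dumpdev="/dev/ada0p3"

    # or switch immediately without a reboot
    dumpon /dev/ada0p3
    # after the next panic, savecore(8) writes the dump to /var/crash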
The quotes around "has to", and especially the virus scan, makes me think there's a blanket corporate policy that applies to all files, regardless of file type.
I stopped using earlyoom because I got frustrated with it always seeming to kill exactly the wrong thing for a given moment. Now, though, OOM events seem to turn into situations that require me to force a reboot, because kswapd goes crazy, using all available CPU, and I can barely (if at all) register any keyboard/mouse input.
Seems like a bad reason to disable swap, but I can't find a way to do what I really want, which is simply to reserve some CPU for input. And then I suppose in order to do something useful with it, require that any process that wants it be allowed at least some small allocation. Maybe that's the hard part and why I haven't found a way? It doesn't happen often, it's just really annoying when it does.
In theory this should be possible with cgroups. The mechanism is there but I don’t know of a way to easily set up a policy that does what you want.
It should be possible to allocate say 95% of the system resources to the default cgroup and then you could create a secondary cgroup — recovery — which has access to the last 5% of the system resources which you could use to run commands such as “kill” or “top” to recover the system state.
Additionally you could run a second ssh server in the recovery cgroup which you could ssh into in the case of system lock up.
In reality it is probably easier to just reboot in most cases, or if you are dealing with servers use ipmi.
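If anyone wants to experiment, a rough, untested sketch with systemd (slice name and numbers are mine, not a vetted recipe):

    # /etc/systemd/system/recovery.slice
    [Slice]
    CPUWeight=1000
    MemoryMin=256M

    # start an emergency shell (or a second sshd) inside that slice
    systemd-run --slice=recovery.slice --pty /bin/bash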
I don't understand why this issue still persists on Linux. As far as I can tell, all earlyoom needs to do is kill the process that has eaten the most memory in the last minute or so. On Windows this issue is non-existent.
It's not that simple. Imagine something leaking memory running in parallel with something bursty. For example, your browser leaks, but you run a big grep|sort in the background. Or you have some GC runtime which allocates in batches and just decided it needs another chunk to manage.
On Windows you don't have OOM at all, because it trades that solution for just swapping forever until either you can't do anything or you manage to kill the right app yourself.
People have reported that their machines with small amounts of RAM are now fully usable where previously the system became completely unresponsive when swapping started.
I know; I did; I just didn't find that the answer to 'what would I ideally like/not like killed' was the same on a per-binary basis - it varied, and it'd always, by Sod's law, be wrong. e.g. if Firefox was set not to be killed, it would be a tab misbehaving; if Slack was allowed to be killed, it would be while I was mid-message, and so on.
If it had some concept of 'in-use', for which you could define rules like 'has an active window' or 'is playing media', that might work better for me.
Could the window manager be configured to communicate with this mechanism? The window manager knows what windows you're manipulating at the moment. (I imagine that terminal processes would be somewhat more complicated to handle.)
Windows has been fine with swap files since the age of spinning disks, so I guess using a swap file instead of a swap partition is fine.
Speaking of Windows: How is it that Linux seems to grind to a halt when it is forced to swap, while Windows works just fine in the same scenario? Sure, on Windows a lot of swap file "use" is just bookkeeping of empty pages because Windows doesn't overcommit memory. But even if it actually swaps it stays responsive, while I've had to reboot linux boxes more than once because they stayed unresponsive after memory pressure. Does Windows have a smarter swap algorithm? Is it more proactive about swapping stuff back in once memory pressure is gone? A better scheduler that gives more precedence to "interactive" things? And most importantly: can linux implement those things too?
Ah yes, swappiness. I think Windows is set to something like ... 65% or so. Or was it 85%. I forget, but I looked into this a long time ago when I was having to suffer through having about half as much RAM as was ideal for my use cases on both Windows and Linux. I remember coming across some method to change it on Windows as well, but I forget that method too. Or rather, I think it's some reg edit that needs doing, but I forget what/where.
Based on the explanation in the documentation <https://docs.kernel.org/admin-guide/sysctl/vm.html?highlight...>, I set 'swappiness' to 100. I figure that SSDs are fine with random IO, so swap isn't likely to be much -if at all- slower than pulling pages back in from disk.
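i.e. something like (the sysctl.d file name is arbitrary):

    # current value
    sysctl vm.swappiness

    # set for this boot
    sudo sysctl -w vm.swappiness=100

    # persist across reboots
    echo 'vm.swappiness = 100' | sudo tee /etc/sysctl.d/99-swappiness.conf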
Wrong direction. Bumping swappiness up makes the OS more likely to keep more pages in cache; setting it to a higher value makes the OS treat swap more like RAM, and that's usually slow and jittery.
Using more swap may make the system faster. It means there is more room for file-backed pages in RAM, to cache reads and writes. A swappiness value should be chosen based on the system's IO workload and swap device performance characteristics.
Windows swap never worked well. The OS was always famous for trashing constantly, and only started to lose that fame when computers started coming with way more RAM than any sane OS could use.
These links look like you just googled "windows thrashing" and copied the first 4 results. Did you do that?
Addressing your links in order:
1. I know that the word is thrashing, I was quoting the OP.
2. The existence of someone who thinks they experienced thrashing (but didn't do any investigation to see if they were actually experiencing thrashing), 13 years ago, doesn't seem very definitive to me.
3. Please read this page. You'll see that the first sentence is "hard drive is being continuously thrashed even when no applications are apparently running". How is the system thrashing when there are no applications running? FYI, "thrashing" does not just mean ANY frequent hard drive reads for any reason.
4. I don't know what this article has to do with the question I asked. The article title is "How do I tell if my Windows server is swapping?", which seems completely unrelated to my question about how often Windows swaps.
Eh, are you sure we're really comparing apples to apples here, at least when it comes to server-based applications? I've overloaded Windows multiple times, and yes, I could move the mouse around and maybe open Notepad if it was already cached in memory, but you could get stuck waiting for IOPS for insanely long periods trying to do anything else.
I've not done much in this realm since NVMe SSDs became common, so I'm not sure how this behaves these days.
The VirtualAlloc API in Windows provides explicit control over commitment. You can reserve address space using the MEM_RESERVE flag, and commit with MEM_COMMIT. These can be combined into one.
Therefore, we cannot say that "Windows doesn't overcommit". Windows applications can overcommit. You don't know which ones might do that, for what purpose.
We use servers with swap files. It works fine and does not grind to a halt. Right now one server has 10/15 GB RAM utilization and 3.7/3.7GB swap utilization. It's pretty responsive.
Virtual memory provides a larger memory space than the available RAM, but all of it can be backed by mass storage. In that case, you have storage for all of it; it's just that not all of it is RAM. When not all of it is backed by storage, the situation is overcommitted.
Once you swap a lot out and then get back some free memory, Linux will only swap it back in on demand, which means a lot of micro-lag moments once apps start accessing now-in-swap things.
Linux assumes it's better to have unused things in swap while RAM is used for IO caching; that's why you see this behaviour. There is (AFAIK) no mechanism to go "okay, we have a few gigs free, let's bring stuff back from swap".
The reboot-less method is to add a new LV for swap and swapoff the old one, but that's more of a PITA than a restart...
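For the record, the dance is roughly (VG/LV names are illustrative):

    # create and enable the new swap LV
    lvcreate -L 8G -n swap_new vg0
    mkswap /dev/vg0/swap_new
    swapon /dev/vg0/swap_new

    # drain and remove the old one (the swapoff is the slow part under memory pressure)
    swapoff /dev/vg0/swap_old
    lvremove /dev/vg0/swap_old

    # and update /etc/fstab to point at the new LV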
If you use swap because you don’t have enough RAM for things you need in RAM, you’re going to have a very bad time.
The only time swap isn’t just delaying (and growing) the inevitable shitshow is when, for some reason:
1) some program is consuming a significant amount of RAM, but doesn't need to run for a while, and it's somehow faster/easier to let it swap to disk than shutting it down and restarting it while you do something else that consumes RAM.
2) something you’re running has some amount of bloat or leak that you know will never get exercised, but also never will grow larger than your swap file and take your system out entirely.
Both of those situations are not only rare, but easy to misjudge and end up wedging your system.
Chrome does both of those. It takes gobs of memory that won't be needed for many hours, and it takes gobs of memory that will never be needed but are roughly stable and won't overwhelm a suitable swap file.
I found that gcc works mostly alright on swap for compiling C++. I assume that data structures for template instantiations are never really deallocated, and it often falls into category 2 of your description.
And then compilation eventually finishes and the allocated memory is released on process termination.
Disk cache is really flexible at the OS level. Look up 'vmtouch' on Linux and play with some recently used files; you'd be surprised how many are in memory. If you don't have swap and only have 'X' amount of RAM free, the program will still load and run fine; it'll just do so in a way that more directly thrashes the disk. If you add some swap, the OS will page out some other memory that's not being used, giving 'X+Y' RAM free for disk caching that the OS will use.
In fact this usage of free memory as disk cache is so ingrained that memory used this way doesn't even appear allocated. OSes are pretty much all designed to use whatever free RAM is available as disk cache. The disk cache shrinks if that RAM is actually needed.
Effectively a great way to think of swap is that your OS is always swapping, even without a swapfile! If you read a file there's swapping of that file into memory, it's even done using the standard memory paging that swap files do. Without a swap file you still have swapping but you've just removed the ability for the OS to write out memory that it thinks is less relevant than the files you're currently trying to read. Which is almost always a loss.
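For example, vmtouch makes this visible (paths are arbitrary):

    # show how many pages of a file are currently resident in the page cache
    vmtouch -v /var/log/syslog

    # pull a directory tree into cache, or evict it again
    vmtouch -t ~/project
    vmtouch -e ~/project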
That is temporary and evictable kernel memory usage. It also isn’t allocating more memory than it has while doing it, it evicts things when it runs out of unallocated memory.
Technically a program, but not what I think anyone typically considers one in this context?
What other ones are you aware of?
I ask because generally programs use RAM because it’s fast - otherwise they’d use something else. Having a program intentionally allocate more than is available would almost certainly result in serious performance issues it would have a hard time dealing with predictably - as compared to using disk for the ‘excess’, and then reading/loading what it needs.
Well, simply put, any allocation that's not really needed any time soon but is also not freed. Swap lets you make use of that.
In terms of what benefits from that, most modern programs are actually surprisingly flexible in memory allocation. Anything with a garbage collector behind the scenes will collect more often if needed. Having swap to push the non-needed programs out lets GC-based programs keep allocations longer, which is intended to enable reuse.
Most programs actually. Compilers, databases, even your browser! I mean, you can argue that they should manage paging out manually, but it's so much easier to use swap and an unmodified program.
None of those go out of their way to do so, and certainly not databases. Every database tuning documentation I've ever read says avoid swap at all costs.
Some just allocate when they need it until it doesn't work (cough Chrome cough), but that is a far, far throw from intentionally allocating more memory than the machine has, and none of them seem to do it to intentionally only work on a small subset at a time, even if de-facto that is what happens.
What sort of performance or reliability would you expect anyway from a database that intentionally loaded the entire database into memory first without even caring how much would fit?
Every database I’ve ever dealt with has fixed memory limits that get set.
Otherwise the database server will OOM even with swap if it doesn’t limit memory consumption and how much it loads, on any machine, with a given large enough database. And the size of the database is up to the user.
It’s a fundamental part of the problem.
Speed and reliability to some extent are literally core requirements of database servers, so any database software that doesn’t do it is going to have a bad time.
> The only concrete answer to "why have swap if I have enough RAM" is "well, if you run out of RAM..."
Linux with swap available can swap out never used pages (say a part of library that never gets called) and use freed memory to cache disk IO. We see that on every server that has enough data to fill the remaining RAM with page cache. But you only need like a gig of swap to take advantage of it.
Yep, the lesson a lot of people who claim 'no swap is better' need to learn is that your OS is always swapping, even without a swap file. The lack of a swapfile just removes some flexibility in how that's handled.
The OS will page in files that are being accessed regularly. Without a swap file you don't stop the OS swapping between memory and disk; that'd be pathological, since disk access is that much slower.
Instead, what not having a swapfile really means is that you are telling the OS "anything ever allocated on the system is more important than disk cache".
I read somewhere that Linux hibernation requires swap (1-1.5X RAM?). So I have been configuring my laptops with a 1.5X swap partition. Unfortunately Linux hibernation has not worked for me since the early 2000's, across 10-15 different laptops. Apparently hibernation can use swapfiles, but I've never been able to verify that.
From the kernel docs, `image-size` section, which relates to the hibernation image size.
> Reading from it returns the current image size limit, which is set to around 2/5 of the available RAM size by default.
I have successfully been hibernating a laptop after a few hours in suspend with swap allocated 50% the size of my RAM, so 8GB when I had a 16GB laptop, and 16GB now on a 32GB laptop. Never had any issues.
If it’s seemingly as easy as reading a wiki and configuring a few things I’m curious why it doesn’t just work by default. Does it just not get enough dev love?
My personal experience is that the situation has improved somewhat, but the hardware still plays a large part. A Kaby Lake laptop I had set up to hibernate to encrypted swap (not file but partition, iirc) worked well enough, besides a few times when some peripherals, e.g. wifi, failed to power on after resuming from hibernation. A Zen3+ laptop from the same manufacturer that I got last week, running Arch (with the latest kernel and packages), _always_ has problems resuming: wifi (and who knows what else) never gets turned back on, and the system sometimes enters a weird state showing just a blank screen, not accepting any keypresses, or otherwise gets stuck at some stage of the boot process with no output, so it's hard to debug. So far it's been a real crapshoot.
I use a swapfile, and hibernate frequently (after, iirc, 2h of suspend, so sometimes multiple times a day). It did take a bit of setup (by far the thing I've done that's most supportive of people arguing against Linux in favour of macOS/Windows), but it works well now that it is.
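For the curious, the swapfile route boils down to telling the kernel both the filesystem device and the file's offset on it; a rough sketch assuming an ext4 root (paths, UUID and the offset number are illustrative):

    # find the physical offset of the swap file's first extent
    filefrag -v /swapfile | head -n 4
    # use the first number in the physical_offset column, e.g. 123456

    # kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub):
    #   resume=UUID=<uuid-of-filesystem-holding-swapfile> resume_offset=123456
    # then regenerate the grub config / initramfs and test with: systemctl hibernate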
Swap space is used for many other things than just running out of memory. Given the huge number of comments on this post that get it wrong so badly, I'm not going to respond to all of them, so please do your own research.
The short answer is that not having swap causes the system to run with one hand tied behind its back, since it’s forced to store unused/rarely used stuff in RAM, taking up space that could be used more productively (like disk cache).
Yeah, no swap probably makes sense for GCed languages. If you are going to touch all of the memory frequently, it doesn't make sense to try to swap it out. You are better off forcing Linux to swap out filesystem cache, which at least has a chance of not being used in the next minute.
The only real thing that you may be missing is SSH being swapped out and some other system processes but those may be 2MiB altogether.
Maybe an option would be disabling swap just for the JVM. IDK if that is something you can do with cgroups. But if 99% of your anonymous RAM is pinned anyways there is little to gain.
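For what it's worth, cgroup v2 does have a per-group swap cap, so something along these lines should work if the JVM runs as a systemd service (the unit name is made up):

    # drop-in for the service, e.g. via: systemctl edit my-jvm-app.service
    [Service]
    MemorySwapMax=0

    # or with raw cgroup v2, for an existing group:
    echo 0 > /sys/fs/cgroup/my-jvm-app/memory.swap.max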
In every commercial setting I’ve been involved with, something has gone very wrong if you’re swapping, since you spec the system for the use case, rather than a fraction of it. You’ll still have the logs to debug whatever went wrong. What problems do you see?
Yes, exactly. If your machine is starting to use swap, then it's already running away on memory that you hadn't planned for. The swap just delayed the inevitable OOM. You might as well let it OOM immediately.
I tried swap on zram briefly, but then switched back to a swap partition and enabling zswap. It's basically like zram, but when the reserved RAM section runs full it will hit the disk with the LRU pages. A tiered system. That means as long as you're only slightly over on RAM, it will swap into compressed RAM entirely.
So the only use case I see for swap in zram is if you never want to swap to disk, but still want to use swap for some reason.
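For anyone wanting to try the zswap route, it's mostly a couple of module parameters on top of an existing disk swap device (compressor and pool size are illustrative, and zstd needs a kernel built with it):

    # runtime, as root
    echo 1    > /sys/module/zswap/parameters/enabled
    echo zstd > /sys/module/zswap/parameters/compressor
    echo 20   > /sys/module/zswap/parameters/max_pool_percent

    # or persistently on the kernel command line:
    #   zswap.enabled=1 zswap.compressor=zstd zswap.max_pool_percent=20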
Swap on zram seems very interesting. My current home computer is a NUC with 16GiB, but I sometimes build from source and a tmpfs drive is helpful for that so I’m a little concerned I might not have quite enough RAM. Thoughts?
I’ll definitely double my RAM for my next system (in a year or two) and will definitely have to consider swap on zram then.
Not having quite enough RAM is the exact scenario when swap on ZRAM is most useful. Assuming the data you have in RAM is compressible it lets you use a few extra gigabytes before hitting disk swap or OOM (depending on if you have a disk swap device configured in addition to ZRAM).
Zswap is somewhat similar and might also work well for you, it uses disk swap but with a compressed cache in RAM.
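For reference, a minimal manual zram-swap setup looks something like this (size and compression algorithm are illustrative; zram-generator or zram-tools automate it):

    modprobe zram
    zramctl /dev/zram0 --algorithm zstd --size 8G
    mkswap /dev/zram0
    # higher priority than any disk swap so the compressed device fills first
    swapon -p 100 /dev/zram0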
> I sometimes build from source and a tmpfs drive is helpful for that so I’m a little concerned I might not have quite enough RAM.
I used to do this as well, but consider that the VFS cache basically does the same thing. The main difference is that cached data can be dropped or flushed if necessary to free memory, whereas this can't be done automatically with tmpfs.
Benchmark both cases and I'll bet you'd be surprised at how little difference there is.
The nice thing about swap here is that the kernel is under no obligation to sync the data to disk "soon". I have lots of tmp files that are written and read quickly and never hit the disk. With tmpfs+swap the kernel is free to swap out unused files if that memory is better used elsewhere, but otherwise won't bother touching the disk.
I do a lot of builds on my system and most are small enough to entirely live in RAM. But when you build Firefox it is best to start flushing some data to disk.
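e.g. a RAM-backed build directory (the size is only a cap, not a reservation, and its pages can be swapped out when idle; the mount point is arbitrary):

    mount -t tmpfs -o size=16G tmpfs /mnt/build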
That's another great point. With a large build in tmpfs, you can either run into the size limit of the tmpfs, breaking the build and wasting time, or run out of memory and end up with an unresponsive machine.
Entrusting this to the VFS cache has better predictability, and nearly the same performance in many cases.
No. ZRAM is a compressed ramdisk. Darwin “compressed memory” is an intermediate state for pages as they age out of the working set, but before they are written out to a swapfile.
Incidentally, Darwin creates, defragments and reclaims swapfiles dynamically. All this complaining about swap partition / file sizes is kind of … silly. None of that code is rocket science.
I wouldn’t do less than 64GB. I’m running 32 on my current NUC as my main machine with Windows 11 and WSL and a RAM drive. With current source sizes and Electron apps consuming RAM, just 32 GB is no longer a nice to have.
From the 1990s there was a perceptible difference in Linux or Windows if you could dedicate an entire secondary HDD to a swap partition.
The beginning of the HDD was where sectors could be accessed most rapidly, physically.
For SSD there should not be much difference between a separate partition or a file, or whether the reserved sectors the swap occupies are at the beginning of the drive. Should probably be aligned with the block size of the SSD though.
Still may be helpful to have a separate SSD for swap besides the drive the OS is on.
It would be worth trying, but I like a swap file in Linux so I can confine Linux to a single EXT4 partition right next to my NTFS Windows partition.
With Windows it does work well without a swap file, as long as you have plenty of memory left over after Windows and your desired apps are loaded, handling further data loads within reason.
The main problem is my little VM powered by CentOS or Amazon Linux 2 (which is based upon Fedora Linux) runs out of memory when I run "yum update". I have a 1-core CPU with 512MB RAM. The BaseOS repo from RHEL and co is just too big; it requires a minimum of 3 GB RAM. So the workaround is to add swap to the AWS EC2/Lightsail instance. I believe there is a dnf bug open in RHN, but no progress has been made so far: https://bugzilla.redhat.com/show_bug.cgi?id=2040170 Adding a 4/8GB swap file solved my problem with the `yum update`. That is just one example. There are many users with few resources.
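For reference, the usual incantation for adding such a swap file (size and path are illustrative; dd rather than fallocate avoids trouble on some filesystems):

    dd if=/dev/zero of=/swapfile bs=1M count=4096 status=progress
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile

    # persist across reboots
    echo '/swapfile none swap sw 0 0' >> /etc/fstab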
It may require more than 512MiB RAM, but it certainly requires nowhere close to 3GiB. I had a 1GiB RAM CentOS 8 VM on Azure until recently and it had no problems updating.
There are VPSes being sold for peanuts. I have a 128MB RAM NAT VPS... bought it out of curiosity, and I concur: I am unable to update that thing because of the out-of-memory issue.
The “128MB NAT VPS” thing is OpenVZ or otherwise container-based.
$5 per YEAR for a container, 128MB RAM, 1GB persistent storage, SolusVM user panel, and 123.45.67.89:16300 - 16320 NATd to your enp0s3. swapon/swapoff don’t work. I tried installing QEMU; that didn’t work either because storage is I/O rate limited. IIRC, OpenVZ has some sort of unified cache control for RAM and storage I/O, and trying to circumvent the quota through “disk” doesn’t work because of it.
I’m not sure what the use case might be, legal or not, but there seems to be dozen or so operators taking our credit cards in the exact same manner. Maybe the operation itself is a money laundering, or maybe it’s to run elaborate offsite contraptions for illegal activities, or to run SEO fake sites? I don’t know.
Sure. I’m familiar with the NAT VPS providers. For the most part they are just providing an ultra cheap service, as there seems to be a market for that. Some people are OK with Ipv6 only or some ipv4 ports and they will just funnel the needful thru Cloudflare and they have an ultra ultra cheap website/service. Even cheaper than the cheapest shared hosting but they still have a box they can control to some extent.
Even with openvz/LXC though, they will generally give you a few choices of distro flavors. So in general if given a memory allowance 256 or under, definitely don’t choose anything Red Hat based or you’ll suffer from yum potentially not working.
Regarding swap partitions: on virtualised systems (like clouds) this is rather easy since you can just edit EBS/LUN/whatever on the provider side, grow, shrink, add completely new ones without doing any of the classical disk management tasks.
On cloud machines, LVM doesn't make a lot of sense, and neither does lots of partitions. Partitions are from a time where you have one disk and want to setup filesystem boundaries. But in a virtualised setup you can slice block storage any way you want, and present it to the VM as completely separate disks. It's a bit like having LVM "on the outside" of the VM. This way, the VM itself becomes simpler and has to care less about the specifics of the disk.
If you are on physical hardware, I'd argue that if you're not using LVM or ZFS, you're probably doing it wrong. Only really small-scale stuff (SD card, eMMC, USB SSD, Raspberry Pi type of stuff) works better without it. That said, even LVM wouldn't be too bad in such cases. (You'd have to get down to JFFS2 and the likes to really not use LVM.)
There should be a set of VFS hooks so that a filesystem can offer up pages of swap to the kernel without an explicit swap file. I remember tinkering with this around 2000, but I didn’t get very far. I would love to see this explored.
I'd love to see this. There appear to be some programs to monitor usage and dynamically add more swap but they all appear buggy and unmaintained.
Of course, if you need more than a bit of swap there is something suspicious about your system, but I have often found this extra RAM useful. For example, I often have many open files in my editor: the ones I am not currently editing are great candidates to swap out. Or /tmp in RAM: I don't need it persisted, but if the RAM could be better used for other things, feel free to swap the unused files out.
There is absolutely no reason to use those over Ubuntu, and many reasons not to use Arch in your critical server infrastructure, but Debian v. Ubuntu would be essentially six of one, half a dozen of the other.
Ubuntu’s primary benefit over other distributions is that the strict release cycle allows for efficient planning of upgrade cycles. Pretty much anything specific to the local workloads I wouldn’t use the distro packages for anyway, so the distro doesn’t matter so much.
Just about the only thing I use swap for on Linux is not bombing out during the fork/exec (even for something like a system call) from large processes. Back when I last looked, it appeared they need enough backable virtual address space, even when they don't use it. The filesystem swap seems good enough for that.
> Swap files appear to work fine, including on a mirrored root filesystem, and I've read that they're basically just as efficient as swap partitions these days
I've been saying this for a while, and I always get berated by angry Linux nerds who didn't do any research. The times they are a-changin'?
We were using swapfiles on Linux at Amazon in 2001 just fine. No idea if someone perf tested it, but we did have pretty decent perf testing (and we did adjust swappiness based on perf testing).
Many of the commenters here seem to think this is about moving away from swap altogether, but according to the article this is switching from a swap partition to a swap file, so still using swap.
Eh, just use LVM. I have no idea why it isn't standard on some distros. Then you can mix, match and resize at will, and as a bonus you can migrate the system live from one device to another without rebooting.
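The live-migration bit is essentially pvmove (device names are illustrative):

    # add the new disk to the volume group
    pvcreate /dev/nvme1n1
    vgextend vg0 /dev/nvme1n1

    # move all extents off the old disk while the system keeps running
    pvmove /dev/sda2

    # retire the old disk
    vgreduce vg0 /dev/sda2
    pvremove /dev/sda2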
> (If you're willing to use swap files anyway you can always add more swap space with a swap file even if you started with a swap partition. However, if your swap partition is too big, shrinking it or limiting how much of it gets used is more annoying.)
You can't shrink a swap file without swapoff & swapon anyway, so it's an illusion of an improvement.
It seems reasonable that a university would take a moment to share their methods and findings so that others who might consider doing the same thing can benefit from the experience of others.
Because recent distributions have enabled it by default. I hadn’t used swap for a decade or so since acquiring 16GB of RAM, but one day noticed a new swapfile in my rootfs.
This article is always thrown around when the topic comes up; at least in my experience it's wrong, or at least not useful for my use cases (general-purpose web hosting stuff in all kinds of flavours):
- swap on spinning rust outright kills the system if memory is low. It's a full-blown crash.
- swap on SSD acts much the same
- zram swap stalls and slows down the system for several minutes
I just raise vm.min_free_kbytes to make the OOM killer kick in faster and disable swap. At least the machines don't crash hard this way.
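i.e. something along these lines (the value is illustrative and depends on RAM size):

    # keep a larger reserve free so reclaim and the OOM killer kick in earlier
    sysctl -w vm.min_free_kbytes=262144
    echo 'vm.min_free_kbytes = 262144' > /etc/sysctl.d/99-minfree.conf

    # and no swap at all
    swapoff -a    # plus drop any swap entry from /etc/fstab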
Which is exactly the situation you never, ever want a server in. So it’s actually worse than a “crash”
Me and my team removed swap on tens of thousands of machines cause we were sick of dealing with this. We wanted the machines to fail hard and fast and not go into a state where it’s doing only a minuscule amount of actual work while trying to recover.
And I think that, sadly, it's not even a full solution, because Linux can manage to get into thrashing even without swap. It pages in and out things like memory-mapped files or the contents of executables of stopped processes. See for instance:
https://serverfault.com/questions/898388/how-to-prevent-kern...
It really doesn't. When the article's author writes about swap helping "under moderate/high memory contention", most people have already power-cycled their device at that point (or monitoring has pulled the struggling node from production). It's utterly useless, due to the huge difference in latency between RAM and storage.
On most production systems, memory allocation is roughly 1% system and 99% application (not counting temporary, evictable allocations like disk cache). Modern applications are not designed to have their pages swapped out to disk and it is not particularly helpful to swap the teeny tiny bits of OS.
There are parts of this article which are true, but the overall picture it paints is not accurate and the conclusion that swap is helpful is not correct for high performance systems - both workstations and servers.
The basic argument is that swap can be preferable to hard eviction for managing disk cache, which can be true, but disk-heavy workloads are less common than ever, especially in general-use clusters.
(The other argument is the usual mix of "but it can swap out unused areas" in which case you should fix the program to not allocate a ton of memory it doesn't need; and "but the OOM killer will come" and yes that's the entire point please move my process elsewhere).
> "but it can swap out unused areas" in which case you should fix the program to not allocate a ton of memory it doesn't need
Isn't this basically asking every application to independently reimplement their own swap-style functionality? Surely it makes more sense to have swap as a system level feature that every application can take advantage of?
That way the system has more information available to make swapping decisions since it can take into account the memory usage of all applications and drivers/etc together, and applications can take advantage of the most advanced swapping algorithms in the latest versions of the system without having to be individually updated.
If you are suggesting that your applications are allocating a bunch of memory which they literally never use in any way, then I don't think that is really a common issue in practice.
In practice I expect the typical scenario you're thinking of is applications are allocating a bunch of memory which they use only briefly or rarely and could easily fetch or recompute that data again later if needed. That's exactly the scenario which swap optimizes for. So why should every application individually implement logic to optimize for it? Fixing these kind of issues is not as simple as "just don't allocate the memory".
> If you are suggesting that your applications are allocating a bunch of memory which they literally never use in any way, then I don't think that is really a common issue in practice.
That is what I'm suggesting, and I suggest you look at how much space is wasted by startup-initialized data in libraries for features you'll never use, JIT representations that never get compiled for more than one shape, class metadata that never gets touched and vtables which don't get used, how easy it is in any GC language to accidentally keep a large buffer alive until the end of a lexical scope rather than its last real use, etc. etc.
All these problems you're describing are way more complicated than "just don't allocate the memory" and would involve sophisticated logic to optimize (like predicting exactly what features are or are not going to be used by an application at start time without significantly increasing the startup latency, etc). Why not just let swap be that optimization?
So why not address those edge cases with something like "madvise" style configuration rather than just saying application developers need to reinvent the wheel any time they do something outside the golden path?
Because that would require the programmer do something explicit to deal with a runtime edgecase they fundamentally don’t have the context to decide the right course of action on?
Especially if it’s a library or the like?
That infrequent call might be key for program stability in one context or pointless (garbage collector whatever- key in a major server program, pointless in a toy app).
The moral of the story is: if, under load, code or data gets moved from where access is fast (RAM) to where it is slow (even SSD and most NVMe count here) and then gets accessed while still under load, it makes the problem much worse.
I don't understand the argument. Performance-wise, swap will obviously cost you, but runtime-wise isn't it going to actually enable your application to continue running rather than crashing, albeit at much lower speeds?
Once you exhaust the RAM and you run with no swap configured, what happens with your application?
I'd argue that the fact such a simple and elegant solution as virtual memory is able to effectively cover such a wide range of use cases while eliminating the need for every application developer to duplicate their efforts is the exact opposite of "mediocre" and in fact it's the exact kind of efficiency-minded thinking that the software industry needs more of.
Actually no - this is the situation that swap kills your system.
That rarely used code path will now take 50 seconds because it got called when it was swapped out (and the system is under memory pressure), instead of either OOM’ing a while ago or completing in a couple of microseconds like it usually does.
And ‘infrequently called’ here could be every couple minutes.
If the system is under such strong memory pressure despite swap then you already screwed up. And trying to run that much without swap would be even worse.
It almost always happens at some point. Someone fat fingers a memory allocation. Memory leak from a long running process finally grows too big, someone installs an update and it doubles memory usage on something, etc.
I had it happen recently that the backup/sync software for a NAS had a constant factor memory consumption based on the number of files it was syncing. Transferred in a bunch of data, and blam. Commercial NAS, and they used swap (shitty, never buy QNAP). Wedged so hard it took a hard power cycle to even get console, AND caused data corruption in the ZFS pools.
It’s what happens next that decides the stability of the system.
If it can grow into swap and keep going, things start crawling, load builds up, buffers expand, and the system eventually grinds to a halt. On Linux, often in a really wedged and irritating/impossible to fix way.
Or, if no swap, OOM killer shoots something in the head (hopefully the offender), and we’re back to normal (minus the thing it killed, which is usually the thing you didn’t want it to - but at least the system works and you know something is wrong).
In either case, caches dropped to near zero a while ago, so performance is already getting bad. The question is whether we enter a death spiral, or something gets killed early enough that the whole system doesn’t spiral.
> In either case, caches have dropped to near zero awhile ago so performance is already getting bad.
Right. You can get pretty deep into a death spiral even without swap. I feel like the better-performing solution is to make OOM trigger earlier, and to then go ahead and have swap be on.
Somewhat tangentially, I'd really like a setting for minimum disk cache, and that would do so much to help prevent thrashing.
Only in a very, very tiny window. I’ve never seen a system sustain it for more than a trivial amount of time before it OOM kills something and then voila, enough memory.
I’m sure someone here will chime in with their example though.
I mean, it happened to me twice in the last couple months on a laptop where I did a lazy install and didn't set up swap. Firefox and 2-3 smaller programs ate up so much memory that it stopped responding for multiple minutes and then went completely dead.
I rarely need or set up much swap on a server, but I've managed to have the same kind of problem with the OOM killer not kicking in very quickly on a server. And on my desktops swap is able to soak up many gigabytes and improve my performance a lot with thrashing basically never happening.
Swap is a performance optimization, so why would you remove it if you care about performance? Contrary to popular belief, swap isn't a way to get extra memory for free: it won't help you if your workload size is bigger than your physical memory capacity and that is not what it is meant to be used for.
I don't find the term "performance" useful outside of a specific context. The main reason we don't use swap is predictability.
We know how much memory a particular VM should be using. If it exceeds that, failing quickly rather than changing behavior (slowing down) is far preferable, and then you correct whatever the problem is.
All the testing we've done with swap has shown it to have a negative impact in server environments. I see it as useful for client machines, and maybe as a bandaid to get by with underpowered systems if you have to for some reason. But that shouldn't happen in prod, especially if you're a public cloud user, as most folks here seem to be.
I think that is where the confusion comes in. Almost everyone agrees that if your workload is slowing down due to paging that is bad.
If your working set is bigger than your RAM you have a problem, swap or not.
However swap helps optimize your RAM usage so that your working set can fit with less waste. It allows the kernel more flexibility with what pages to evict from RAM. Without swap it can only evict pages backed by files. If you start evicting files that are in your working set you are just as screwed as if it starts evicting anonymous pages in your working set. With swap it can evict other unused pages before touching the pages in your working set.
I think there is some truth that it can be harder to notice with swap because there is more buffer between running great and literally crashing. However, in either situation you should be monitoring IO wait and application performance to ensure that you have enough memory for your working set.