
I have no idea where this limit came from. I worked at WhatsApp[1], and while we did split nodes into separate clusters, I think our big cluster had around 2000 nodes when I was working on it.

Everything was pretty ok, except for pg2, which needed a few tweaks (the new pg module in Erlang 23 I believe comes from work at WhatsApp).

The big issue with pg2 on large clusters is locking of the groups when lots of processes try to join simultaneously. global:set_lock is very slow under heavy contention: each locker sends lock requests to every node, and if some nodes see A's request before B's while others see B's before A's, neither holds the lock everywhere, so both release and retry later; you only make progress when someone acquires the full lock. Applying the boss-node algorithm from global:set_lock_known makes progress much faster (assuming the dist mesh is, or becomes, stable). The new pg, I believe, doesn't take these locks anymore.
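Roughly the shape of that join path, as a sketch (made-up module name, not the real OTP code; the real pg2 lives in the kernel application): every simultaneous join of a group funnels through one cluster-wide lock.

    -module(pg2_join_sketch).
    -export([join/2]).

    %% Sketch only: every joiner serializes on a single cluster-wide lock,
    %% so simultaneous joins of the same group contend on the same lock id.
    %% global:set_lock (which global:trans wraps) only succeeds once one
    %% requester holds the lock on every node in the list; otherwise it
    %% backs off and retries.
    join(Group, Pid) when is_pid(Pid) ->
        LockId = {{pg2, Group}, self()},   %% {ResourceId, LockRequesterId}
        Nodes  = [node() | nodes()],
        global:trans(LockId,
                     fun() ->
                             %% the real pg2 tells every node's pg2 server
                             %% about the new member while holding the lock
                             broadcast_join(Nodes, Group, Pid)
                     end,
                     Nodes),
        ok.

    %% placeholder for the membership broadcast; irrelevant to the locking
    broadcast_join(_Nodes, _Group, _Pid) ->
        ok.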

The other problem with pg2 is a broadcast on node/process death that exists for backwards compatibility with something like Erlang R13 [2]. These messages are ignored when received, but in a large cluster that experiences a big network event, the number of sends can be enormous, which causes its own problems.
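A sketch of that pattern (invented names; the real code is at the link in [2]): one send per dead member per node, all of them thrown away on arrival, so a big network event means roughly dead members x nodes worth of useless sends.

    -module(pg2_compat_sketch).
    -export([compat_broadcast/1, handle_compat/1]).

    %% Sketch only: one compatibility message per dead member per node.
    %% [noconnect] just avoids setting up new dist connections for the send.
    compat_broadcast(DeadMembers) ->
        [erlang:send({pg2_sketch, Node}, {member_died, Pid}, [noconnect])
         || Node <- nodes(), Pid <- DeadMembers],
        ok.

    %% receiving side: kept only for compatibility with old releases,
    %% so the message is simply discarded
    handle_compat({member_died, _Pid}) ->
        ignored.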

Other than those issues, a large number of nodes was never a problem. I would still recommend building with fewer, larger nodes rather than a large number of smaller nodes; BEAM scales pretty well with lots of cores and lots of RAM, so it's nicer to run 10 twenty-core nodes than 100 dual-core nodes.
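As a rough illustration of why bigger boxes just get used (nothing WhatsApp-specific here): BEAM starts one scheduler thread per logical core by default, which you can check in any shell. The values below assume a hypothetical 20-core box.

    %% illustrative erl shell session; output depends on the machine
    1> erlang:system_info(schedulers_online).
    20
    2> erlang:system_info(logical_processors_available).
    20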

[1] I no longer work for WhatsApp or Facebook. My opinions are my own, and don't represent either company. Etc.

[2] https://github.com/erlang/otp/blob/5f1ef352f971b2efad3ceb403...



>I think our big cluster had around 2000 nodes when I was working on it.

Is this fairly recent? I thought WhatsApp was on FreeBSD with powerful nodes instead of lots of little nodes?

>BEAM scales pretty well with lots of cores and lots of ram, so it's nicer to run 10 twenty core nodes instead of 100 dual core nodes.

Something I was thinking about when reading about POWER10 [1]: what systems and languages could make use of a maximum of 15 cores x 16 sockets x SMT-8 in a single machine. That is 1920 threads!

[1] https://www.anandtech.com/show/15985/hot-chips-2020-live-blo...


Lots of powerful nodes. That cluster was all dual Xeon 2690v4. My in-depth knowledge of the clusters ends when they moved from FreeBSD at SoftLayer to Linux at Facebook. I didn't care for the environment and it made a nice boundary for me --- once I ran out of FreeBSD systems, I was free to go, and I didn't have to train people to do my job.

We did some trials of quad-socket x86, but didn't see good results. I didn't run the tests, but my guess from later reading is that we were probably running into NUMA issues and didn't know how to measure or address them. I have also seen that two dual-socket machines are often much less expensive than a quad-socket with the same total number of cores and equivalent speeds; with Epyc's core counts, single socket looks pretty good too. Keeping node count down is good, but it's a balance between operating costs, capital costs, and lead time for replacements.
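For what it's worth, the VM does expose enough to start looking at this: it reports the CPU topology it detected and whether scheduler threads are bound to cores. Binding is off by default and the right policy is workload-dependent, so treat this as a starting point, not a recipe.

    %% in an erl shell on the node in question
    erlang:system_info(cpu_topology).        %% what the VM thinks the cores/NUMA nodes look like
    erlang:system_info(scheduler_bind_type). %% requested binding policy (often 'unbound')
    erlang:system_info(scheduler_bindings).  %% per-scheduler logical cpu, or 'unbound' entries
    %% binding can be requested at start-up, e.g.: erl +sbt db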

The BEAM ecosystem is fairly small too, so you might be the only one running a 16-socket POWER10 beast, and you'll need to debug it. It might be a lot simpler to run 16 single-socket nodes. Distribution scales well for most problems too.



