This is one of the biggest problems I run into in 'software engineering': people building way too complex systems because they are 'interesting' rather than buying $50 of RAM and being done with it, and then poo-pooing systems that cost $50 and work.
Querying a DNC list is not a problem where you will ever be unable to buy more RAM. It's trivially parallel; if for some reason DNC lists ever outpace Moore's law, just buy another system.
To be fair to the authors, at least they didn't do something ridiculous like build a 100-node Cassandra cluster.
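For what it's worth, here is a rough sketch of the "just put it in RAM" approach. It assumes numbers are stored as plain integers in a sorted array of 64-bit values and looked up by binary search; the function names and the modulo sharding are my own illustration, not anything from the article. At 8 bytes each, the 400M numbers mentioned in this thread come to about 3.2 GB.

```python
from array import array
from bisect import bisect_left

def build_dnc_index(numbers):
    """Pack phone numbers (as ints) into a sorted array of unsigned 64-bit values."""
    return array("Q", sorted(set(numbers)))

def is_listed(index, number):
    """Binary-search the sorted array; True if the number is on the list."""
    i = bisect_left(index, number)
    return i < len(index) and index[i] == number

def shard_for(number, shard_count):
    """Hypothetical sharding rule: if one box is not enough, split by modulo."""
    return number % shard_count

# Back-of-envelope memory estimate, assuming 8 bytes per number.
BYTES_PER_NUMBER = 8
ESTIMATED_BYTES = 400_000_000 * BYTES_PER_NUMBER  # 3.2 GB
```

Since every lookup is independent, adding a second machine is just a matter of routing each number with something like `shard_for`.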
This is one of the biggest problems I run into in 'software engineering': people building way too complex systems because they are 'interesting' rather than buying $50 of RAM and being done with it, and then poo-pooing systems that cost $50 and work.
Funny; one of the most common complaints about the software industry (common on HN) is that people use inefficient languages or algorithms and then waste too much hardware.
And realistically, there's nothing you can do with phone numbers that even needs the full speed of RAM. A fast SSD can do enough random reads to load the entire database in under 20 minutes, and nobody actually needs to know the status of all 400M numbers in the same 20 minutes because they can't all be dialed that quickly.
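To put a number on that, here is the arithmetic as a quick sketch. It assumes one random read per phone number, which is my simplification; a real layout might need more or fewer reads per lookup.

```python
TOTAL_NUMBERS = 400_000_000   # size of the list quoted above
WINDOW_SECONDS = 20 * 60      # the 20-minute window quoted above

def required_reads_per_second(total, seconds):
    """Random reads per second needed to touch every number once in the window."""
    return total / seconds

rate = required_reads_per_second(TOTAL_NUMBERS, WINDOW_SECONDS)
print(f"{rate:,.0f} random reads/sec")
```

So "under 20 minutes" works out to a bit over 333K random reads per second, under that one-read-per-number assumption.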
A single fatcache can do close to 100K set/sec for 100-byte item sizes.
A single fatcache can do close to 4.5K get/sec for 100-byte item sizes.
All 8 fatcache instances in aggregate do 32K get/sec against a single 600 GB SSD.
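As a sanity check on those figures, a small sketch of the arithmetic. The "one get per phone number" assumption for a full pass over 400M numbers is mine, not something the fatcache numbers state.

```python
SINGLE_INSTANCE_GETS = 4_500   # get/sec for one fatcache instance (quoted above)
INSTANCES = 8
AGGREGATE_GETS = 32_000        # get/sec for all 8 instances on one SSD (quoted above)
TOTAL_NUMBERS = 400_000_000    # list size mentioned in this thread

# What perfectly linear scaling across instances would give.
linear_ceiling = SINGLE_INSTANCE_GETS * INSTANCES

# Fraction of that ceiling actually reached in aggregate.
scaling_efficiency = AGGREGATE_GETS / linear_ceiling

# Hours to look up every number once, assuming one get per number.
full_pass_hours = TOTAL_NUMBERS / AGGREGATE_GETS / 3600

print(linear_ceiling, round(scaling_efficiency, 2), round(full_pass_hours, 2))
```

In other words the aggregate is close to linear in the instance count, and a full pass at that rate would take a few hours.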
Yeah, looking at the graph in the article is really weird. Wait, you can query 10k phone numbers per second with a single crappy Linode machine? Why does it need to be faster than that?