Hacker News
Switching to AWS Graviton slashed our infrastructure bill (squeaky.ai)
127 points by Chris86 on Nov 30, 2022 | hide | past | favorite | 138 comments


Cloud cost optimisation is underrated. At the companies I've worked for, nobody has really given a shit (at least not under normal economic circumstances). In the industry there's a strong avoidance of ARM compute instances for no good reason. If I were building from scratch today I would definitely go with Graviton.


At $dayjob I found an unused box in the cloud running an expensive database engine. It had been idle for months, created for a consultant on a project that had since wound up. On top of that, the consultant had left his consultancy.

I was told in no uncertain terms not to even think of touching this VM because “the budget has been approved”.

I was shocked at the flagrant waste of money and assumed it was a one-off aberration.

Nope, for months afterwards I kept hearing the same refrain from manager after manager, from product owners and dev team leads.

“Don’t touch! We fought hard for this budget! You’ll take it from our cold dead hands!”

Eventually I soured on the whole idea of cloud cost optimisation as a service for unmotivated third parties and gave up on the notion entirely.


FWIW, in these situations you're better off proposing:

"I'm going to reuse this VM, to help our ... fleet scale better."

That way your management continues to use their allocated budget, and your real prod systems work slightly better (also will eventually require less additional $ to scale up - helping the company i.e. shareholders).

The thing to remember:

You would assume that all middle management really manages is a top line and a bottom line. Numbers related to their KPIs/OKRs are roughly a top line, and numbers related to their resources (humans and cloud infra budget) are roughly their bottom line.

The reality: Middle management's resources (humans and cloud infra budget) are not their bottom line. Middle management gets rewarded (promoted) when they have "enough scope", and scope has roughly always been defined by headcount (it now also includes things like cloud cost budget). As such, middle management has to say "we need to do more with less", but they are promoted based on these numbers going up!

Is this reward structure in the best interest of companies (i.e. customers and/or shareholders)? No, neither. Is there a better system? Not yet. Is the reward structure created by middle management for middle management? Likely.

So in the meanwhile, if you don't want to become unmotivated, might as well work within the current reward structures.


> help our ... fleet scale better."

Heh, I tried that too! I found a lot of setups using old HDD disks that were at 100% of their IOPS limits while the CPUs sat idle. When they were built, Azure didn't have Premium SSD, so that's forgivable. I offered to rearrange where the expenditure goes, so that they'd have newer and faster CPUs and Premium SSD but fewer cores, which means a reduction in licensing costs. Cost-neutral while improving their performance and capacity.

That got a very firm "Nope! Nope! Nope!" from most (but not all) teams as well.

> As such middle management has to say "we need to do more with less", but they are promoted based on these numbers going up!

I came to the same conclusion. Many project managers or product owners introduced themselves in the first meeting by proudly proclaiming the huge size of their operation. I.e.: "The system I'm responsible for is a multi-million dollar project with a small army of developers!"

I've also noticed a tendency to exponentially blow out the complexity of what ought to be a trivial system for the same reason. Something that could be static HTML in an S3 bucket or Azure Storage Account turns into a microservices monstrosity draped across three clouds and four external SaaS services.

Those resumes don't pad themselves.


It’s as though they interpreted David Graeber’s Bullshit Jobs as a managerial playbook


Clever managers will repurpose that box.

One of my managers was a champion at that, taking scraps from everywhere for undercover projects. One of the few who managed to get things done in a highly bureaucratic company. Also helped by the fact he is good at picking competent people.


Imagine seeing your startup grow into a company where bureaucracy rewards department heads who waste money now to protect their budget so they can keep wasting it next year...


I think the main reason is "I want to run the same binaries locally that I run in the cloud," and it's a pretty valid one. However, it's also an expensive one sometimes.


Anecdotally, this is starting to shift with M1 MacBooks; Graviton is looking more attractive for precisely that architecture-parity reason on teams where M1 devices are the majority.


Yeah if only. Our ops people are too uneducated to be able to deploy anything Apple. Literally there are armies of factory pressed Windows monkeys but nothing in the Apple space.

Note to apple: please start concentrating on the enterprise sector. We're dying over here. My Dell weighs 3x my personal M1 MBP, has a shitty keyboard with keys designed for Borrowers, the battery lasts 8 minutes and it reduces my sperm count if I put it on my lap. It feels like I have a ball and chain around my ankle 24/7. My only escape is WSL2 which is broken as fuck as well (can't run services, cron jobs, X problems etc) and we can't install a simple non-WSL VM on the node because Device Guard requires hyper-v to be enabled excluding sensible and pure VM options like VirtualBox. Docker for windows is a comedy of errors too.


For what it's worth, I work in a large 30,000 employee company. Everyone gets a Windows machine by default.

6 years ago our department of 200 people "went rogue" and started provisioning Macs, because it was the only way we could hire developers and provide a good developer experience for the work we were doing.

This took some convincing, but it was possible. We agreed to be unsupported by the in-house help desk, but we had 2 people in IT that supported us for provisioning and fixing machines, and sorting out a small amount of required enterprise software like a Cisco VPN client and some fleet management background agents.

Otherwise, we self-organized support over Slack and in-office and also made use of Apple's business support directly.

As of earlier this year, our department is now over 600 people, and we've given our internal IT enough incentive to officially support Macs, which they now do, alongside Windows.

They use some kind of MDM software to manage and update and monitor our Macs the same as they do Windows.

There are also now additional much larger teams in the organization exploring Mac adoption where it makes sense for their developers too, and we could soon have thousands of Macs in use.

So it's definitely possible, even if you have to start small.


Unfortunately we're in a regulated industry so getting a PO signed off for one Mac is nigh on impossible without involving the entire corporate machinery.


We are also a highly regulated industry. Not finance but pretty close.

I sympathize with you though, it sounds like there's no will to do it, which sucks :(


Are your developers forced to use laptops?


Our company issues laptops yes, because everyone can choose to either work at home or in the office.

That said, we also supply external monitors, keyboards and mice for anyone who wants one, and every worker has a discretionary budget to spend on home office accommodations (chair, standing desk, etc).

I also suspect if a developer specifically asked for a desktop computer and could (lightly) justify the need for it, we would get them a desktop, and a laptop.


Unfortunately yes. I would rather have a desktop, but they don't know how to pay half as much for the same specification. The desks in the office are all equipped with docks and expensive WiFi mesh driven by COVID mentality, so that is the status quo.

Just send me a fucking workstation. Nope too hard.


Who wants to be in an office in 2022 unless necessary (specialized hardware, etc.)


Exactly that. I don't and I won't do it again.


Do you propose they carry desktops back and forth from their office and to conference rooms?


Every conference room could have a device to facilitate presentation.


What device is going to have my PowerPoint slides, my IDE, my logged in AWS account, etc and everything else I need for demos?

Where I work, every conference room in every corporate office has a TV and a dongle that you can plug in via either HDMI or USB C.


Any thin client would do.


I’m going to have a “thin client” with my exact setup and my IDE, PowerPoint slides etc? Are you proposing that I work from a thin client? What happens when I’m working from home?


It's only for meeting rooms.


So in a meeting, is this person who is presenting going to copy everything they need from their desktop? If it’s a group meeting will no one else have their computers with them?


1. I don't think you understand what a thin client is.

2. I have found that face-to-face meetings where other people bring their laptops usually means that they get distracted and don't participate.


You’re right, I am new to this whole cloud stuff, it’s not like I actually work for the largest cloud provider in the world in the consulting department where it’s my job to know this (hint: I do).

It’s also not like I wrote my first line of code in 1986 (Hint guess what the 74 in my name signifies).

Now that hopefully we can dispense with the idea that I don’t know how this computer stuff works. Please answer the question. How do I get everything I need from my computer to this thin client? How do I keep my environment in sync with this thin client?


My friend, you are being trolled.. Your responders keep moving the goal posts on you.

You should just walk away.


> Note to apple: please start concentrating on the enterprise sector. We're dying over here.

Enterprise sales are where the customer is not the user. Apple does best when the user is the purchaser.

Also, I know for a fact that Macs are well supported at scale by many large tech companies including my own.


FYI, with WSL2 1.0, you can finally enable systemd, so you can run services and cron jobs.
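For anyone hunting for the mechanics, it's a two-line config change (a sketch, assuming the Store version of WSL, 0.67.6 or later):

```shell
# Inside the WSL distro, add a [boot] section to /etc/wsl.conf:
sudo tee -a /etc/wsl.conf > /dev/null <<'EOF'
[boot]
systemd=true
EOF

# Then, from Windows, restart WSL so the setting takes effect:
#   wsl --shutdown
# On the next launch, verify that systemd is PID 1:
#   systemctl list-units --type=service
```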


Thanks for the tip off. I will look into this tomorrow. This is why I'm here. The distributed consciousness of HN is a wonderful problem solving engine :)


Never used Docker for Windows, but Docker for Mac has not been great lately either.

Granted, macOS changes a lot between each release but our company is paying for Docker Desktop licenses and the experience has really been disappointing.


Can you use a plain Hyper-V VM? i.e. with Hyper-V Manager


Tried that, but unfortunately there are some painful addressing and routing issues you are subjected to when dealing with a corporate always-on VPN. Ergo you can't actually reach the clusters you have to admin via kubectl.


What's stopping you from running kubectl.exe or Cygwin? Frankly, I still think Cygwin's better than WSL in many ways.


That’s the transition we went through. Our dependencies are / were pretty weird so the transition took a bit of effort - I suspect more complicated than many people would have to go through.

We all use Macs at work so we knew it was a matter of time before we were on ARM. I’m glad we made the transition. M1 airs are a delight to work with and Graviton machines are great bang for buck.


Valid why? Do you not trust compilers? Is it infeasible to (at least occasionally) run automated tests on cloud instances? Personally I've been pretty used to quite significant differences between local and production environments - it's rarely an issue, and I don't remember CPU architecture ever being one. Things like timezones or firewall restrictions/network differences (including talking to 3rd-party APIs with IP whitelisting) are far more likely to cause problems.


Completely agree... the only exception I've run into is that for small operations build tooling often doesn't work well with arm64.

E.g. GitHub Actions can build a container in a few minutes on x64, or 35 minutes on arm64; likewise aws-cdk literally could not run an arm64 Fargate ECS deployment for months after support was added (they simply did not support the required attribute in the container definition).

I would love to see this change as I've had nothing but great experiences with graviton for virtually anything arm supported.


Are you building on arm64 natively or via qemu? A few mins vs 35 for roughly the same spec of CPU seems a bit off, even with optimisation considerations.

I've found arm64 builds on amd64 take longer when using one build context/arch (but doing multiple platforms), but that's as it's being emulated.

It's the opposite on my M1: the buildx amd64 build takes longer.


Oh, it was with buildx (which uses qemu), as GitHub Actions runners are x64. I was showing a specific example of the arm64 build-tooling challenges small startups encounter (GitHub Actions' lack of arm job runners in this case). My arm64 builds on arm64 hardware scream.


You can do a self-hosted runner on arm though, even on a Raspberry Pi (if inclined), so I'm not sure it's prohibitively expensive for a startup - you could even afford a spot Graviton :D

I've got to say though, I've not experienced that delta in build times, even emulated on my machine.


> GitHub actions can build a container in a few minutes in x64 or 35 minutes in arm64

What type of container, and on what runner? That has not been my experience at all, a cross-compiling buildx build with Python and a bunch of libraries takes only slightly longer for arm64 than it did for x86.


My favorite way to watch this slow down is to introduce some node workloads into the build workflow.


First, Graviton is not magic. We switched our main service, a Node.js monolith, and did not get any cost improvement (we had to add more instances to handle the same workload, which ended up being equivalent cost-wise). There are certainly use cases where it's better, but it doesn't seem to be the only and obvious choice for all of them.

Second our laptops and our CI are amd64 machines, and being able to run the same docker images in prod and locally is nice, and not having to build the image with qemu on the CI is also good.

I don't mind cloud-ARM, but there definitely are good reasons not to use it (which of course don't apply to everyone)


If I started to build today I'd build and host my own servers, or go with servers from ionos. Cloud is very expensive.


I just worked on a massive "optimised" cloud migration like you've never seen. We moved from multiple DCs to AWS, and the costs are approximately 8x the pre-migration costs. We were realistically expecting 2x, which would buy us some regional agility and was accepted, but the unconstrained growth and misunderstanding of the cost model have been terrible.

The pricing is so convoluted that you can't possibly estimate costs until you get the first bill, at which point you are committed to a multi-month or multi-year project. On top of that, the assumption at development time is that the cost is someone else's, so the sprawl since the migration is dangerous - which means we can never leave now that we've embraced the PaaS options.

The whole proposition relies on the idea of a sunk cost being accepted.

So yes, back to servers please. IaaS should be the maximal offering that is accepted by a business from a risk perspective unless the tool or technology is disposable in a 6 month window. There is space there for gains. PaaS hell no.

Edit: worth mentioning that AWS support is somewhere near dire. We've had issues with multiple services and despite being a VERY high roller with enterprise support we can't get anything fixed in any reasonable time. It's just someone else's crap you're using and they aren't any better at it than you are, just adding lead time to any issues. In some cases I've had to actually call out complete bad implementations that break function guarantees provided by open source projects (I can't logically warn people away from services as it's pretty obvious who I am if I do). One rule I've developed is that if it's not a core project: S3, EC2, EBS, ALB etc then it's probably a commercial liability in some way. There are no people working or with any knowledge on some major bits of AWS infra.


>We moved from multiple DCs to AWS and the costs are approximately 8x what the pre-migration costs are.

8x?? That's crazy - what were you doing wrong then?


Everything, all at once.

SMEs can't reliably manage that transition with any skillset and still deliver a product at the same time.


I’m very curious to hear more details!

Did you use reservations to reduce costs?

Was it a lift and shift with VM configs staying as-is? (I’ve seen a lot of empty 1TB “app” drives burning money in the cloud!)

You complain about PaaS services, but I can’t imagine 8 data centres worth of stuff being converted to PaaS in hurry!


Cost savings are mostly consolidation, scaling down stuff we don't need (we have peak hours) and migrating stuff to kubernetes and packing it tight.


Despite that you were paying 8x the previous amount!?

How is that possible?

E.g.: With Kubernetes and AWS you ought to be able to use clusters with a base of "reserved" capacity plus spot pricing for peak hours on top, right?

From what I've seen (in my limited experience), that should reduce costs for most orgs, not increase them!


I have identified the limits of human incompetence.

Please someone hire me so I don't have to live through this nightmare any longer.


> How is that possible?

TLDR: AWS is really expensive compared to the equivalent compute elsewhere, to the point of overwhelming its advantages.

---

Long answer that I wrote before realizing how long it was:

I'm not the person you're replying to; I don't know the exact details or their company. However, I can tell you that AWS is a factor more expensive than renting or owning your own metal. They (claim to) offset this by 1. taking care of management for you so you need fewer ops people, and 2. letting you scale up and down as needed.

The first is plausibly legitimate, but really depends on the size of your compute and of your human teams (and how much benefit you get from AWS-managed offerings). The second can help, but only if you've got really spiky workloads that run a tiny fraction of the time, with an absolutely tiny base load. Scaling up and down reduces your AWS bill compared to not scaling on AWS, of course, but it doesn't help that much when their elastic offerings are that much more expensive.

Say, for the sake of argument, that you can run most of your EC2 instances just 6 hours a day, during peak hours. That lets you win... if, and only if, EC2 instances are less than 4x the cost of just owning your own bare-metal machines. I've gone so far as trying to price out using spot instances for CI work - just selectively, when they were the cheapest! - to augment instances on Hetzner Cloud. Guess what? They're so expensive that even at spot prices EC2 is a factor more expensive.
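The 6-hours-a-day break-even works out like this (all numbers below are hypothetical placeholders, not real AWS or Hetzner quotes):

```python
# Break-even sketch for "scale to zero" elasticity vs. owning metal.
# Prices are hypothetical placeholders, not real cloud quotes.
HOURS_PER_DAY_RUNNING = 6      # instances only run during peak hours
BARE_METAL_PER_HOUR = 1.00     # amortized cost of owned hardware, always on
CLOUD_MARKUP = 4.0             # cloud hourly price as a multiple of metal

metal_daily = BARE_METAL_PER_HOUR * 24
cloud_daily = BARE_METAL_PER_HOUR * CLOUD_MARKUP * HOURS_PER_DAY_RUNNING

# Running 6 of 24 hours only wins if the markup is under 24/6 = 4x;
# at exactly 4x the two daily costs come out equal.
print(metal_daily, cloud_daily)
```

Any markup above 4x means the always-on metal wins despite running idle 18 hours a day.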


Really?

I just priced up a 128-core (dual EPYC) Dell server with specs comparable to a matching Azure cloud server (HB120rs_v3), and the "1 year reserved" price for Azure came out to about the same as the purchase price amortized over 3 years. The 3-year reserved price is about the same as the purchase price over 4 years. There's a new 5-year reservation option, which is the equivalent of owning the hardware for 7 years. Spot pricing is the equivalent of amortizing the purchase price over 12 years!!

Meanwhile, using the cloud, you can upgrade to the 9004 series in just a few months, not 12 years from now. And then whatever comes after the 9004 series in like 2 years, not 5, 7, or 12.

So it looks like that at the larger scales, the pricing is very directly competitive with on-premises hardware.

Consider that the cloud hosts include most basic operations costs, such as cooling, electricity, data centre floor space rental, the SFP ports and cabling, etc, etc...

Speaking of which, a quick back-of-the-envelope calculation is that a 128 core server will cost about $20K in cooling and electricity over its lifetime.

Note that the Azure HB120rs_v3 sizes come with 200 Gbps InfiniBand "just thrown in" for laughs. Try pricing that up some time, in case you want to build your own hyperconverged infrastructure!

Admittedly, at the smaller sizes, Azure and AWS are less competitive, but you do get flexibility, automation, and a bunch of other stuff that's difficult and expensive at scale on-prem.


it feels like the "refactor" was the way to improve


I love reading people saying this. Try a couple of thousand DB tables evolved organically. Nope not happening.


I meant that what you did was "refactor" - rethinking the whole mess

So it wasn't purely cloud/renting racks issue


Rent servers: yes. Host your own: maybe. You can run a whole-ass company on 2 $70/mo servers from Hetzner (and some B2 for durable storage) while you figure out whether you have a market or not.

Like there's just no point in coloing when you're small because either all non-server bits will cost you for no reason or you're using something managed which is just cloud but more annoying.


Switching to Graviton isn't an automatic cost saving. Everything is optimized for x86; it may be cheaper, but significantly slower. We've been trying to migrate for the last year, both for the cost savings and because we switched to ARM-based laptops.


This. We have three people entirely dedicated to reducing costs.

As for avoiding ARM, we do only x86-64 because corporate security policy demands that we have Windows laptops so that some box ticking overlord can fill out a security policy compliance form. That means we're stuck limping along with docker and WSL2. Every single engineer in the org has an arm64 machine at home already and wants a proper computer at work, which can ironically work in the same policy framework if anyone gave enough of a shit to deal with it.

So that's why we don't use Graviton; corporate security policies. Our customers will just have to eat the price hikes.


Production builds shouldn't happen on developer laptops. I think you're approaching this wrong. You can build and test on x86_64 laptops all day if you want and still easily deploy to arm64 servers.


We don't do production builds on laptops.

But it's important that all builds are 100% reproducible on all build targets, and that includes non-Docker ones. That is much more difficult if you have to cross-compile. We can barely manage one architecture.


You can cross-compile to ARM using Docker on Windows.

See: https://www.docker.com/blog/multi-arch-images/
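The linked post boils down to buildx with QEMU emulation; a typical invocation looks something like this (the image name is a placeholder):

```shell
# One-time: create a builder instance that can target multiple platforms
docker buildx create --name multiarch --use

# Build for both architectures in one go and push the multi-arch
# manifest. "registry.example.com/myapp" is a placeholder image name.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag registry.example.com/myapp:latest \
  --push .
```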


Agree - I've gone to Graviton instances by default for RDS and ElastiCache (I run and own a DevOps consulting company). The big problem I continue to deal with is native arm64 Docker containers (if you're a cool kid running containers/Kubernetes). For example, the very popular Bitnami charts don't support arm builds even though the community has been screaming for support.


If I started to build today, I'd definitely go for Hetzner Cloud. There is zero possibility that I get surprised by a large bill.


Too bad Hetzner doesn't have cheaper Arm servers available. Their Ampere pricing is not really convincing.


Who cares? Their x86 servers are cheap enough to not care about ARM.


It does feel like it's currently in beta. I've tried it, and apparently I can't create more than a few instances because my account is "too new", with no clear way to remove that limit. So you're right - you can't have a large bill if you can't even create 10 instances.


Did you try writing them and asking them to increase the limit?

No cloud provider will let you create as many instances as you like; they all have limits from the get-go. Usually you have to write to them or fill out a form if you want to go above the standard limit - Hetzner Cloud as well.


I haven't, mainly because of this warning on their Limits page:

> Your account is too new to request a limit increase. Please note that we generally do not answer questions regarding limit increase on the telephone.


It's very easy - you just write to them and ask to increase the limit to whatever you need. You need to do so in writing, not on the phone.


Let's go even further - "Cloud cost is underrated"


> In the industry there's a strong avoidance of ARM compute instances for no good reason.

Not no reason; it adds work and risks incompatibility. Now, that work might be relatively small, and most software these days is compatible with aarch64, but compared to amd64 (which is the de-facto standard, already supported by everything, the default without needing to set anything up) it's still something, and businesses are risk-averse.
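Much of that risk can be checked up front, since registries publish which architectures an image supports; for example (any public image works, "postgres" is just an illustration):

```shell
# List the architectures a published image actually ships for;
# multi-arch images show one entry per platform (amd64, arm64, ...).
docker manifest inspect postgres | grep architecture

# Or, with buildx available:
docker buildx imagetools inspect postgres
```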


We are building everything for arm, and I'd be surprised if other large companies aren't optimising for it.


I agree. Until it becomes an issue, where everyone runs screaming like chickens, literally nobody gives a shit.


Because ARM perf was far, far from being on par with Intel/AMD; also, you need to be able to compile for that arch.


The biggest downside I've found with Graviton is that it's gotten popular enough that availability of capacity is a problem in some regions/AZs - particularly if you're using larger EC2 instance types.

Also, Fargate Spot on Graviton is still not available, so if you want to run Spot in non-production environments, you're faced with running different architectures in prod vs. non-prod, which I don't like at all. Do the math on whether it's cheaper for your use case to go x86 spot/non-spot vs. Graviton non-spot.


I found graviton to be a mixed bag. It was certainly extremely fast when using the very high end instances and I tested it successfully using a Rust based message queue system I was writing and it got some ridiculously fast number like 8 million messages a second, from memory, using the fastest possible graviton instance (this was about 18 months ago).

I did try to switch some of my database servers to it a couple of years ago and after random hangs, I gave up and went back to intel. I tried again further down the track and same thing - random hangs. I assume this sort of thing comes with a new architecture but I'd be hesitant to move any production infrastructure to it without extensive long term testing.

In the case of graviton based GPU instances I found that the GPU enabled software I wanted to use didn't work.

If you are comparing performance, I'd suggest buying a fast AMD machine and running it locally to compare - local servers tend to be much faster and cheaper than cloud. And if your application uses GPUs, then if you possibly can, it's very much in your interests to run local servers.


Arm has a much looser memory model than x86 [1 for a comparison]. It's possible that the random hangs are due to a race condition in PG that doesn't show up in x86 because memory visibility doesn't require as much synchronization.

1: https://www.nickwilcox.com/blog/arm_vs_x86_memory_model/


There are huge differences in the machine generations. We found that for our workload Graviton3 (c7g) is the best, followed by AMD (m6a), followed by Intel (m6i) with Graviton2 (m6g) somewhat lagging. We can't use Graviton3 however because of memory limitations, so we're using AMD. The difference to the old machine types (m5) is staggering, the m6a is basically twice the performance of m5, while being cheaper.

However, I've seen a lot of benchmarks telling a different story, so it is important to actually measure your workloads.


I'd argue just find a different cloud provider.

GCP, Azure, Supabase, Cloudflare etc if you want managed services.

If you want a mix of managed services and raw compute, look more at Fly.io, Linode, Digital Ocean perhaps?

I have found the case for AWS being the "cheapest", or even "reasonable", in the cost department to get slimmer every year.


Steer clear of Digital Ocean.

They've had senior staff on HN justifying security lapses that commenters were describing as a "clownshoes operation".


Cloudflare doesn’t let you host Docker containers or offer managed Postgres do they?


It's all about how they may fit in your stack. Fly.io most definitely does. As far as I'm aware, Cloudflare is looking at supporting Docker.

I just listed managed services (not all of them may fit, I imagine).


I’ve been enjoying them here and there but I’ve also found that for some of my workloads a high clock Intel node is required. Even the Epyc nodes couldn’t keep up. I don’t completely know why, never dug too far into it.


I'm curious about that Rust-based message queue system


What do you want to know? It was a prototype. I was trying to learn Rust (didn't succeed), but I did manage to hack together a message queue that used HTTP for client interaction.

I'd previously written a SQL database message queue in Python which worked with Postgres/MySQL and SQL server. This worked well but it was not fast enough for my liking. My goal was to build the fastest and simplest message queue server that exists, with zero configuration (I hate configuration).

I used Rust with Actix and I tried two strategies - one strategy was to use the plain old file system as a storage back end, with each message in a single file. This was so fast that it easily maxed out the capability of the disk way before the CPU capabilities were maxed out. The advantage of using plain old file system as a storage back end is it requires no configuration at all. So I moved on to a RAM only strategy in which the message queue was entirely ephemeral, leaving the responsibility for message persistence/storage/reliability to the client. This was the configuration that got about 8 million messages a second.

As far as I could tell my prototype left almost all message queue servers in the dust. This is because message queue servers seem to almost all integrate "reliable" message storage - that makes the entire solution much, much more complex and slow. My thinking was to separate the concerns of storage/reliability/delivery and focus my message queue only on message delivery, and push status information back to the client, which could then decide what to do about storage and retries.

I gave up because I didn't see the point in the end because it wasn't going to make me any money, and I was finding Rust frustratingly hard to learn and I had other things to do.
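Not the author's code, but the RAM-only strategy described above amounts to roughly this (a toy sketch; the real prototype spoke HTTP via Actix and was written in Rust):

```python
from collections import defaultdict, deque

class EphemeralQueue:
    """Toy in-memory message queue: no persistence, no configuration.
    Durability and retries are the client's problem, as described above."""

    def __init__(self):
        self.topics = defaultdict(deque)  # topic name -> FIFO of messages

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic):
        """Return the oldest message for the topic, or None if empty."""
        q = self.topics[topic]
        return q.popleft() if q else None

q = EphemeralQueue()
q.publish("jobs", "resize-image-17")
q.publish("jobs", "send-email-42")
print(q.consume("jobs"))   # resize-image-17
print(q.consume("jobs"))   # send-email-42
print(q.consume("jobs"))   # None - nothing is ever written to disk
```

With no storage layer in the hot path, throughput is bounded only by memory and the network front end, which is why this kind of design can post numbers most durable queues can't.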


It seems very diplomatic of you to say you found Rust hard to learn, rather than that it was hard to make Rust do what you wanted. You seem very clear on what you wanted to do.


I managed to build it without really grasping Rust by hacking around and looking at examples of how other stuff worked that I wanted to do, and by avoiding doing things in Rust that I didn't understand - stuff as basic as function calls.

The resulting code worked but was garbage and at the end of the day Rust had not clicked for me and being fluent in it still felt a distant goal.

I love the idea of Rust but I don't like the implementation.

I'm hoping in time there will be a new language created that has the memory and thread safety of Rust but is 50 times more simple.


If you don't mind, what were the things that were hard to deal with? You mention "function calls", but I'd like to understand what that actually means in practice. This kind of negative feedback is useful for improving Rust for other newcomers, even if we have already soured you for good.


I could not even succeed in doing the simplest thing like putting code into a function and executing the function and getting a result back. I can't recall why.


Interestingly, I started a side project to rebuild OneTimeSecret in Rust yesterday and got the basics working already https://github.com/likeabbas/RustOneTimeSecret

The only other exposure I had to the language was attempting some Leetcode problems. I really like Rust so far. The type system is extremely powerful, the compiler gives robust error messages, and it has so many awesome features like pattern matching and destructuring.


You are smarter than me.


Doubtful. I may have wanted to like Rust more than you when giving it a shot.

I've written a lot of Java and Go but I never enjoyed writing in those languages. I found functional programming languages in college to be very fun, and Rust is giving me that same feeling.


> If you don't mind, what were the things that were hard to deal with?

The language syntax itself. It looks like the language developers wanted Rust to stand out and purposely went out of their way to make it extremely difficult to learn. This is one area where Go shines: you can learn the syntax in half a day.


Rust syntax shoulders a load of semantics most languages don't bother trying to represent. They could have made it more wordy, instead, but chose the much shorter symbols, many of which can often be ignored when skimming, more easily than keywords would have been.

They have not loaded up the language with any unnecessary syntax. Go lacks the syntax because it is unable to express the semantics, and is (deliberately) a much less capable language for that. Rust is meant for professionals, so must be able to express whatever they may need to say, where Go was meant to keep junior coders out of trouble.
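As a concrete (if cherry-picked) example of syntax carrying semantics: the `'a` annotations below tell the compiler, and the reader, that the returned reference cannot outlive either input, which is something Go's syntax has no way to state.

```rust
// `'a` is a lifetime parameter: the returned &str borrows from one of
// the inputs, so it is only valid while both inputs are still alive.
// The compiler rejects any caller that would violate this.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}
```

That one extra symbol is exactly the kind of load the parent comment means: terse, skimmable, and carrying a guarantee most languages leave to documentation.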


> Rust is meant for professionals, ..., where Go was meant to keep junior coders out of trouble.

So we the not so elite coders will stick with Go, thanks.


>> They have not loaded up the language with any unnecessary syntax.

Rust is composed of at least six sub languages.


What syntax would you remove?


fluvio.io ?


> local servers tend to be much faster and cheaper than cloud.

Of course, running a server in your house is not going to achieve five or even three 9's of reliability, and even colocating a single rack in a single location might be more expensive than putting that infra in AWS (depending on how data-heavy your use case is, given AWS' exorbitant data transfer costs).


You can hit three nines even if you're down for 1.5 minutes every day, or ten minutes a week. It's really not as hard to hit as it sounds. For a compute-heavy process that isn't end-user facing (e.g. batch processing) it's perfectly viable.

https://uptime.is/

Also, most cloud providers don't guarantee five nines anyway. GCE's SLA is 99.5% for a single instance, 99.99% for a region.

https://cloud.google.com/compute/sla
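The arithmetic behind those numbers is just the downtime fraction times the period (a quick sketch):

```rust
// Allowed downtime for a given uptime percentage over a period, in seconds.
fn downtime_secs(uptime_pct: f64, period_secs: f64) -> f64 {
    period_secs * (1.0 - uptime_pct / 100.0)
}

fn main() {
    // Three nines = 99.9% uptime
    println!("{:.1} s/day", downtime_secs(99.9, 86_400.0)); // ~86.4 s, about 1.4 min
    println!("{:.1} s/week", downtime_secs(99.9, 604_800.0)); // ~604.8 s, about 10 min
}
```

So a nightly reboot that takes under a minute and a half still fits inside a three-nines budget.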


Which database are you using?


pg


Was this a while ago or was it a recent experience? I'm asking because I'm planning on using a serverless instance of PG and was interested in trying the ARM64 version.


About 18 months ago.

Try it - it might work fine for you.


We've been running RDS PostgreSQL on r6g instances for the past few months with no issues.


I'm interested in hearing more about their switching to Graviton with Clickhouse.

We've been testing Clickhouse on Graviton and the performance isn't there due to a variety of reasons, most notably, it seems, because Clickhouse for arm64 is cross-compiled and JIT isn't enabled like it is for amd64[1].

1. https://fosstodon.org/@manish/109397948927679076


For AWS managed resources definitely use Graviton. But for spot instances in EC2 we've found better pricing and greater availability by staying on x86. (We run 100% of our web services and background workers on spot instances).


Same experience here. I cut over a whole bunch of instances to Graviton a while back and it "just worked" for a lot of our workloads. Test it first, obv.

Another easy cost-savings switch is telling Terraform to create EC2 root volumes using gp3 rather than gp2 (gp3 is ~20% cheaper per GB and theoretically more performant). The AWS API and Terraform both still default to gp2 (last I checked), and I wonder how many places are paying a premium for it.
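For reference, a minimal (hypothetical) Terraform snippet making the gp3 choice explicit, since the provider won't do it for you:

```hcl
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "m6g.large"             # a Graviton instance class

  # Without this block the AWS API still defaults the root volume to gp2.
  root_block_device {
    volume_type = "gp3"
  }
}
```

The same `volume_type` argument applies to `aws_ebs_volume` resources and launch templates, so it's worth grepping your whole config.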


I have two comments.

AWS Graviton is interesting because it is a pretty different machine from their AMD and Intel offerings. A "16 vCPU" machine from AWS is 8 cores/16 threads, not 16 cores - except for Graviton, which actually has 16 cores, although much weaker ones. So for problems where the cores can actually work in parallel, Graviton can keep up with AMD and Intel while being somewhat cheaper. In single-threaded workloads you get about half the performance.

The second thing I'm curious about is this very AWS-heavy approach: ECS, CodeDeploy, ElastiCache. If I were their architect, I would probably go EKS, GitHub/Lab, Redis on EKS, just for the peace of mind.


ECS is so much simpler to use and understand than Kubernetes, even on EKS.

But as for CodeDeploy... IMHO the only reason to use it is "I don't want to deal with another vendor" due to procurement/compliance hell in large companies.


We have a very similar story at my org. We run around 100 RDS Aurora clusters and switched to Graviton. I'm surprised to see 35% gains here; we saw more like 10-15%. But since Amazon natively supports MySQL on Aurora, we didn't have to worry about compatibility. Our main highlight was the way we wrote our infra as code: we made switching instance types or services a fairly simple task, so we have switched instance types a couple of times in the past and could easily make dev use t3s. Getting on the cloud is a trap, not the usual "we deploy on the servers and we're done" situation. Put real weight on writing good code to manage your infra, so you're able to adopt optimizations as they occur. Otherwise your expenses will ramp up soon enough.


I'm not sure I understand the point of this article: in theory they don't depend on x86-only code, so they've switched to ARM and it worked, as expected, and things are cheaper.

I'm happy that they've shrunk their bill, but I somehow expected some kind of 'unfortunately, things went wrong because of bizarre memory model issues causing difficult concurrency bugs'.

What am I missing?


You can get those bugs when you are doing your own atomics, and your code relies on x86's relaxed memory semantics. It looks like their code is JS and Go, which buries that stuff. Services they use were already proven out on ARM. (Or, maybe, are not on ARM?)
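A sketch of the kind of bug meant here (hypothetical Rust, not their code): a flag/data handoff that often "works" under x86's strong ordering but needs explicit Release/Acquire to be correct on ARM.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

fn handoff() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);
        // Release "publishes" the store above. With Relaxed here instead,
        // an ARM core could observe ready == true but data == 0, a bug
        // x86's stronger ordering usually hides.
        r.store(true, Ordering::Release);
    });

    // Acquire pairs with the Release store, so data is visible afterwards.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    producer.join().unwrap();
    data.load(Ordering::Relaxed)
}
```

In JS and Go application code you never write this by hand, which is exactly why those stacks tend to port cleanly.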

Relaxed memory bus semantics imposes a pretty substantial performance cost. Depending on how they are billed, this might account for a big chunk of their lower cost. But probably not.

Their real problem is that they are firmly entrenched in proprietary Amazon services, so switching to another cloud would be very difficult. Amazon can raise prices 35% anytime, and what can they do?


> in theory they don't depend on x86 only code, so they've switched to arm and it worked, as expected, and things are cheaper.

What's Intel's response to this as a company? I know that isn't mentioned in the article but... just curious

Does Intel have any ARM offering whatsoever?

Does AMD have any ARM offering?


> What's Intel's response to this as a company?

They've lowered prices on their CPUs, and they have come out with so-called big.LITTLE designs (small eco low-powered CPU cores for mundane stuff, and higher-power ones for heavy lifting, in the same package with automatic switching - phones with ARM have had this for years, but it's new for x86 desktops/laptops).

Intel don't make ARM based chips, while AMD (who generally do more diverse things like special designs for consoles) have indicated they might. I'm not sure what competitive advantage they'd have, but Qualcomm need more ARM competition so anyone is welcome. The big blocker is TSMC capacity though.


Not saying what you're switching from makes the title a bit click-baity.

The current title implies that AWS (Graviton or not) is somehow cheaper than other things, when AWS is quite often one of the priciest options out there.

"Switching from AWS to AWS Graviton slashed our infrastructure bill", on the other hand, is an article worth examining.


Imagine how much they'd save by not using AWS in the first place


Nothing, because they would have bought legacy servers and would be stuck with Intel for another decade


There is a world between using AWS and buying your own servers


AWS is not a sensible place for general compute, even if ARM were 10x as performant as the Intel equivalent and you were using reserved instances.


I have only ever used AWS, what is the go to cloud provider these days? I know GCP and Azure are catching up, but are people just going back to renting some boxes in a data center and just hosting their stuff on there?


Some orgs deploy Kubernetes on clusters of "bare metal" servers, which is very efficient and cost effective if done above some minimum scale that amortises the cost of the SREs needed to operate the beast.

However, the cloud is not just about compute. The ability to have zone- and region- redundant blob storage that scales to petabytes and has "many nines" of availability is very hard to emulate. Similarly, there are many other turnkey technologies in most clouds that have only complex and expensive on-prem versions.

For example, something like Azure Storage Account "Queues" are basically free and very easy to set up and use. The second you start looking at a highly available cluster of servers providing a queue or service bus, the minimum cost is orders of magnitude higher than a small Azure storage account.

And so on, and so forth...


The link throws a 503 rn. I wonder what would be the reason


Sorry, autoscaling took a second to ramp up


The marriage of an all arm64 dev env and a Graviton op env is a match made in heaven.

Everyone please do this so we collectively fix all the things to work with this. :-)


I upvoted to compensate for the downvotes, but I'm really curious why you think it's a match made in heaven vs. just using a random x86 laptop and an x86 cloud instance.

My take is that you want to cheer ARM on in this space, not because there is some huge technical advantage, or because the arch meets your fancy or whatever, but because it adds another competitor to the space. One that brings its own baggage, but having three-plus competitors (Intel/AMD/various ARM vendors) competing to be the best is a good thing for the industry.

But the blank fanboyism is just harmful. The mid-2010s, with all the Intel fanboys talking up Intel while it screwed everyone with low-clocked server processors limited to 2 cores copy-pasted into their laptops (when you could buy a freaking phone with 8 cores), is what you get when one company gets too much market share or too far ahead of everyone else. The same thing is going to happen if Graviton becomes the dominant platform, except in this case you won't be able to buy competitive on-prem hardware, among any number of other shortcomings. Or, like the Mac, you get a piece of hardware which isn't technically locked down, but will likely always have subpar support for any operating system not shipped by Apple, and could be locked down tomorrow without affecting their business one bit.

So, careful what you wish for. You want competitors that show the giant monopoly that maybe designing a processor for an actual laptop is advantageous over shoveling whatever leftovers from the hyperscalers happen to exist. You also want competitors that show up and pack 2x the cores at 1/2 the price. Or competitors that show up with huge power hungry processors that are pushing the limits of single threaded, high Ghz processors, or 500W GPUs because that is what some people need/want.


40% cost savings and better perf

also I didn't mean all the things have to work like this only, I meant all the things that are broken need to be fixed

I was not clear


It is unless you're the first poor soul to embark on the journey with lots of x86_64 buildup. Having said that, it's been fun though so far. Managed to migrate our dev local k8s toolchain and been using buildx to make multiarch images and manifests for our internal stuff.


We've moved everything to graviton except EKS (MSK, RDS, etc). Did you have any major issues? Would you do it again?


We've only just started! Hence me hitting all the fun issues with our 'organically' grown dev toolchain :)

It's been fairly simple so far with a few niggles. We won't be able to go full arm due to having to support 'stuff' but for the management planes and prometheus etc it's all hunky dory.


FYI https://squeaky.ai/legal/gdpr I believe using Amazon AWS already disqualifies you from being fully GDPR-compliant. Same issue as Shopify has with using US CDNs: https://lsww.de/shopify-illegal/

Since your core sales feature is "privacy friendly" which will surely be appreciated in the EU, it might make sense to offer local hosting or self-hosting.


> I believe using Amazon AWS already disqualifies you from being fully GDPR-compliant.

AFAIK, there is nowhere in GDPR that says your data ought to reside on an EU server.

However, if I understand the Shopify legality complaint it's saying "because your data is hosted by a US entity and theoretically could be accessed by the US authorities it means the US authorities are now part of the data custody and you can't guarantee that they also have that data". That's a legal grey area with a lot of political ramifications.

According to Shopify this doesn't make it illegal: https://www.shopify.com/de/blog/shopify-dsgvo-konform-deutsc...


Yeah, according to Shopify.

According to a German court, a US parent company being able to access your data - which is the case both for Shopify and here - automatically disqualifies you from being GDPR-compliant: https://gdprhub.eu/index.php?title=VK_Baden-W%C3%BCrttemberg...


> which is the case both for Shopify and here - automatically disqualifies you from being GDPR-compliant:

It doesn't automatically disqualify you. No reason to spread this FUD. From your article:

> The Chamber found that, contrary to what Company A stated in their offer, it did disclose customer data to a third party. More specifically, it disclosed customer data to a third party in a third country (its parent company in the U.S.). Therefore, a transfer pursuant to Article 44 GDPR would take place. The Chamber explained that a transfer in this context must also be assumed when data can be accessed from a third country, regardless of whether this actually takes place. The fact that the physical location of the server that provided such access was located in the EU was irrelevant.

This has to do with the transfer of data from the EEA region to the US, which AWS covers: https://aws.amazon.com/compliance/gdpr-center/#GDPR_FAQs

So, no, from a blanket perspective using AWS doesn't automatically disqualify you from GDPR, but it may have implications based on how you transfer the data.

EDIT: To add, as part of Article 44 of GDPR:

> Under Article 44 GDPR, the transfer (or the onward transfer) shall only take place “subject to the other provisions of this Regulation”. As a result, data controllers or processors exporting personal data to third countries or international organisations must ensure the GDPR compliance of the overall processing activity.

So, if AWS follows GDPR compliance in the US (which as a default AWS US does) and you transfer from the EU to the US, then you can still achieve GDPR compliance. The reason this was thrown out: https://gdprhub.eu/index.php?title=VK_Baden-W%C3%BCrttemberg... was because the company said "that it would not disclose customer data to any third party", but when they reviewed the case they found that it did, via its U.S. parent company. So they are not GDPR-compliant because they failed to disclose that data would transfer to the US, NOT because they are using AWS. This is the discrepancy. Lesson learned here: GDPR is more about process control than it is about technology.

FYI - that case appears to involve a publicly owned company serving the government. It seems clear to me that they wanted to send a message of "hey, just don't use any US-based company for your services to German government services, here's how we're legally going to penalize you for it". This would be like having an "America first" policy for gov't procurement and making sure TenCent's US-based wholly owned entity can't do business with the US gov't.


That "disclose customer data to a third party" violates article 44 of the GDPR if there's no matching exemption to allow it. One possible exemption would be if the recipient is also bound by the GDPR. But obviously, the US government is not bound by GDPR. So anything that would allow the US CLOUD act to access a EU customer's data is a GDPR violation.


"So anything that would allow the US CLOUD act to access a EU customer's data is a GDPR violation"

Which is essentially the argument and is a huge legal grey area right now.

Similar situation here:

The EU’s data protection supervisor (EDPS), which oversees the bloc’s own institutions’ GDPR compliance, has been looking into the European Commission’s use of Microsoft Office 365 since May last year — as well as probing EU bodies’ use of Amazon’s cloud services.

The European Data Protection Board (EDPB) also kicked off a related coordinated enforcement action in February that it said would focus on the public sector’s use of cloud services — which it said would take about a year to report, with the aim for the action to harmonize regulatory interventions in this area.[0]

As you can see, nothing has happened yet and this is all still evolving. It seems pretty clear that the EU is using GDPR as a political wedge to drive business back to their countries (despite companies in those countries clearly having a desire to continue to use those products and services). Again, it's not as black and white as you are making it out to be - it's still being fought.

[0] - https://techcrunch.com/2022/11/28/microsoft-365-faces-darken...



