To somewhat echo BurritoElPastor's comment, running a system/app that can be run in multiple clouds is orders of magnitude more difficult than just running a system/app that can be run in multiple regions.
And, not to be snarky, but many of the other responses along the lines of "It's not really that difficult to run in multiple clouds" - let's just say I have trouble believing these commenters have real-world experience actually doing this. I'm not saying it's impossible, but it is extremely difficult for any system of reasonable complexity with a dev team of, say, 10 or more people.
And even if you can stomach the cost, you still give up the ability to really use any of the proprietary (and oftentimes awesome) functionality of a particular provider, which can put your dev velocity at a big disadvantage.
It's not trivial, but it's also no longer an order of magnitude more difficult, as you describe it. There is a reason Kubernetes gets a lot of backing from corporate customers - precisely because it hides and abstracts most of the underlying infrastructure and provides platform-agnostic primitives that make sense at the application level.
Once you have deployed your stack on Kubernetes, you can pretty much run it on any cloud or infrastructure with minor tweaks at most.
It's quite common in cloud solution design to design for failure. One of the common assumptions we hold to is that any one region may go down. Other examples: assume an instance of an app can go down. Assume a VM can go down. Assume a DC can go down.
However much we technical people might salivate at the prospect of designing a multi-cloud solution, for the vast majority of businesses it simply isn't worth the cost / complexity. I'd wager 90-something percent of applications could suffer multi-hour outages without impacting business function to any measurable degree.
Plus, without serious investment, you're probably more likely to decrease availability by going multi-cloud, thanks to the increased system complexity.
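That complexity point is easy to put rough numbers on. A back-of-envelope sketch (all availability figures here are made-up assumptions, not anyone's real SLA): if your cross-cloud failover machinery is itself a serial dependency, it can erase the redundancy gain.

```python
# Back-of-envelope availability math. All figures are illustrative
# assumptions, not real SLA numbers.

def serial(*components):
    """Availability when ALL components must work (product)."""
    result = 1.0
    for a in components:
        result *= a
    return result

def parallel(*components):
    """Availability when ANY one redundant component suffices."""
    downtime = 1.0
    for a in components:
        downtime *= (1.0 - a)
    return 1.0 - downtime

cloud = 0.9995      # assume each cloud is 99.95% available
failover = 0.999    # assume your cross-cloud failover machinery
                    # (DNS, replication, runbooks) works 99.9% of the time

single = cloud
multi = serial(parallel(cloud, cloud), failover)

print(f"single cloud:                          {single:.5f}")
print(f"two clouds behind imperfect failover:  {multi:.5f}")
```

With these (assumed) numbers the two-cloud setup comes out *less* available than a single cloud, because the failover layer fails more often than the second cloud saves you.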
The real trick here, which many people don’t want to look at, is to avoid overly centralizing your workflow.
I can get a lot of work done while Outlook is down. Hell, probably more work done.
If our build server is down I can work for a couple hours (unless we’ve done something very bad). Same for git or our bug database or wiki or or or. When I get stuck on one thing I can swap to something else every couple of hours. And there is always documentation (writing or consuming).
But if some idiot, hypothetically speaking of course, puts most of these services into the same SAN, then we are truly and utterly screwed if there is a hardware failure.
Similarly if you make one giant app that handles your whole business, if that app goes down and there are no manual backups you might as well send everybody home.
I went to get a drink the other day and the place looked funny. They’d tripped a circuit breaker and the whole kitchen lost power. But the registers and the beverage machines were on a separate circuit. And since they sold drinks and food in that order, they stayed open and just apologized a lot. Whoever wired that place knew what they were doing.
Probably lost 1 of 3 phases. You're quite right in that the decision of what phase a circuit is on has a lot to do with business, and hopefully no major repurposing of the space without rewiring the space has occurred. For lighting, you'd want 1/3 of fixtures per room to go out, not 1/3 of your rooms in their entirety. For appliances and receptacles, you'd rather lose a whole function (the kitchen) than be able to cook but not do dishes, with every function trying to figure out oddball workarounds.
The chance that AWS goes down is much smaller than the chance of anything else going down. There are many SPOFs in a typical smaller company's setup, and most of them are not even obvious to the operators.
AWS had a multi-hour total S3 outage in us-east-1 in February 2017 that knocked out a huge number of things mostly because it turns out that a huge share of their customers run in only 1 region and it's us-east-1. Things mostly continued to work in other regions.
I recall Azure had some sort of multi-region database failover disaster that took several regions offline, and GCP has had several global elevated latency/error rate events, but I don't think that any cloud provider has been "down" in the sense that the word is usually used.
I don’t think it’s happened (yet), although some of the earlier outages when AWS was younger were pretty far-reaching. I think all of S3 has gone down a time or two.
Some APIs were impacted because they are global by nature (e.g. create-bucket). But S3 was working fine in all other regions for existing buckets.
However, many websites were affected because they didn't use any of the existing S3 features that allow for regional redundancy. S3 had been so reliable that they didn't know, or didn't think, they needed critical assets in a bucket in a second region that they could fail over to.
Admittedly, even the AWS status page was impacted, because it also relied on S3 in us-east-1.
S3 has done a lot of work to improve matters since, and mechanisms have been put in place to ensure that all AWS services don't have inter-region dependencies for "static" operation.
However, it is still incorrect to claim that it was all of S3. Many customers who use S3 only in other regions were totally unaffected.
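The fail-over-to-a-second-bucket idea above can be sketched as a read-side fallback. Everything here is hypothetical (the bucket names, the injected `fetch` callable); real code would wrap a per-region S3 client and keep the replica bucket in sync with something like S3 Cross-Region Replication.

```python
# Sketch of a read-side regional fallback for static assets.
# Bucket names and the `fetch` callable are hypothetical; in real code
# `fetch` would wrap a per-region S3 client, and the secondary bucket
# would be kept in sync with Cross-Region Replication.

REGIONS = [
    ("us-east-1", "assets-prod-use1"),   # primary
    ("us-west-2", "assets-prod-usw2"),   # replica
]

def get_asset(key, fetch):
    """Try each regional bucket in order; return the first success."""
    last_error = None
    for region, bucket in REGIONS:
        try:
            return fetch(region, bucket, key)
        except Exception as exc:   # real code would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all regions failed for {key}") from last_error

# Usage with a fake fetcher that simulates us-east-1 being down:
def fake_fetch(region, bucket, key):
    if region == "us-east-1":
        raise ConnectionError("regional outage")
    return f"{bucket}/{key}"

print(get_asset("logo.png", fake_fetch))  # served from the replica bucket
```

The point of the sketch is only that the application has to *know* about the second region; S3 replicates the data, but nothing fails reads over for you automatically.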
Well, sure, if you hate your devops team and you want to make sure they can’t use any of the proprietary functionality of either provider. At which point, if you want to be managing a fleet of vanilla Linux boxes yourself, why use a cloud provider at all?
* You should not be locking yourself into proprietary functionality of a cloud provider unless you are deeply interested in having what happened to Oracle customers (getting raked over the coals) happen to you.
* DevOps teams can be multi-cloud relatively easily when using infrastructure-as-code tooling (Terraform, Packer, etc.) and traditional DevOps practices
* Why manage a fleet of vanilla boxes when you can use vanilla boxes with Kubernetes and not get gouged by cloud providers in the first place?
You don't need to jump off the hype train if you never got on in the first place.
Proprietary managed services can save a lot of dev/setup/SRE time though. Many businesses have more pressing things to work on than spending dev time to prevent vendor lock-in.
Not yet, but it seems obvious to me that the GP was referring to a situation where the price changes and then you are getting gouged. That's exactly what the negative connotations of lock-in refer to.
Each provider will seek to make you take their one true path, or you need to do your own engineering.
Using the provider's path isn't necessarily gouging, but it isn't cost-optimized either. The answer depends on you.
That said, cloud is like any tenant/landlord relationship. Your rights are linked to time and are whatever your contract provides. If you didn’t like Office 2007, you didn’t buy it. If you don’t like Office 365, 2021 edition, too bad.
It's not quite that black and white. You can use common/open APIs and cross-provider tooling whenever available and provider-flavored ones where necessary. It's more effort, but still less than hand-rolling everything.
Of course that only works as long as you're swapping out largely replaceable parts. If you built everything around some proprietary service then yeah, you've tied yourself to that anchor.
This seems overly negative. There are lots of ways to do hybrid clouds, especially if you’re doing it for only the more critical parts of your application.
Cost+speed of scalability, and managed services. If you rarely need to scale, your workloads are all predictable, and you don't need managed services/support, you should just buy some VPSes or dedicated boxes.
It's not really that I "want to lock into a cloud provider". Sometimes I simply don't have the human bandwidth available to handle devops and sysadmin work while building the actual product.
"Outsourcing" those functions to cloud services can be a big win for a small team. Like all engineering, it's a trade-off.
For the same reason you want "to lock in" (meaning use) any solution: you do not want to build or operate it yourself. Why not take this further? Why use a water utility if you can just drill your own wells? Most businesses are better off on the cloud because their core business is not to build and operate datacenters but to provide services to their customers (on top of the datacenters running their apps).
If you're running in multiple clouds for HA/DR reasons, you are limited to the lowest common denominator of features/services between them. Or maintaining multiple codebases/architectures, and the massive pile of issues that entails. I am not a fan of multi-cloud for this reason.
With multiple regions, as long as your provider offers all of the services in each region, you can run a carbon copy. Much easier.
It depends on your needs, your architecture, your risk tolerance, etc. I think for most people "Use multiple regions" is the answer that strikes the correct balance. It probably isn't the correct answer for everyone.
Certain terms and conditions may apply :) Carbon copy of a static website or one whose data is only a one-way flow from some off-cloud source of truth? Sure! Multi-master or primary-secondary with failover? Stray too far from the narrow path of specialized managed solutions and things get very complex, very quickly. That being said - it's mostly just the nature of the beast. If you're not able to tolerate a regional outage, multi-region is a pill you're going to have to swallow, no buts about it.
This is one of the reasons things like Federated Kubernetes are being worked on. Stick a CDN in front and your compute can be migrated from cloud to cloud. You still need to do a lot of thinking about data, though.
More than one region is pretty easy; more than one provider is harder (unless your workload is designed from the ground up for it). But, yes, just as multi-region protects you from things mere multi-AZ doesn't, multi-provider protects you from even more.
I have an awesome demo I give running a complex stateful workload across cloud providers to show off the system that I work on. What I have learned from giving that presentation many times is that while it is nice to say you can run cross cloud, for most workloads you should just pick one cloud, and be able to move to another provider if you ever need to.
No, not unless you are someone like Netflix. Usually you can configure multi-region failover and such, and that will keep your things running. It is more expensive, but for most use cases I think the cost is still less than the dev time/complexity of setting up multi-provider workflows and the inevitable duplication of resources (which is part of the cost of multi-region anyway).
No. And there's been a lot of talk recently about multi-provider being the right strategy to mitigate downtime, which IMHO is a farce peddled by expensive consultants. The parent comment is correct - this is why availability zones and regions have been established by each provider.
For the large majority of businesses, the return on investing in infrastructure-as-code far outweighs that of any crazy HA, redundant, multi-provider, whizzbang setup you may dream up.
You can move 1.6TB between providers in a month for the same price as a month of a single beefy DB server (an m4.16xlarge here). That's a whole lot of logical replication.
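Instance and egress prices both change over time, so it's worth redoing that arithmetic with current numbers. A tiny helper for that; the rates in the example call are illustrative assumptions, not quoted prices.

```python
# Redo the egress-vs-server math with your own prices. The figures in
# the example call are assumptions for illustration, not current list
# prices; look up your provider's actual rates.

def gb_moved_per_month(server_hourly_usd, egress_per_gb_usd, hours=730):
    """How many GB of inter-cloud egress one month of a server buys."""
    return server_hourly_usd * hours / egress_per_gb_usd

# e.g. assuming a $3.20/hr instance and $0.09/GB internet egress:
print(round(gb_moved_per_month(3.20, 0.09)), "GB")
```

The ratio is sensitive to both inputs, and egress is usually tiered by volume, so treat the output as an order-of-magnitude figure only.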