To somewhat echo BurritoElPastor's comment, running a system/app that can be run in multiple clouds is orders of magnitude more difficult than just running a system/app that can be run in multiple regions.
And, not to be snarky, but many of the other responses along the lines of "It's not really that difficult to run in multiple clouds" - let's just say I have trouble believing these commenters have real-world experience actually doing this. I'm not saying it's impossible, but it is extremely difficult for any system of reasonable complexity with a dev team of, say, 10 or more people.
And even if you can stomach the cost, you still give up the ability to really use any of the proprietary (and oftentimes awesome) functionality of a particular provider, which can put your dev velocity at a big disadvantage.
It's not trivial, but it's also no longer an order of magnitude more difficult, as you describe it. There is a reason Kubernetes gets a lot of backing from corporate customers - precisely because it hides and abstracts most of the underlying infrastructure and provides platform-agnostic primitives that make sense at the application level.
Once you have deployed your stack on Kubernetes, you can pretty much run it on any cloud or infrastructure with minor tweaks at most.
It's quite common in cloud solution design to design for failure. One of the common assumptions we hold to is that any one region may go down. Other examples: assume an instance of an app can go down. Assume a VM can go down. Assume a DC can go down.
However much we technical people might salivate at the prospect of designing a multi-cloud solution, for the vast majority of businesses it simply isn't worth the cost / complexity. I'd wager 90-something percent of applications could suffer multi-hour outages without impacting business function to any measurable degree.
Plus, without serious investment, you're probably more likely to decrease availability by going multi-cloud, thanks to the increased system complexity.
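That complexity point is easy to put rough numbers on. A back-of-envelope sketch (all availability figures here are made-up assumptions, not anyone's real SLA): if your cross-cloud failover machinery is itself a serial dependency, it can erase the redundancy gain.

```python
# Back-of-envelope availability math. All figures are illustrative
# assumptions, not real SLA numbers.

def serial(*components):
    """Availability when ALL components must work (product)."""
    result = 1.0
    for a in components:
        result *= a
    return result

def parallel(*components):
    """Availability when ANY one redundant component suffices."""
    downtime = 1.0
    for a in components:
        downtime *= (1.0 - a)
    return 1.0 - downtime

cloud = 0.9995      # assume each cloud is 99.95% available
failover = 0.999    # assume your cross-cloud failover machinery
                    # (DNS, replication, runbooks) works 99.9% of the time

single = cloud
multi = serial(parallel(cloud, cloud), failover)

print(f"single cloud:                          {single:.5f}")
print(f"two clouds behind imperfect failover:  {multi:.5f}")
```

With these (assumed) numbers the two-cloud setup comes out *less* available than a single cloud, because the failover layer fails more often than the second cloud saves you.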
The real trick here, which many people don’t want to look at, is to avoid overly centralizing your workflow.
I can get a lot of work done while Outlook is down. Hell, probably more work done.
If our build server is down I can work for a couple hours (unless we’ve done something very bad). Same for git or our bug database or wiki or or or. When I get stuck on one thing I can swap to something else every couple of hours. And there is always documentation (writing or consuming).
But if some idiot, hypothetically speaking of course, puts most of these services into the same SAN, then we are truly and utterly screwed if there is a hardware failure.
Similarly if you make one giant app that handles your whole business, if that app goes down and there are no manual backups you might as well send everybody home.
I went to get a drink the other day and the place looked funny. They’d tripped a circuit breaker and the whole kitchen lost power. But the registers and the beverage machines were on a separate circuit. And since they sold drinks and food in that order, they stayed open and just apologized a lot. Whoever wired that place knew what they were doing.
Probably lost 1 of 3 phases. You're quite right in that the decision of what phase a circuit is on has a lot to do with business, and hopefully no major repurposing of the space without rewiring the space has occurred. For lighting, you'd want 1/3 of fixtures per room to go out, not 1/3 of your rooms in their entirety. For appliances and receptacles, you'd rather lose a whole function (the kitchen) than be able to cook but not do dishes, with every function trying to figure out oddball workarounds.
The chance that AWS goes down is much smaller than the chance of anything else going down. There are many SPOFs in a typical smaller company's setup, and most of them are not even obvious to the operators.
AWS had a multi-hour total S3 outage in us-east-1 in February 2017 that knocked out a huge number of things mostly because it turns out that a huge share of their customers run in only 1 region and it's us-east-1. Things mostly continued to work in other regions.
I recall Azure had some sort of multi-region database failover disaster that took several regions offline, and GCP has had several global elevated latency/error rate events, but I don't think that any cloud provider has been "down" in the sense that the word is usually used.
I don’t think it’s happened (yet), although some of the earlier outages when AWS was younger were pretty far-reaching. I think all of S3 has gone down a time or two.
Some APIs were impacted because they are global by nature (e.g. create-bucket). But S3 was working fine in all other regions for existing buckets.
However, many websites were affected because they didn't use any of the existing S3 features that allow for regional redundancy. S3 had been so reliable that they didn't know, or didn't think, they needed critical assets in a bucket in a second region that they could fail over to.
Admittedly, even the AWS status page was impacted, because it also relied on S3 in us-east-1.
S3 has done a lot of work to improve matters since, and mechanisms have been put in place to ensure that all AWS services don't have inter-region dependencies for "static" operation.
However, it is still incorrect to claim that it was all of S3. Many customers who use S3 only in other regions were totally unaffected.
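The fail-over-to-a-second-bucket idea above can be sketched as a read-side fallback. Everything here is hypothetical (the bucket names, the injected `fetch` callable); real code would wrap a per-region S3 client and keep the replica bucket in sync with something like S3 Cross-Region Replication.

```python
# Sketch of a read-side regional fallback for static assets.
# Bucket names and the `fetch` callable are hypothetical; in real code
# `fetch` would wrap a per-region S3 client, and the secondary bucket
# would be kept in sync with Cross-Region Replication.

REGIONS = [
    ("us-east-1", "assets-prod-use1"),   # primary
    ("us-west-2", "assets-prod-usw2"),   # replica
]

def get_asset(key, fetch):
    """Try each regional bucket in order; return the first success."""
    last_error = None
    for region, bucket in REGIONS:
        try:
            return fetch(region, bucket, key)
        except Exception as exc:   # real code would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all regions failed for {key}") from last_error

# Usage with a fake fetcher that simulates us-east-1 being down:
def fake_fetch(region, bucket, key):
    if region == "us-east-1":
        raise ConnectionError("regional outage")
    return f"{bucket}/{key}"

print(get_asset("logo.png", fake_fetch))  # served from the replica bucket
```

The point of the sketch is only that the application has to *know* about the second region; S3 replicates the data, but nothing fails reads over for you automatically.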
Well, sure, if you hate your devops team and you want to make sure they can’t use any of the proprietary functionality of either provider. At which point, if you want to be managing a fleet of vanilla Linux boxes yourself, why use a cloud provider at all?
* You should not be locking yourself into proprietary functionality of a cloud provider unless you are deeply interested in having what happened to Oracle customers (getting raked over the coals) happen to you.
* DevOps teams can be multi-cloud relatively easily when using infrastructure-as-code tooling (Terraform, Packer, etc.) and traditional DevOps practices
* Why manage a fleet of vanilla boxes when you can use vanilla boxes with Kubernetes and not get gouged by cloud providers in the first place?
You don't need to jump off the hype train if you never got on in the first place.
Proprietary managed services can save a lot of dev/setup/SRE time though. Many businesses have more pressing things to work on than spending dev time to prevent vendor lock-in.
Not yet, but it seems obvious to me that the GP was referring to a situation where the price changes and then you are getting gouged. That's exactly what the negative connotations of lock-in refer to.
Each provider will seek to make you take their one true path, or you need to do your own engineering.
Using the provider's path isn't necessarily gouging, but it isn't cost-optimized either. The answer depends on you.
That said, cloud is like any tenant/landlord relationship. Your rights are linked to time and are whatever your contract provides. If you didn’t like Office 2007, you didn’t buy it. If you don’t like Office 365, 2021 edition, too bad.
It's not quite that black and white. You can use common/open APIs and cross-provider tooling whenever available and provider-flavored ones where necessary. It's more effort, but still less than hand-rolling everything.
Of course that only works as long as you're swapping out largely replaceable parts. If you built everything around some proprietary service then yeah, you've tied yourself to that anchor.
This seems overly negative. There are lots of ways to do hybrid clouds, especially if you’re doing it for only the more critical parts of your application.
Cost+speed of scalability, and managed services. If you rarely need to scale, your workloads are all predictable, and you don't need managed services/support, you should just buy some VPSes or dedicated boxes.
It's not really that I "want to lock into a cloud provider". Sometimes I simply don't have the human bandwidth available to handle devops and sysadmin work while building the actual product.
"Outsourcing" those functions to cloud services can be a big win for a small team. Like all engineering, it's a trade-off.
For the same reason you want "to lock in" (meaning use) any solution: you do not want to build or operate it yourself. Why not take this further? Why use a water utility if you can just drill your own wells? Most businesses are better off on the cloud because their core business is not to build and operate datacenters but to provide services to their customers (on top of the datacenters running their apps).
If you're running in multiple clouds for HA/DR reasons, you are limited to the lowest common denominator of features/services between them. Or maintaining multiple codebases/architectures, and the massive pile of issues that entails. I am not a fan of multi-cloud for this reason.
With multiple regions, as long as your provider offers all of the services in each region, you can run a carbon copy. Much easier.
It depends on your needs, your architecture, your risk tolerance, etc. I think for most people "Use multiple regions" is the answer that strikes the correct balance. It probably isn't the correct answer for everyone.
Certain terms and conditions may apply :) Carbon copy of a static website or one whose data is only a one-way flow from some off-cloud source of truth? Sure! Multi-master or primary-secondary with failover? Stray too far from the narrow path of specialized managed solutions and things get very complex, very quickly. That being said - it's mostly just the nature of the beast. If you're not able to tolerate a regional outage, multi-region is a pill you're going to have to swallow, no buts about it.
This is one of the reasons things like Federated Kubernetes are being worked on. Stick a CDN in front and your compute can be migrated from cloud to cloud. You still need to do a lot of thinking about data, though.
More than one region is pretty easy; more than one provider is harder (unless your workload is designed from the ground up for it). But, yes, just as multi-region protects you from things mere multi-AZ doesn't, multi-provider protects you from even more.
I have an awesome demo I give running a complex stateful workload across cloud providers to show off the system that I work on. What I have learned from giving that presentation many times is that while it is nice to say you can run cross cloud, for most workloads you should just pick one cloud, and be able to move to another provider if you ever need to.
No, not unless you are someone like Netflix. Usually you can configure multi-region failover and such, and that will keep your things running. It is more expensive, but for most use cases I think the cost is still less than the dev time/complexity of setting up multi-provider workflows and the inevitable duplication of resources (which is part of the cost of multi-region anyway).
No. And there's been a lot of talk recently about multi-provider being the right strategy to mitigate downtime, which IMHO is a farce peddled by expensive consultants. The parent comment is correct - this is why availability zones and regions have been established by each provider.
For the large majority of businesses, the return on investing in infrastructure-as-code far outweighs that of any crazy HA, redundant, multi-provider, whizzbang setup you may dream up.
You can move 1.6TB between providers in a month for the same price as a month of a single beefy DB server (an m4.16xlarge here). That's a whole lot of logical replication.
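Instance and egress prices both change over time, so it's worth redoing that arithmetic with current numbers. A tiny helper for that; the rates in the example call are illustrative assumptions, not quoted prices.

```python
# Redo the egress-vs-server math with your own prices. The figures in
# the example call are assumptions for illustration, not current list
# prices; look up your provider's actual rates.

def gb_moved_per_month(server_hourly_usd, egress_per_gb_usd, hours=730):
    """How many GB of inter-cloud egress one month of a server buys."""
    return server_hourly_usd * hours / egress_per_gb_usd

# e.g. assuming a $3.20/hr instance and $0.09/GB internet egress:
print(round(gb_moved_per_month(3.20, 0.09)), "GB")
```

The ratio is sensitive to both inputs, and egress is usually tiered by volume, so treat the output as an order-of-magnitude figure only.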