Sure but not anywhere remotely near clearing the bar to simply calling that “rel...

Waterluvian · on Sept 21, 2022

When I think “reliability” I think “does it perform the act consistently?”

Consistently slow is still reliability.

somat · on Sept 21, 2022

It is not about reliably running the machine but about reliably getting the machine.

Like the article said, the promise of the cloud is that you can easily get machines when you need them. A cloud that sometimes does not get you that machine (or does not get it to you in time) is a less reliable cloud than one that does.

rco8786 · on Sept 21, 2022

It’s still performance. If this was “AWS failed to deliver the new machines and GCP delivered”, sure, reliability. But this isn’t that.

The race car that finishes first is not “more reliable” than the one in 10th. They are equally reliable, having both finished the race. The first place car is simply faster at the task.

somat · on Sept 22, 2022

The one in first can more reliably win races, however.

rco8786 · on Sept 22, 2022

You cannot infer that based on the results of the race...that's literally the entire point I am making. The 1st place car might blow up in the next race, the 10th place car might finish 10th place for the next 100 races.

If the article were measuring HTTP response times and found that AWS's average response time was 50ms and GCP's was 200ms, and both returned 200s for every single request in the test, would you say AWS is more reliable than GCP based on that? Of course not, it's asinine.
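To make the distinction concrete, here is a minimal sketch that reports success rate and latency as separate numbers. The sample data just mirrors the hypothetical above (it is not a real measurement), and `summarize` and `deadline_ms` are made-up names for illustration. The optional deadline is there only to show the other reading in this thread, where a response that arrives too late counts as a failure.

```python
from statistics import mean

# Hypothetical samples mirroring the example above: (HTTP status, latency in ms).
# These are illustrative values, not measurements.
aws = [(200, 50)] * 100
gcp = [(200, 200)] * 100


def summarize(samples, deadline_ms=None):
    """Report success rate and latency as two separate metrics.

    If deadline_ms is given, also report the share of all requests that
    succeeded within that deadline (the "too late is a failure" reading).
    """
    ok_latencies = [latency for status, latency in samples if status == 200]
    result = {
        "success_rate": len(ok_latencies) / len(samples),
        "avg_latency_ms": mean(ok_latencies) if ok_latencies else None,
    }
    if deadline_ms is not None:
        on_time = sum(1 for latency in ok_latencies if latency <= deadline_ms)
        result["within_deadline"] = on_time / len(samples)
    return result


print(summarize(aws))
print(summarize(gcp))
print(summarize(gcp, deadline_ms=100))
```

Without a deadline, both providers come out identical on success rate and differ only on latency. The two only diverge on a "reliability" number once you pick a deadline, which is a target you have to state up front.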

onphonenow · on Sept 22, 2022

If you want that promise you can reserve capacity in various ways. Google has reservations. Folks use this for DR, and your org can get a pool of shared reservations going if various teams are going to lean on GPUs, etc.

The promise of the cloud is that you can flexibly spin up machines if available, and easily spin down, with no long-term contracts, CapEx, etc. They are all pretty clear that there are capacity limits under the hood (and your account likely has various limits on it as a result).

VWWHFSfQ · on Sept 21, 2022

I would still call it "reliability".

If the instance takes too long to launch then it doesn't matter if it's "reliable" once it's running. It took too long to even get started.

rco8786 · on Sept 21, 2022

Why would you not call it “startup performance”?

Calling this reliability is like saying a Ford is more reliable than a Chevy because the Ford has a better throttle response.

endisneigh · on Sept 21, 2022

that's not what reliability means

VWWHFSfQ · on Sept 21, 2022

> that's not what reliability means

What is your definition of reliability?

endisneigh · on Sept 21, 2022

unfortunately cloud computing and marketing have conflated reliability, availability and fault tolerance so it's hard to give you a definition everyone would agree to, but in general I'd say reliability is referring to your ability to use the system without errors or significant decreases in throughput, such that it's not usable for the stated purpose.

in other words, reliability is that it does what you expect it to. GCP does not have any particular guarantees around being able to spin up VMs fast, so its inability to do so wouldn't make it unreliable. it would be like me saying that you're unreliable for not doing something when you never said you were going to.

if this were comparing Lambda vs Cloud Functions, who both have stated SLAs around cold start times, and there were significant discrepancies, sure.

pas · on Sept 21, 2022

true, the grammar and semantics work out, but since reliability needs a target, it's usually a serious design flaw to rely on something that has never demonstrably worked the way your reliability target assumes.

so that's why in engineering it's not really used as such. (as far as I understand at least.)