
> Should I be penalized if an upstream dependency, owned by another team, fails?

Yes

> Did I lack due diligence in choosing to accept the risk that the other team couldn't deliver?

Yes



Say during due diligence two options are uncovered: use an upstream dependency owned by another team, or use that plus a third-party vendor for redundancy. Implementing the parallel systems costs 10x more than the former and takes 5x longer. You estimate a 0.01% chance of serious failure for the former, and 0.001% for the latter.

Now say you're a medium-sized hyper-growth company in a competitive space. Does spending 10 times more and waiting 5 times longer for redundancy make business sense? You could argue that it'd be irresponsible to over-engineer the system in this case, since you delay getting your product out and potentially lose money and ground to competitors.
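To make that concrete, here's a rough expected-value sketch in Python. The 10x/5x multipliers and the failure probabilities come from the scenario above; every dollar figure and the delay penalty are made-up assumptions just to show the shape of the tradeoff:

  from typing import NamedTuple

  # Illustrative assumptions only -- not real numbers.
  BUILD_COST = 100_000            # assumed cost of the single-dependency option
  TIME_TO_SHIP_MONTHS = 2         # assumed schedule for the single-dependency option
  FAILURE_IMPACT = 50_000_000     # assumed loss if a serious failure occurs
  DELAY_COST_PER_MONTH = 500_000  # assumed revenue/ground lost per month of delay

  def expected_total_cost(build_cost: float, months: float, p_failure: float) -> float:
      """Build cost + opportunity cost of delay + expected failure loss."""
      return (build_cost
              + months * DELAY_COST_PER_MONTH
              + p_failure * FAILURE_IMPACT)

  single = expected_total_cost(BUILD_COST, TIME_TO_SHIP_MONTHS, 0.0001)            # 0.01%
  redundant = expected_total_cost(BUILD_COST * 10, TIME_TO_SHIP_MONTHS * 5, 0.00001)  # 0.001%

  print(f"single dependency: ${single:,.0f}")    # -> $1,105,000
  print(f"redundant systems: ${redundant:,.0f}") # -> $6,000,500

With numbers anything like these, the extra reliability buys back a few thousand dollars of expected loss while costing millions in build time and delay, which is the point: the "responsible" choice depends on the numbers, not on a blanket rule.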

I don't think a black-and-white "yes, you should be punished" view is productive here.


Where does this mindset end? Do I lack due diligence by choosing to accept that the CPU microcode on the system I'm deploying to works correctly?


If it's a brand-new RISC-V CPU that was just released 5 min ago, and nobody has really tested it, then yes.

If it's a standard CPU that everybody else uses, and it's not known to be bad, then no.

Same for software. Is it OK to have a dependency on AWS services? Their history shows yes. A dependency on a brand-new SaaS product? Nothing mission critical.

Or npm/crates/pip packages. Packages that have been around and steadily maintained for a few years, and have active users, are worth checking out. Some random project from a single developer? Consider vendoring (and owning, if necessary) it.


Why? Intel had Spectre/Meltdown, which erased something like half of everyone's capacity overnight.


You choose the CPU and you choose what happens in a failure scenario. Part of engineering is making choices that meet the availability requirements of your service. And part of that is handling failures from dependencies.

That doesn't extend to ridiculous lengths, but as a rule you should engineer around any single point of failure.
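As a sketch of what engineering around a single point of failure can look like at the application level, here's a minimal failover pattern in Python. The fetch_primary/fetch_secondary names are hypothetical stand-ins for two independently operated providers:

  # Minimal failover sketch: try the primary dependency, fall back to a
  # secondary on failure. Both functions are hypothetical stand-ins.

  def fetch_primary(key: str) -> str:
      raise TimeoutError("primary is down")  # simulate an upstream outage

  def fetch_secondary(key: str) -> str:
      return f"value-for-{key}"

  def fetch_with_fallback(key: str) -> str:
      try:
          return fetch_primary(key)
      except Exception:
          # The dependency's failure mode is handled, not merely hoped away.
          return fetch_secondary(key)

  print(fetch_with_fallback("user-42"))  # -> value-for-user-42

The design choice is the point: the failure of the dependency is part of the service's specified behavior rather than an unhandled surprise.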


I think this is why we pay for support, with the expectation that if their product inadvertently causes losses for us, they will work fast to fix it or cover the losses.


Yes? If you are worried about CPU microcode failing, then you do what NASA does and have multiple CPU architectures doing the calculations with majority voting. These are not unsolved problems.
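For the curious, that approach is triple modular redundancy: run the same computation on independent units and take a majority vote so a single faulty unit gets outvoted. A minimal sketch in Python, where the compute_on_arch_* functions are hypothetical stand-ins for independent CPU architectures:

  from collections import Counter

  def compute_on_arch_a(x: int) -> int: return x * x
  def compute_on_arch_b(x: int) -> int: return x * x
  def compute_on_arch_c(x: int) -> int: return x * x + 1  # simulate a faulty unit

  def majority_vote(x: int) -> int:
      """Run all three units and return the majority result."""
      results = [f(x) for f in (compute_on_arch_a, compute_on_arch_b, compute_on_arch_c)]
      value, count = Counter(results).most_common(1)[0]
      if count < 2:
          raise RuntimeError("no majority: all three units disagree")
      return value

  print(majority_vote(7))  # -> 49; the faulty unit is outvoted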


JPL goes further and buys multiple copies of all hardware and software media used for ground systems, and keeps them in storage "just in case". It's a relatively cheap insurance policy against the decay of progress.



