Unroasted6154's comments

Why wouldn't they offer it for external payloads if they get the cost per kg lower than F9? Running Starship only is going to be cheaper than running both rockets, unless the economics of Starship are worse (in which case it would not be used for Starlink either).


It's not a service shutting down though. It will still work fine for a while, and if there is a critical security patch required, the community might still be able to add it.


No, they are going to forbid people from committing anything to the project, so even security patches will be blocked.


The chance of this not having a fork keeping security updates running is effectively zero.


It's a bit weird to present it as an alternative to S3 when it looks like a persistent cache or k/v store. A benchmark against Redis would have been nice, for example. The benchmark against RocksDB is also questionable, as its performance depends a lot on how you configure it, and the article's claim that it doesn't support range reads doesn't give me confidence in the results.
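
For what it's worth, RocksDB does expose range reads through its iterator API. A minimal sketch (the DB path and key names here are made up):

    #include <cassert>
    #include <iostream>
    #include "rocksdb/db.h"

    int main() {
      rocksdb::Options options;
      options.create_if_missing = true;
      rocksdb::DB* db = nullptr;
      rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/range_read_demo", &db);
      assert(s.ok());

      // A few keys sharing a prefix, like a small-image store might have.
      db->Put(rocksdb::WriteOptions(), "img:0001", "a");
      db->Put(rocksdb::WriteOptions(), "img:0002", "b");
      db->Put(rocksdb::WriteOptions(), "img:0003", "c");

      // Range read: scan every key with the "img:" prefix, in order.
      rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
      for (it->Seek("img:"); it->Valid() && it->key().starts_with("img:"); it->Next()) {
        std::cout << it->key().ToString() << " -> " << it->value().ToString() << "\n";
      }
      delete it;
      delete db;
      return 0;
    }

How fast that scan is obviously depends on the configuration, which is exactly why an undocumented setup makes the comparison hard to judge.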

Also, for the described issue of small images for a frontend, nobody would serve directly from S3 without a caching layer on top.

It's an interesting read for fun, but I am not sure what it solves in the end.


I'd have to assume it's a blob store for their search engine (or similar) project: https://blog.wilsonl.in/search-engine/


Yes, those are fair points.


You are supposed to build multi-regional services if you need higher resilience.


Actual multi-region replication is hard and forces you to think about complicated things like the CAP theorem/etc. It's easier to pretend AWS magically solves that problem for you.

Which is actually totally fine for the vast majority of things, otherwise there would be actual commercial pressures to make sure systems are resilient to such outages.


You could also achieve this in practice by just not using us-east-1, though at the very least you should have another region going for DR.


Never said it was easy or desirable for most companies.

But there is only so much a cloud provider can guarantee within a region or whatever unit of isolation they offer.


Don't you have user profiles on Pixels? I can create another user and switch. Just not super convenient. Work profiles are actually pretty good... for work.


There is way more Java and C++ than Go at Google.


For all practical purposes impossible at this scale. The issue is not really the bug tbh.


What is the issue?


The issue was that a bug got triggered globally within a few seconds, and all the things that led to that. The fact that it was a null pointer exception is almost irrelevant.

If you are talking about the issue of migrating to Rust, well, rewriting hundreds of millions of lines of code never makes sense.


The issue is how a person looks at it. You could say:

1) the cause was a null pointer, so let's fire that programmer and get a good one in there who doesn't make stupid mistakes.

Or instead taking a different angle and attitude:

2) Every programmer makes null pointer mistakes at some point. We need to treat human workers, though incredibly gifted, as not 100% perfect. Thus this is a failure of our system and our policies to catch a scenario that was inevitable.

This type of thinking and parallel applies to any and all high-risk situations, the least of which is whether Google Cloud works, but also things like "my airplane crashed" or "the surgeon removed the wrong leg".


Probably C++ or Java though.


What makes you think it was completely untested? The condition that triggered the null pointer exception was obviously not tested, but that doesn't mean it didn't have tests, or even 100% unit test coverage according to the coverage tools.
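
As a toy illustration (hypothetical function, nothing to do with the actual code involved): the single test below executes every line of QuotaFor, so line-coverage tools report 100%, yet the null-pointer input is never exercised.

    #include <cassert>
    #include <string>

    // Hypothetical example: dereferences `policy` unconditionally,
    // so QuotaFor(nullptr) crashes at runtime.
    int QuotaFor(const std::string* policy) {
      if (policy->empty()) {
        return 0;
      }
      return static_cast<int>(policy->size());
    }

    int main() {
      // Covers every line of QuotaFor (100% line coverage), but never
      // the case where the caller passes a null pointer.
      std::string p = "quota-policy";
      assert(QuotaFor(&p) == 12);
      return 0;
    }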

In addition, it looks like the code was not ready for production, and the mistake was not gating it behind a feature flag. It didn't go through the normal release process.


If Google spent Moon-landing levels of money on their quality/deployment infrastructure, I'd expect a much better coverage checker than "100% unit tested". They are famous for having a whole fuzzing infrastructure, and coverage analysers for more complex interplay of logic are something I use daily at non-Google levels of spending (even though still at a big enough corporation), which often reminds me that I forgot to write a functional test to cover a potential orchestration issue.

I don't think "completely untested" is correct, but being tested way below expectations for such a structural piece of code is a lesson they should learn; it does look like an amateur-hour mistake.


The main issue to me seems to be that the code was not gated by a flag when it was not ready to be used, thus skipping a lot of the testing / release qualification.
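
A minimal sketch of that kind of gating (getenv stands in for whatever flag service is actually used, and the flag name is made up), with the new path dark by default:

    #include <cstdlib>
    #include <cstring>
    #include <iostream>

    // Hypothetical sketch: getenv stands in for a real flag service.
    // The new code path defaults to OFF and only runs where the flag
    // has been explicitly enabled during a staged rollout.
    bool FlagEnabled(const char* name) {
      const char* v = std::getenv(name);
      return v != nullptr && std::strcmp(v, "1") == 0;
    }

    int main() {
      if (FlagEnabled("ENABLE_NEW_QUOTA_CHECKS")) {
        std::cout << "new, flag-gated quota checks\n";
      } else {
        std::cout << "legacy quota checks\n";  // default until the flag is flipped
      }
      return 0;
    }

Unfinished code behind a default-off flag still ships, but it can only be turned on through the normal release and rollout machinery.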


Good luck re-writing 25 years of C++ though.


It was Google's study that showed almost all bugs are in new code (and this was also the case in this incident).

You don't need to rewrite everything to prevent the majority of new bugs; it's enough to protect new code and keep the battle-tested stuff around.


You can do that for new binaries. For existing ones you can't really, or you end up in a worse place for a long time.

