
Interesting. I wish it had more detail on the inputs/outputs and the data sizes in the different phases.

One thing I wonder about is how much of this data collection they could do on a forward-moving basis. Often I see huge lookback jobs that answer predictable/static questions -- prime candidates for aggregation during ingest.
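
To make "aggregation during ingest" concrete, here is a minimal sketch using the Apache Beam Python SDK (the natural choice given the article is about Dataflow). Everything specific -- the Pub/Sub topic, the `user_id` field, the BigQuery table -- is hypothetical; the point is just that a small streaming rollup computed as events arrive can replace a huge lookback scan over raw logs later.

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
    from apache_beam.transforms import window

    class AddWindowStart(beam.DoFn):
        """Attach the hourly window start so the rollup rows are self-describing."""
        def process(self, kv, win=beam.DoFn.WindowParam):
            user_id, count = kv
            yield {
                "user_id": user_id,
                "count": count,
                "window_start": win.start.to_utc_datetime().isoformat(),
            }

    opts = PipelineOptions()
    opts.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=opts) as p:
        (p
         # Hypothetical event stream; in practice this is whatever the ingest source is.
         | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
         | "Parse" >> beam.Map(json.loads)
         | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], 1))
         # Aggregate as data flows in, one fixed hourly window at a time.
         | "HourlyWindows" >> beam.WindowInto(window.FixedWindows(60 * 60))
         | "CountPerUser" >> beam.CombinePerKey(sum)
         | "FormatRow" >> beam.ParDo(AddWindowStart())
         # Small pre-aggregated table that later "lookback" questions can query directly.
         | "WriteRollup" >> beam.io.WriteToBigQuery(
               "my-project:analytics.hourly_user_counts",
               schema="user_id:STRING,count:INTEGER,window_start:STRING"))

The static question ("how many events per user per hour?") gets answered once, at ingest time, instead of being recomputed from raw data every time someone runs the lookback job.
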



This is the thing I was most looking forward to reading about in the article, but there were no figures on how large the "largest Google Dataflow job ever" actually is. There are a bunch of relative figures -- 5x the 2018 job -- but what does that translate to? How long did it take?


Ya, concrete details were conspicuously missing. Like petabytes? Exabytes? I suspect that the "largest Dataflow job ever" is significantly smaller than the kind of crap Google regularly throws at the backend that Dataflow runs on. With that infrastructure at their fingertips, I suspect engineers regularly fire off jobs orders of magnitude larger than necessary, simply because it's not worth the 3 hours of human effort it'd take to narrow down the input set.



