
Interesting. I wish it had more detail on the inputs/outputs and the data sizes in the different phases.

One thing I wonder about is how much of this data collection they could do on a forward-moving basis. Often I see huge lookback jobs that answer predictable/static questions -- prime candidates for aggregation during ingest.
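
To make "aggregation during ingest" concrete, here is a minimal sketch using the Apache Beam Python SDK (the natural choice given the article is about Dataflow). Everything specific -- the Pub/Sub topic, the `user_id` field, the BigQuery table -- is hypothetical; the point is just that a small streaming rollup computed as events arrive can replace a huge lookback scan over raw logs later.

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
    from apache_beam.transforms import window

    class AddWindowStart(beam.DoFn):
        """Attach the hourly window start so the rollup rows are self-describing."""
        def process(self, kv, win=beam.DoFn.WindowParam):
            user_id, count = kv
            yield {
                "user_id": user_id,
                "count": count,
                "window_start": win.start.to_utc_datetime().isoformat(),
            }

    opts = PipelineOptions()
    opts.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=opts) as p:
        (p
         # Hypothetical event stream; in practice this is whatever the ingest source is.
         | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
         | "Parse" >> beam.Map(json.loads)
         | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], 1))
         # Aggregate as data flows in, one fixed hourly window at a time.
         | "HourlyWindows" >> beam.WindowInto(window.FixedWindows(60 * 60))
         | "CountPerUser" >> beam.CombinePerKey(sum)
         | "FormatRow" >> beam.ParDo(AddWindowStart())
         # Small pre-aggregated table that later "lookback" questions can query directly.
         | "WriteRollup" >> beam.io.WriteToBigQuery(
               "my-project:analytics.hourly_user_counts",
               schema="user_id:STRING,count:INTEGER,window_start:STRING"))

The static question ("how many events per user per hour?") gets answered once, at ingest time, instead of being recomputed from raw data every time someone runs the lookback job.
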



This is the thing I was most looking forward to reading about in the article, but there were no figures on how large the "largest Google Dataflow job ever" actually is. There are a bunch of relative figures -- 5x the 2018 job -- but what does that translate to? How long did it take?


Ya, concrete details were conspicuously missing. Like petabytes? Exabytes? I suspect that the "largest Dataflow job ever" is significantly smaller than the kind of crap Google regularly throws at the backend that Dataflow runs on. With that infrastructure at their fingertips, I suspect engineers regularly fire off jobs orders of magnitude larger than necessary, simply because it's not worth the 3 hours of human effort it'd take to narrow down the input set.



