
I had a similar idea (except using Kafka): have all the nodes write to a Kafka cluster, used for buffering, and let some consumer write that data in batches into whatever database engine(s) you need for querying, with intermediate pre-processing steps wherever needed. This lets you trade latency for write buffering while not losing data, thanks to Kafka's durability guarantees.
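A minimal sketch of that consumer side, assuming kafka-python; the topic name, group id, and write_batch() sink are made up for illustration, not taken from the thread:

  from kafka import KafkaConsumer

  # Consume from the buffering topic; commit offsets only after a
  # successful batch write, so a crash replays the batch instead of
  # dropping it.
  consumer = KafkaConsumer(
      "ingest-events",                 # hypothetical topic name
      bootstrap_servers="kafka:9092",
      group_id="batch-writer",
      enable_auto_commit=False,
      auto_offset_reset="earliest",
  )

  def write_batch(rows):
      # placeholder: bulk insert / COPY into whatever query engine you use
      pass

  while True:
      # poll() returns {TopicPartition: [records]}
      records = consumer.poll(timeout_ms=1000, max_records=5000)
      batch = [r.value for partition in records.values() for r in partition]
      if batch:
          write_batch(batch)    # optional pre-processing goes here
          consumer.commit()     # mark the batch as durably handled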

What would you use for streaming directly to S3 in high volumes?



Yeah, Kafka would handle it, but in my experience I would like to avoid Kafka if possible, since it adds complexity. (Fair enough, it depends on how precious your data is and whether it is acceptable to lose some of it if a node crashes.)

But somehow they are ingesting the data over the network. Would writing files to S3 be slower than that? Otherwise you don't need much more than a RAM buffer?
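If a RAM buffer is acceptable (i.e. you can tolerate losing whatever is in memory when a node crashes), the write path can be as simple as accumulating records and flushing them to S3 as one object per batch once a size or age threshold is hit. A rough sketch with boto3; the bucket, key prefix, and thresholds are illustrative assumptions:

  import json
  import time
  import uuid
  import boto3

  s3 = boto3.client("s3")

  class S3Buffer:
      """Accumulate records in RAM, flush to S3 as one object per batch."""

      def __init__(self, bucket, prefix, max_records=10_000, max_age_s=60):
          self.bucket, self.prefix = bucket, prefix
          self.max_records, self.max_age_s = max_records, max_age_s
          self.records, self.started = [], time.monotonic()

      def add(self, record):
          self.records.append(record)
          if (len(self.records) >= self.max_records
                  or time.monotonic() - self.started >= self.max_age_s):
              self.flush()

      def flush(self):
          if not self.records:
              return
          body = "\n".join(json.dumps(r) for r in self.records)
          key = f"{self.prefix}/{int(time.time())}-{uuid.uuid4()}.jsonl"
          s3.put_object(Bucket=self.bucket, Key=key, Body=body.encode())
          self.records, self.started = [], time.monotonic()

  # usage: buf = S3Buffer("my-ingest-bucket", "raw/events")
  #        buf.add({"node": "a", "value": 42})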

Edit: to be clear, Kafka is probably the right choice here; it is just that Kafka and I are not a love story.

But it should be cheaper to store long-term data in S3 than to store it in Kafka, right?



