I have been following this discussion [1], and the challenges of using the decahose came up in a few places. That leads me to ask:
We are a brand-new research group with just a few hands, and this Twitter data is enormous (45-50 GB/day of JSON for East Asia).
We have limited experience, so for now we are saving it out as daily logs in flat JSON files.
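To be concrete, our setup is little more than appending each tweet as a line of JSON to a per-day compressed file and scanning it linearly when we need to search. A minimal sketch (filenames and field names below are illustrative, not our actual schema):

```python
import gzip
import json

def append_tweets(path, tweets):
    """Append tweets to a gzipped, newline-delimited JSON daily log."""
    with gzip.open(path, "at", encoding="utf-8") as f:
        for t in tweets:
            f.write(json.dumps(t) + "\n")

def scan(path, predicate):
    """Linear scan of one daily log, keeping tweets matching a predicate."""
    hits = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            tweet = json.loads(line)
            if predicate(tweet):
                hits.append(tweet)
    return hits

if __name__ == "__main__":
    log = "2022-06-10.jsonl.gz"  # hypothetical daily log name
    append_tweets(log, [
        {"id": 1, "lang": "ja", "text": "..."},
        {"id": 2, "lang": "ko", "text": "..."},
    ])
    japanese = scan(log, lambda t: t["lang"] == "ja")
    print(len(japanese))  # 1
```

This obviously does not scale to ad hoc queries over months of 45-50 GB/day data, which is exactly why we are asking.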
For those of you using the decahose, what kind of system architecture have you put in place for storing and searching data at this scale? We explored AWS DynamoDB and a MongoDB data lake, but the costs seemed just too high. Feedback and suggestions would be much appreciated.
[1] Twitter plans to comply with Musk's demands for data: https://news.ycombinator.com/item?id=31686055