
So I can use a command-line tool to run queries that process 100 TB of data? The last time I used Hadoop, it was on a cluster with roughly 8 PB of data.

Let me know when I can do it locally.



"Can be 235x faster" != "will always be 235x faster", nor indeed "will always be faster" or "will always be possible".

The point is not that there are no valid uses for Hadoop, but that most people who think they have big data do not have big data. Your use case, by contrast, sounds like it genuinely is big data (for the time being), or is at least at a size where Hadoop is a reasonable tradeoff and judgement call.

As an illustration of people's beliefs on this, here's a Forbes article on Big Data [1] (yes, I know Forbes is now a glorified blog for people to pay for exposure). It uses as its example a company with 2.2 million pages of text and diagrams. Unless those pages are far above average size, they fit in RAM on a single server, or on a small RAID array of NVMe drives.
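
Back-of-the-envelope, with my own assumption of roughly 5 KB per page of plain text: 2.2 million pages x 5 KB is about 11 GB, which fits comfortably in RAM on a commodity server. Even if every page were a scanned image at, say, 200 KB, that's only around 440 GB, i.e. a single NVMe drive.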

That's not Big Data.

I've indexed more than that as a side-project on a desktop-class machine with spinning rust.

The people who think that is big data are the audience for this article, not people with actual big data.

[1] https://www.forbes.com/sites/forbestechcouncil/2023/05/24/th...


For the given question, sure, you can.

There are 60 TB SSDs out there; you might even fit all of the 8 PB on a single server.
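
Rough arithmetic, assuming those 60 TB drives: 8 PB / 60 TB is about 134 drives before any redundancy overhead, which is in the range of one or two dense storage chassis.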


Unix has a split(1) tool.
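
For example, a minimal sketch assuming a GNU userland; huge.tsv, the chunk_ prefix, the column number, and the parallelism are all placeholders:

    # Split a big TSV into 16 chunks without breaking lines (GNU split).
    split -n l/16 huge.tsv chunk_

    # Sum column 3 of each chunk in parallel, then combine the partial sums.
    ls chunk_* \
      | xargs -n 1 -P 16 awk -F '\t' '{ s += $3 } END { print s }' \
      | awk '{ total += $1 } END { print total }'

The same split-then-parallelize idea extends to whatever per-record processing you'd otherwise push through a Hadoop job, as long as the data fits on local disks.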


Have you read the article?



