> Do you have any examples of companies building Hadoop clusters for amounts of data that fit on a single machine?

I was a SQL Server DBA at Cox Automotive. Some director/VP caught the Hadoop around 2015 and hired a consultant to set us up. The consultant's brother worked at Yahoo and did foundational work with it.

The consultant made us provision 6 nodes for Hadoop in Azure (our infra was on Azure Virtual Machines), each with 1 TB of storage. The entire SQL Server footprint at the time was 3 nodes and maybe 100 GB, and most of that was data bloat. He complained that even that was too small a setup.

The data going into Hadoop was maybe 10 GB, and the consultant insisted we do a full load every 15 minutes "to keep it fresh". The delta for a 15-minute interval was less than 20 MB, maybe 50 MB during peak usage. Naturally, his refresh script was pounding the primary server and hurting performance, so we spent additional money to set up a read replica for him to use.

Did I mention the loading process took 16-17 minutes on average?
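
For contrast, a watermark-based incremental pull would have moved only the 20-50 MB that actually changed each cycle. A rough sketch in T-SQL, with a made-up table and watermark schema (dbo.listings and etl.watermark are hypothetical):

    -- Hypothetical schema: dbo.listings has a modified_at column,
    -- etl.watermark records the last successful pull per source.
    DECLARE @now datetime2 = SYSUTCDATETIME();
    DECLARE @last_load datetime2 =
        (SELECT last_loaded_at FROM etl.watermark WHERE source = 'listings');

    -- Pull only the rows changed since the last load: the 20-50 MB delta,
    -- not the full 10 GB.
    SELECT *
    FROM dbo.listings
    WHERE modified_at > @last_load
      AND modified_at <= @now;

    -- Advance the watermark only after the extract succeeds.
    UPDATE etl.watermark
    SET last_loaded_at = @now
    WHERE source = 'listings';

A pull bounded like that would have finished in seconds instead of running longer than the 15-minute window it was supposed to fill.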

You can quit reading now, since this answers your question, but in case anyone wants the fuller story:

Hadoop was used to feed some kind of bespoke dashboard product for a customer. Everyone at Cox was against using Microsoft's products for this, even though the entire stack was Azure/.NET/SQL Server... go figure. Apparently they weren't aware of Power BI, or just didn't like it.

I asked someone at MS (might have been one of the GuyInACube folks, I know I mentioned it to him) to come in and demo Power BI, and in a 15-minute presentation he absolutely demolished everything they had been working on for a year. There was a new data group director who was pretty chagrined about it; I think they went into panic mode to make sure the customer didn't find out.

The customer, surprisingly, wasn't happy with the progress or outcome of this dashboard, and was vocally pointing out data discrepancies compared to the production system, some of them days or even a week out of date.

Once the original contract was up and it came time to renew, the Hadoop VP had to pay for the project out of his own budget, and about 60 days later it was mysteriously cancelled. The infra group was happy, since our Azure expenses suddenly halved and our database performance improved 20-25%.

The customer seemed happy too: they didn't have to struggle with the prototype anymore, and wow, where did all these SSRS reports that were perfectly fine come from? What do you mean they were there all along?



