Hacker News

>What's to stop me from rolling my own Spark clusters and just using one of those? Is anyone doing this?

Ops. Unless your core competency is running reports and Spark nodes, it's probably cheaper to outsource the management of Spark and friends than to hire people to make sure it's always up and running. To be fair, I haven't touched Spark in many years, but having to page someone who was good enough at Spark to debug why a job stopped at 3am isn't fun.



>Ops. Unless your core competency is running reports and Spark nodes, it's probably cheaper to outsource the management of Spark and friends than to hire people to make sure it's always up and running.

I think as an end user I would absolutely agree on this point. But many companies use Databricks as part of automated backend systems that they resell to customers. The cost per DBU (Databricks Unit) is astronomical relative to the raw compute in use. It feels a bit like running a restaurant where you serve takeout.


[Disclaimer: Databricks employee] There's also a lot of value in DBSQL, Unity Catalog (data management), and serverless autoscaling, all of which can save money compared to just running raw Spark. But if you want to operate Spark yourself, cool, do it. We're happy with that: it builds the base of Spark committers over time and increases the quality of our products.


I can spin up and down 100+ node clusters on the 4 largest cloud providers at will.

What ops am I missing?



