
And not something like Spark on EMR?


Well no, unfortunately.

Remember that "data is a team sport". Together, we try to make better decisions (manually or through automation). A DE can produce great data, but it's only useful if it helps the DA/DS. There's a lot of friction there.

Most of that friction disappears with SQL-based orchestration tools (I mean dbt specifically here, but there are others). Suddenly the analyst can create the data they need, with minimal guidance from a DE.

That can be with Spark SQL (+ Delta Lake / Iceberg) or some warehouse. The engine isn't the issue.

The issue is around keeping orchestration simple when you're not just doing simple stuff anymore. Keeping that DAG logical, clear, and smooth is difficult once you include non-SQL items.

This isn't solved by Spark UDFs unfortunately :)
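
To make the DAG point concrete, here is a rough sketch in plain Python (standard library only). It is not how dbt or Spark represent things; the node names, the "sql" / "python" labels, and the helper functions are all made up for illustration. It only shows that once a non-SQL node sits in the middle of the graph, you get handoff edges between engines that something has to manage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: node name -> (kind, upstream dependencies).
# All names and kinds are illustrative assumptions, not from any real project.
PIPELINE = {
    "stg_orders": ("sql", []),
    "stg_customers": ("sql", []),
    "orders_enriched": ("sql", ["stg_orders", "stg_customers"]),
    "churn_scores": ("python", ["orders_enriched"]),  # non-SQL step, e.g. model scoring
    "churn_report": ("sql", ["churn_scores"]),
}


def run_order(pipeline):
    """Return node names in an order that respects every dependency."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    return list(TopologicalSorter(graph).static_order())


def engine_handoffs(pipeline):
    """List the (upstream, downstream) edges where the node kind changes."""
    return [
        (up, name)
        for name, (kind, deps) in pipeline.items()
        for up in deps
        if pipeline[up][0] != kind
    ]


if __name__ == "__main__":
    print(run_order(PIPELINE))
    print(engine_handoffs(PIPELINE))
```

In this toy graph a single non-SQL node already produces two handoff edges (into and out of `churn_scores`). Those are the places where, in my experience, a SQL-only view of the DAG stops being the whole picture.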



