The post-modern data stack is going to be PRQL + DuckDB + Prefect, and it's goin...

bonchicbongenre · on Jan 25, 2024

I'm with you at least 2/3 of the way. My preferred stack is PRQL + DuckDB + Dagster. I evaluated the space for work at my current company (I was originally the only DE, handling ingests from ~300 sources across various systems, on the order of 1k downstream tables in dbt + hundreds of dashboards + a handful of business-critical data+app features; now leading a small team).

I came away ranking Dagster first, Prefect second, everything else not close. IMO Dagster wins fundamentally for data engineers bc it picks the right core abstraction (software-defined assets) and builds everything else around that. Prefect for me is best for general non-data-specific orchestration as a nearly transparent layer around existing scripts.
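To make the "asset as core abstraction" point concrete, here's a rough stdlib-only sketch of the idea. This is not Dagster's actual API: the `asset` decorator, the `ASSETS` registry and `materialize_all` are hypothetical names made up for illustration. The point is only that you declare the data you want to exist, and the run order falls out of the dependencies between assets.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical registry: asset name -> function that computes that asset.
ASSETS = {}


def asset(fn):
    """Register fn as a named asset; its parameter names are its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders():
    # Stand-in for an ingest step (made-up data).
    return [{"id": 1, "amount": 20}, {"id": 2, "amount": 5}]


@asset
def order_totals(raw_orders):
    # Downstream asset, derived from raw_orders.
    return sum(row["amount"] for row in raw_orders)


def materialize_all():
    """Compute every registered asset in dependency order."""
    # Map each asset to the set of assets it depends on.
    graph = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in ASSETS.items()
    }
    results = {}
    # static_order() yields dependencies before their dependents.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = ASSETS[name](**inputs)
    return results
```

Calling `materialize_all()` returns a dict keyed by asset name. A task-centric orchestrator would instead have you wire up the steps and their order yourself.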

Ofc to each their own based on their use case.

esafak · on Jan 25, 2024

Are you sure prefect is better than flyte? How so?

https://neptune.ai/blog/best-workflow-and-pipeline-orchestra...

cced · on Jan 25, 2024

Thoughts on dbt?

nerdponx · on Jan 25, 2024

Great idea, kind of a chaotic mess in practice. Better than nothing by far, but the industry I think will be eager to receive an improved alternative.

The problem with any tool like dbt that abstracts over differences in databases is that a huge amount of work goes into building "adapters" to support the various details and quirks of each supported database. That ends up being a substantial technological moat which inhibits the growth of competitor systems. Another option is to do what Datasette did and focus on supporting one specific database, gradually expanding to a second database after years of demand for it.
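A toy sketch of why adapters pile up, assuming a hypothetical `Adapter` interface (this is not dbt's real adapter API, and the class names are invented). Even a trivial "preview the first N rows" query needs per-database code, because identifier quoting and row limiting differ between dialects:

```python
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical interface for per-database quirks."""

    @abstractmethod
    def quote_identifier(self, name: str) -> str:
        ...

    @abstractmethod
    def select_first_rows(self, table: str, n: int) -> str:
        ...


class PostgresStyleAdapter(Adapter):
    # Double-quoted identifiers, LIMIT at the end of the query.
    def quote_identifier(self, name: str) -> str:
        return '"' + name.replace('"', '""') + '"'

    def select_first_rows(self, table: str, n: int) -> str:
        return f"SELECT * FROM {self.quote_identifier(table)} LIMIT {n}"


class SqlServerStyleAdapter(Adapter):
    # Bracketed identifiers, TOP right after SELECT.
    def quote_identifier(self, name: str) -> str:
        return "[" + name.replace("]", "]]") + "]"

    def select_first_rows(self, table: str, n: int) -> str:
        return f"SELECT TOP {n} * FROM {self.quote_identifier(table)}"


def preview_query(adapter: Adapter, table: str, n: int = 5) -> str:
    """The tool's core logic stays generic; dialect details live in the adapter."""
    return adapter.select_first_rows(table, n)
```

Multiply those two methods by every feature (types, merges, date functions, DDL) and every database, and you get the moat.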