Congrats on the release. Can dstack work with Kubernetes? Say another team owns the infrastructure and I'm required to use k8s; is it interoperable at all?
This seems like a good model for sustaining open source, but raises some questions.
Does anybody know how the DuckDB Foundation works? The sponsors are MotherDuck, Voltron, and Posit, which are heavily venture-funded. Do DuckDB Labs employees work on sponsored projects for the foundation?
I am also curious whether anyone can shed light on what kind of contract work DuckDB takes on and how it aligns that work with the open source project. This has always seemed like the holy grail, but it is difficult to do in practice.
Is there something about the presentation of the article that gives you the impression it was intended to be read as anything other than the author's opinion?
Sounds like you are writing a pipeline that enriches a stream of data with historical data from Snowflake. This is a fairly common pattern. If the data in Snowflake is not changing often, you would want to cache it somewhere the stream processor can reach, to avoid the per-event query overhead and speed things up.
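For illustration, here's a minimal sketch of that caching idea, assuming the snowflake-connector-python client and a hypothetical `customers` lookup table (the table, columns, and connection details are made up); `enrich` is what you'd call from inside the stream processor:

```python
import time
import snowflake.connector

CACHE_TTL_SECONDS = 600  # refresh the lookup data every 10 minutes
_cache = {"loaded_at": 0.0, "rows": {}}

def _load_lookup():
    # Pull the slowly changing reference data out of Snowflake once,
    # instead of querying it for every event on the stream.
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...",
        warehouse="my_wh", database="my_db", schema="public",
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT customer_id, segment, region FROM customers")
        return {row[0]: {"segment": row[1], "region": row[2]} for row in cur.fetchall()}
    finally:
        conn.close()

def enrich(event: dict) -> dict:
    # Refresh the cache if it has gone stale, then join by key in memory.
    if time.time() - _cache["loaded_at"] > CACHE_TTL_SECONDS:
        _cache["rows"] = _load_lookup()
        _cache["loaded_at"] = time.time()
    event["customer"] = _cache["rows"].get(event["customer_id"])
    return event
```

A TTL'd in-memory dict is the simplest version; if the lookup data is large or shared across workers, the same idea works with Redis or a local embedded store.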
Most of the tools listed can meet your first two requirements. Further down the list, though, the requirement for both SQL and a DAG-style representation narrows things to only a few; I don't know of many that provide both.
If you relax the SQL constraint, more of them become applicable, like Bytewax and Kafka Streams. See the sketch below for what the DAG style looks like in practice.
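To make the DAG-versus-SQL distinction concrete, here's a rough sketch of a dataflow in Bytewax's Python API (written against the 0.18-era operators module; exact module paths and operator names differ between versions), where each step you add becomes a node in the graph rather than a clause in a query:

```python
from bytewax.dataflow import Dataflow
import bytewax.operators as op
from bytewax.testing import TestingSource
from bytewax.connectors.stdio import StdOutSink

# Each operator call adds a node to the dataflow graph; the runtime
# wires the edges and executes it as a streaming job.
flow = Dataflow("example")
lines = op.input("inp", flow, TestingSource(["hello world", "hello stream"]))
words = op.flat_map("split", lines, lambda line: line.split())
hellos = op.filter("only_hello", words, lambda w: w == "hello")
op.output("out", hellos, StdOutSink())
```

SQL-first engines express the same transformation declaratively and derive the equivalent graph for you; the trade-off is how much control you have over the shape of that graph.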
I recently read this article (https://materializedview.io/p/from-samza-to-flink-a-decade-o...) about Flink, and it commented on how Flink grew to fit all of these different use cases (applications, analytics, and ETL) with disjoint requirements, the same use cases Confluent built Kafka Streams, ksqlDB, and Kafka Connect for. Which of those would you say Arroyo is best suited for?
Maybe this is a stupid question, but how would Airbnb be viewed if this case is used as precedent? If I sign up for Airbnb and use their automatic price optimization engine, would that fall under the same algorithmic collusion?
Great article! The breakdown of ETL, Analytical, and Production workloads is so critical for understanding the pieces of the ecosystem, and the history from your experience makes it much easier to follow.
I'm always curious about who writes all the comments on Hacker News. I am more of a lurker than a commenter.
We built this live Grafana dashboard using Bytewax and Proton: it pulls data from the Hacker News API, processes it as a stream, and presents it in a continuously updating view.
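For anyone curious what the ingestion side looks like, here's a bare-bones, hand-rolled sketch of polling the public Hacker News Firebase API for new items. The real dashboard uses Bytewax and Proton rather than a plain loop like this, but the polling idea is the same:

```python
import time
import requests

HN_API = "https://hacker-news.firebaseio.com/v0"

def poll_new_stories(interval_seconds: float = 30.0):
    """Yield newly published HN items by diffing the newstories.json list."""
    seen: set[int] = set()
    while True:
        ids = requests.get(f"{HN_API}/newstories.json", timeout=10).json()
        for item_id in ids:
            if item_id in seen:
                continue
            seen.add(item_id)
            item = requests.get(f"{HN_API}/item/{item_id}.json", timeout=10).json()
            if item:  # items can briefly resolve to null right after publication
                yield item
        time.sleep(interval_seconds)

for story in poll_new_stories():
    print(story.get("id"), story.get("title"))
```

In a streaming setup this loop would sit inside the input connector, with downstream steps doing the windowing and aggregation that feed the dashboard panels.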