We maintain and offer Apache Airflow as a service to customers ranging from early-stage start-ups to Fortune 500s. We're hiring across the board: front-end, Python/data engineers, and k8s/cloud experts.
I've worked for this company for two years now and it's been one of the most fun rides of my life. The culture is incredible, the people are smart yet humble, and the OSS Apache Airflow project has been exploding in popularity.
Please feel free to reach out if you have any interest or questions: daniel [at] astronomer.io. Or you can apply on our site: https://careers.astronomer.io/
I'm an Airflow PMC member and would love to know a bit more about your comparison :).
1. Have you tried Airflow 2.0? We made some pretty big overhauls to both the UI and the backend.
2. DAG versioning is currently problematic, but it's a "when" and not an "if," so it should land in a future 2.x version :). That said, could you describe your deployment issues a bit more? User stories like this help us improve the product.
3. Have you looked into using KEDA with the CeleryExecutor? You could create KEDA queues for a lot of commonly used workflows, and then you'd only need the Python or Bash operator to run those tasks instead of the KubernetesPodOperator.
4. Are you using the Airflow helm chart, or did you roll a custom deployment?
Any feedback would be highly appreciated and I'm also glad to answer any questions you might have!
1. We ended up using GCP's hosted Composer to get started more quickly, which doesn't seem to have been updated to Airflow 2.0 yet. I'll put that on the list for evaluation.
2. A few use cases where I immediately hit complexity walls:
A) Having a "staging" version of our pipelines so that we don't break the prod ETL. It was really difficult to find a canonical method for having common DAG code that's parameterizable per environment. The fact that all of the DAGs live side by side in the same directory means I have to run the same job for a "prod push" as for a "staging push" (i.e., if I get the staging deploy wrong, I could break prod). Given that we deploy version vN+1 to staging, check that it's working, and only then deploy vN+1 to prod, we ended up with some weird config-injection code that lets us keep two folders containing copies of the same DAG scripts with different config. This just felt janky.
B) Managing Python dependencies between different apps was also painful. For example, we wanted to add Meltano, and that app brings in a bunch of deps, which broke our main DAGs when I naively updated the main pip environment to install the new Meltano requirement. Using the KubernetesPodOperator effectively lets us have a venv per DAG, but the pattern of using one Python env across the whole Airflow install bit me very early on and seemed pretty unscalable.
3. I haven't looked at KEDA, I'll take a look.
4. We're using GCP Composer for now, though I looked at the Helm chart too.
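For what it's worth, the shape of the config injection in 2A can be reduced to a small sketch. Everything here (the `DEPLOY_ENV` variable, the bucket names, the schedules) is hypothetical, and this isn't an official Airflow pattern; it just shows how one copy of the shared DAG code can pick up per-environment settings instead of duplicating folders:

```python
import os

# Per-environment settings keyed by a DEPLOY_ENV variable set at deploy time.
# All names below (DEPLOY_ENV, the buckets, the schedules) are illustrative.
CONFIGS = {
    "staging": {
        "dag_id_prefix": "staging_",
        "target_bucket": "my-etl-staging",
        "schedule": None,  # staging runs triggered manually
    },
    "prod": {
        "dag_id_prefix": "",
        "target_bucket": "my-etl-prod",
        "schedule": "@daily",
    },
}


def get_env_config(env=None):
    """Pick the config for the current environment (defaults to staging)."""
    env = env or os.environ.get("DEPLOY_ENV", "staging")
    if env not in CONFIGS:
        raise ValueError("Unknown DEPLOY_ENV: %r" % env)
    return CONFIGS[env]

# In the shared DAG file you would then do something like:
#   cfg = get_env_config()
#   dag = DAG(cfg["dag_id_prefix"] + "etl", schedule=cfg["schedule"], ...)
```

The single env-var switch means a "staging push" and a "prod push" deploy the same files, and only the injected environment differs.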
Could you open an Airflow issue about that, or start a thread on the dev list? That could be interesting! (Though you might want to wait until after the holidays, as we're all a bit wiped :) )
On the OSS side we have a Helm chart that is heavily based on the one we use at Astronomer. That should hopefully get you started (or you can reach out to Astronomer and someone will help out with a demo if you want a hand with that).
A sensor would also work here. Especially with the new SmartSensor feature, sensors are basically free so you can set them up for event-based DAG executions.
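Mechanically, a sensor is just a task that repeatedly "pokes" a condition until it's true or a timeout elapses. A minimal stdlib-only sketch of that poke loop (the names here are illustrative, not Airflow's actual sensor API):

```python
import time


def poke_until(condition, poke_interval=1.0, timeout=10.0):
    """Re-check `condition` every `poke_interval` seconds until it returns
    True or `timeout` seconds have passed. This mirrors what an Airflow
    sensor's poke loop does conceptually; it is not the real API."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poke_interval)
    return False

# e.g. wait for an upstream event such as a file landing:
# poke_until(lambda: os.path.exists("/data/ready.flag"),
#            poke_interval=30, timeout=3600)
```

With SmartSensor, many such poke loops get batched into a few shared processes, which is why leaving lots of sensors waiting becomes cheap.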
We didn't separate task instances from timestamps YET, purely because there was already so much to release that we didn't want to add more potential for bugs or upgrade difficulties. I believe this is on the docket for 2.1 or 3.0, depending on whether it requires a breaking change. (That said, we plan to release much more frequently going forward, so we're planning to have this feature in 2021.)