Hacker News
Show HN: Open-source infra for building embedded data pipelines (github.com/pipebird)
41 points by seandoh on Sept 1, 2022 | 20 comments
Hey HN!

We are building *open source infrastructure for deploying customer-facing data pipelines.*

Here’s our repo https://github.com/pipebird/pipebird and website https://pipebird.com/.

Pipebird (YC W22) is designed to enable companies that generate important data to offer secure data pushes to their customers’ warehouses, directly from their products.

Our team was previously building in fintech, where we heard from many of our peers that their customers wanted data pushed directly to their warehouses. Customers wanted to bring data into their source of truth without having to maintain custom-built pipelines or introduce security risks by contracting a third-party ETL/ELT provider.

After seeing Stripe https://stripe.com/data-pipeline and customer.io https://customer.io/data-warehouse recently invest in building out their own native data sharing products, we realized that many SaaS companies could better support their customers and even generate additional revenue by offering native data pipelines.

Our goal with Pipebird is to make creating a reliable data pipeline as simple as pressing a button from a vendor's dashboard.

With the current iteration of the product, data can be selected from a number of sources (e.g., Postgres, MySQL, CockroachDB), customers can configure pipelines and optionally apply transformations (like type casting), and data can be periodically synced directly to customers’ warehouses (e.g., Snowflake). We’re actively adding sources/destinations and would appreciate any feature requests.

Here's a 2 min demo of the product https://www.loom.com/share/c7a7e4b4e57c4015b533fd754c510b2e

Pipebird is open source (MIT license) so that any developer can use it. Our aim is to not charge individual developers - we make money selling paid plans that include features like multiple projects, user permissions, additional security features, managed infra, support, etc.

Give us a whirl: https://github.com/pipebird/pipebird. We’d love your feedback and will be here to answer any questions!



Hey team, Charles here, co-founder of prequel.co (YC W21). We also help companies share data with their customers.

As someone who's been playing in the data sharing space, it's really exciting to see it get more attention!

All the best to y'all!


Hey Charles, we're really excited to be working on tooling to make data sharing easier for companies and their customers!

The activation energy for setting up robust direct-to-customer pipelines is still too high on the provider’s end - it’s great to see this approach getting more investment recently.

All the best to Prequel as well!


Looks useful! Do you have a way to validate that the data was copied correctly and entirely? If not, you might want to consider integrating data-diff for that - https://github.com/datafold/data-diff
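To make the validation idea concrete, here is a minimal sketch of one generic way to check that a copy is complete: compare a row count and an order-independent checksum of the source and destination rows. This is not how Pipebird or data-diff works internally; `table_fingerprint` and `tables_match` are hypothetical names, and the rows are assumed to be already fetched into memory as tuples.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Return (row_count, checksum) for a set of rows, independent of row order."""
    count = 0
    combined = 0
    for row in rows:
        # Serialize each row deterministically; repr() keeps 1 and "1" distinct.
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Adding per-row hashes modulo 2**256 ignores row order but,
        # unlike XOR, does not cancel out duplicated rows.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def tables_match(
    source_rows: Iterable[Sequence[object]],
    destination_rows: Iterable[Sequence[object]],
) -> bool:
    """True when both sides have the same rows, in any order."""
    return table_fingerprint(source_rows) == table_fingerprint(destination_rows)


if __name__ == "__main__":
    source = [(1, "alice", 10.5), (2, "bob", 7.0)]
    copied = [(2, "bob", 7.0), (1, "alice", 10.5)]  # same rows, different order
    truncated = [(1, "alice", 10.5)]                # one row missing
    print(tables_match(source, copied))     # True
    print(tables_match(source, truncated))  # False
```

In practice the two databases may return different Python types for the same value (for example `Decimal` versus `float`), so a real check would normalize values before hashing.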


Nice work! We're really proud of how the warehouse sync feature we built at Customer.io turned out. We had to do it all from scratch though!

Good luck with the next steps for Pipebird.


Thanks! People really seem to love your warehouse sync feature. We hope to help other companies offer similar products to their customers.


Hey, interesting work. How does Pipebird compare with data tools like Airbyte?


Thanks. Airbyte is a great third-party tool for creating custom ELT pipelines and accessing popular connectors. Our focus at Pipebird is different - we help companies work directly with their customers to share data.

To illustrate the difference, imagine ACME Inc. wants to pull its customer data from HubSpot. With Airbyte, they can use a pre-built connector that accesses data made available via the HubSpot API.

Now let's say HubSpot uses Pipebird to build native data sharing features for its customers. ACME Inc. can now deploy a secure data pipeline directly from HubSpot without involving any third party. Since HubSpot is offering the pipelines, it can choose to expose more data than is made available via its API (Stripe has done this with their data pipeline product https://stripe.com/data-pipeline), and ACME Inc. doesn't need to worry about the pipeline breaking because it's coming directly from the source.


Thanks for the explanation. Does that mean HubSpot, as the data source, will have to maintain these native data pipelines itself?


Pipebird handles all of the actual plumbing (extraction, transfer scheduling, retries, setting up data sharing, etc.).

The only thing HubSpot would have to do is set up their source and add a destination through our API.
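For readers wondering what the "retries" part of that plumbing involves in general, here is a minimal sketch of retrying a failing transfer with exponential backoff. It is an illustration under stated assumptions, not Pipebird's actual code or API: `run_with_retries`, its parameters, and the `flaky_transfer` stand-in are all hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task() until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls: List[int] = []

    def flaky_transfer() -> str:
        # Hypothetical stand-in for a warehouse load that fails twice.
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionError("warehouse unavailable")
        return "synced"

    print(run_with_retries(flaky_transfer, base_delay=0.01))  # synced
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production scheduler would typically also add jitter and only retry errors known to be transient.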


Hey folks

The product looks great. I have faced such situations in the past with different tools.

You should make the PM and dev communities aware of this tool to get better leads and use cases.

Wishing you the best for the future.


Thanks - much appreciated! We will be posting in some other areas to spread the word.


Besides creating more value for our B2B customers, can this help our own company in any way?


Your company could offer this as a paid product to your customers. As an example, Stripe Data Pipeline https://stripe.com/data-pipeline is a paid product - customers are willing to pay Stripe instead of a third party.

We see this as a way for companies to strengthen relationships with their customers and grow revenue.


This is dope! Hope to see more companies (including ours!) integrate more data pipelines!


Thanks!


Happy to help you deploy some pipelines once you're ready!


I should mention that we're happy to do a white-glove deployment for anyone who wants to test out what this would look like in their product - feel free to email hello@pipebird.com


Hey all, one of the co-founders of Pipebird here - I'll be around to answer any questions you might have. Would love to hear your feedback!


Thanks for posting. Pipebird sounds really useful. We're running a conference on November 15th called Open Source Analytic Software 2022. It's free and covers all parts of the open source analytic stack. Talks on innovative new tech are very welcome. I would like to invite you to submit a talk. CfP is here: https://altinity.com/osa-con/.


Thank you - we'll submit a proposal shortly!



