Hacker News | mitchpatin's comments

CSV still quietly powers the majority of the world’s "data plumbing."

At any medium-sized or larger company, you'll find huge numbers of CSVs being passed around, either stitched into ETL pipelines or sent manually between teams and departments.

It’s just so damn adaptable and easy to understand.


> It's just so damn adaptable

Like a rapidly mutating virus, yes.

> and easy to understand.

Gotta disagree there.

For example, one of the CSVs my company shovels around is our Azure billing data. There are several columns where I just have absolutely no idea what the data in them is. There are other columns we discovered are essentially nullable¹, The Hard Way, when we got a bill that included, e.g., a charge for which Azure apparently doesn't know what day it occurred on. (Or almost anything else about it.)

(If this format is documented anywhere, well, I haven't found the docs.)

Values like "1/1/25" in a "date" column. I did say it was an Azure-generated CSV, so obviously the bar wasn't exactly high, but then it never is with CSV: anyone wanting to build something with some modicum of reliability, or discoverability, is sending data in some higher-level format, like JSON or Protobuf or almost literally anything but CSV.

If I can never see the format "JSON-in-CSV-(but-we-fucked-up-the-CSV)" ever again, that would spark joy.

(¹after parsing, as CSV obviously lacks "null"; usually, "" is a serialized null.)
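To give a concrete idea of the defensive parsing this forces on you, here's a rough sketch in Python. The column names ("Date", "Cost") and file name are made up; the real Azure export has its own (apparently undocumented) layout, and the date-format guessing order is itself a gamble:

    # Minimal sketch of defensively parsing a CSV with ambiguous dates and
    # implicit nulls. Column/file names are invented for illustration.
    import csv
    from datetime import datetime

    def parse_date(value):
        """Treat "" as null and try a few plausible formats for values like "1/1/25"."""
        if value.strip() == "":
            return None  # CSV has no real null; "" is the usual stand-in
        for fmt in ("%m/%d/%y", "%d/%m/%y", "%Y-%m-%d"):  # guessing order matters!
            try:
                return datetime.strptime(value.strip(), fmt).date()
            except ValueError:
                continue
        raise ValueError(f"unrecognized date: {value!r}")

    def parse_cost(value):
        return None if value.strip() == "" else float(value)

    with open("azure_billing.csv", newline="") as f:
        for row in csv.DictReader(f):
            date = parse_date(row.get("Date", ""))
            cost = parse_cost(row.get("Cost", ""))
            if date is None:
                # The Hard Way: charges with no usable date still land on the bill somewhere.
                print("charge with no date:", row)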


Insurance. One of the core pillars of insurance tech is the CSV format. You'll never escape it.


>You'll never escape it.

I see what you did there.


TableFlow co-founder here - I don't want to distract from the Midship launch (congrats!) but did want to add my 2 cents.

We see a ton of industries/use-cases still bogged down by manual workflows that start with data extraction. These are often large companies throwing many people at the issue ($$). The vast majority of these companies lack the technical teams required to leverage VLMs directly (or at least the desire to manage their own software). There's a ton of room for tailored solutions here, and I don't think it's a winner-take-all space.


+1 to what Mitch said. We believe there's a large market of non-technical users who could now automate extraction tasks but don't know how to interact with APIs. Midship is another option for them that requires zero programming!


We actually pivoted multiple times:

1. We applied to YC and initially started work on what we referred to as "data-stack-as-a-service". The premise was to provision, configure, and maintain the different components required for a data stack: Data Warehouse, Integrations, Transformations, Visualizations, etc. We had a working product and a few paying customers. Ultimately we decided to pivot as we felt the market for this was only small companies with small budgets (many of whom might not even need a mature data stack).

2. Then we released a small open-source tool for Postgres that could easily send webhook notifications when data changed (Postgres triggers sent notifications to a Go application, which relayed them as websocket messages; see the sketch after this list). From there we dove deeper into database tooling, building a platform that offered branching, change management, and other modern features for Postgres. We also had a prototype and slightly larger contracts with a few early customers here. We decided to pivot from this for a few reasons, but ultimately we lost conviction in the idea and were more excited about the data import challenges that came up during user interviews.

3. As you mentioned, we're now working on CSV import as a service. After building and maintaining CSV import tools many times ourselves, we believe there's an opportunity to provide a robust, pre-built experience. There are actually a few other products in the market today. Our initial focus is to be the most developer-friendly choice (a big part of why we're open source). We want the decision to leverage an existing service to be a no-brainer for any engineering team tasked with supporting CSV import.
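For anyone curious what the trigger-to-notification plumbing in point 2 looks like in spirit, here's a minimal sketch of the general pattern: a Postgres trigger calls pg_notify, and a listener picks up the payload. The table name, channel name, and connection string are invented, and it's written in Python/psycopg2 purely for illustration (the actual tool was a Go application):

    # Generic trigger -> NOTIFY -> listener sketch; not the tool's actual code.
    import json
    import select
    import psycopg2
    import psycopg2.extensions

    SETUP_SQL = """
    CREATE OR REPLACE FUNCTION notify_row_change() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('row_changes', row_to_json(NEW)::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS orders_notify ON orders;
    CREATE TRIGGER orders_notify
      AFTER INSERT OR UPDATE ON orders
      FOR EACH ROW EXECUTE FUNCTION notify_row_change();
    """

    conn = psycopg2.connect("dbname=app")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute(SETUP_SQL)
    cur.execute("LISTEN row_changes;")

    while True:
        # Block until the connection has something to read, then drain notifications.
        if select.select([conn], [], [], 10) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            payload = json.loads(notify.payload)
            print("changed row:", payload)  # a real listener would POST this to a webhook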


Would you be willing to share your YC pitch deck?


Super impressive performance improvements!

Do most of your customers replicate their Postgres database to Hydra for analytics jobs, or what's the typical setup?


I'm one of the co-founders at TableFlow. At our last company, we spent a ridiculous amount of time updating and debugging the CSV import functionality to handle seemingly endless edge cases. Our goal with TableFlow is to tackle as many of the challenges as possible, so that engineering teams can focus on building their own product.

We're proudly open-source and aim to be the most developer-friendly option in the market. We'd love to hear any feedback or ideas from others who have experienced pains related to data import.


Inquery co-founder here. Great question regarding performance implications - we're working on an article exactly like you described. Additionally, we're exploring a few updates to significantly reduce the impact on the database: (1) adding filtering at the trigger level (see the sketch below), and (2) using the WAL instead of triggers.
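To illustrate what (1) could look like in general terms (a sketch of the idea only, not our actual implementation; the table, column, and function names are invented): a WHEN clause on the trigger means the trigger function only fires for rows that actually match the filter, so unfiltered rows add almost no overhead.

    # Sketch of "filtering at the trigger level" via a WHEN clause; illustrative only.
    import psycopg2

    FILTERED_TRIGGER_SQL = """
    DROP TRIGGER IF EXISTS orders_notify_filtered ON orders;
    CREATE TRIGGER orders_notify_filtered
      AFTER UPDATE ON orders
      FOR EACH ROW
      WHEN (OLD.status IS DISTINCT FROM NEW.status)  -- fire only on real status changes
      EXECUTE FUNCTION notify_row_change();           -- placeholder for an existing notify function
    """

    with psycopg2.connect("dbname=app") as conn:
        with conn.cursor() as cur:
            cur.execute(FILTERED_TRIGGER_SQL)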


Inquery co-founder here. Glad to hear the idea resonates with you! Do you usually just run these commands in a local IDE? Would you prefer our solution be a local application or a self-hosted container in your VPC, accessible through your web browser?


In the past I've done it a few different ways, but now I mostly see strict infosec rules; that part is more important than where it runs. E.g., at my last job we had a workflow where you needed a ticket approved by a second pair of eyes, which used CyberArk to spin up a remote desktop running a DB IDE where you could do your business. Commands were tracked, but there were no real restrictions.

At my new firm, your personal account gets temporary RW permissions via a centralized service.


The "temporary access" approach seems to be pretty popular based on our conversations with engineers at large-ish tech companies. We hadn't heard of anyone using a remote desktop for this problem, though.


Not at all detracting - we're big fans of Supabase! We're taking a bit of a different approach by trying to bring tools to existing databases (like those hosted in RDS) as opposed to the full platform approach.

We played around with Database Webhooks in Supabase and really liked the experience!


I completely agree with your approach - it's something missing in RDS.

Feel free to reach out if there's anything we can help with


Thanks! We've discussed offering a cloud version at some point; however, we're initially focused on the self-hosted version.

Regarding the cloud option, we'd likely take a usage-based approach to pricing. Not sure if that would be based on number of events, actions, or some combination of the two. When we do offer this, we'll make sure that the free tier is generous and easy for small projects to use without worrying about paying.

