
I think the data-first approach sometimes leads to a problem of: "We saw we could, but never asked if we should."

The particular scenario I have in mind relates to a company with an internal system for managing work from proposal to invoice. There was a lot of data there, but it was never exactly clear which parts were used by which processes or which rules were invariant, and some aspects of it existed purely for historical reasons. Sometimes you could implement a feature for Important Manager #1 only to discover that Important Manager #2 didn't want it to apply to his employees. Then you'd put in special-case code, and the dance continued.

In a situation like that, I'd far rather focus first on what the goal and process should be, and then use that occasion to refactor the stored data to accurately represent the evolving model for "how we do business".




I have no words for how broken this is.

You don't refactor data. Data is. Refactor your code.

Data isn't a model for "how you do business."

Data should be a model for what is. How you do business is an interpretation of the underlying data.

I deeply disagree with your approach, but I hope to do so with respect.


I think you may be overestimating the quality of the data I'm griping about. Some of the tables are mostly-unused metadata about optional key-value pairs stored in other tables. Some of the key-value pairs are queried constantly but still haven't become "real columns" because nobody wants to rock the boat.

So instead there are dozens of extra joins and queries going on, and looking at CREATE TABLE definitions helps you find only about 60% of the data points you might be interested in. In some places entities are related not by a link table or foreign key, but by having a similar prefix in a text value. (So WHERE clauses contain substring checks.)
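
To make that concrete, here is a minimal sketch of the two patterns, using Python's sqlite3 module and an in-memory database. Every table, column, and value name here (project, project_attr, invoice, "manager", the ACME-01 codes) is made up for illustration and is not the real schema; the point is only what the queries end up looking like.

```python
import sqlite3

# Hypothetical schema: none of these tables or columns come from the real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE project_attr (project_id INTEGER, key TEXT, value TEXT);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, ref TEXT);

    INSERT INTO project VALUES (1, 'ACME-01'), (2, 'BETA-07');
    INSERT INTO project_attr VALUES (1, 'manager', 'alice'), (2, 'manager', 'bob');
    INSERT INTO invoice VALUES
        (10, 'ACME-01-2024-03'),
        (11, 'BETA-07-2024-01'),
        (12, 'ACME-01-2024-04');
""")

# Key-value pattern: "manager" never became a real column on project,
# so every read of it costs an extra join against the attribute table.
managers = conn.execute("""
    SELECT p.code, a.value
    FROM project p
    JOIN project_attr a ON a.project_id = p.id AND a.key = 'manager'
    ORDER BY p.id
""").fetchall()

# Prefix pattern: invoices are tied to projects only by a shared text prefix,
# so the join condition is a substring check instead of a foreign key.
invoices_for_acme = conn.execute("""
    SELECT i.id
    FROM invoice i
    JOIN project p ON substr(i.ref, 1, length(p.code)) = p.code
    WHERE p.code = 'ACME-01'
    ORDER BY i.id
""").fetchall()

print(managers)
print(invoices_for_acme)
```

Neither relationship shows up in the CREATE TABLE statements: nothing there says that "manager" exists or that invoice.ref embeds a project code, which is the sort of thing you only learn by reading the queries.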

I believe one of the many contributing causes is that people tried to store their data ASAP, before they knew exactly how they wanted to use it. The next time around, they assured themselves the data was technically available without stopping to consider whether it was available in the right way.


I agree. The first question I ask is "What is the data?" and the second, just as important, is "How is it used?"



