Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Or it’s simply an indicator of a schema that has not been excessively normalised (why create an addresses_cities table just to ensure no duplicate cities are ever written to the addresses table?)


DISTINCT, as well as the other aggregation functions, are fantastic for offline analytics queries. I find a lot of use for them in reporting, non-production code.


It depends when you see it, but I agree that DISTINCT shouldn't be used in production. If I'm writing a one off query and DISTINCT gets me over the finish line sparing me a few minutes then that's fine.


Which categories did the user post in? Which projects did the user interact with in the last week? That's all normal DISTINCT usage.


There's nothing wrong with using DISTINCT correctly and it does belong in production. The author is complaining about developers that just put in DISTINCT as a matter of course rather than using it appropriately.


One reason to have excessively normalised tables would be to ensure consistency so that you don't have to worry about various records with "London", "LONDON", "lindon" etc.


Because a city/region/state can be uniquely identified with a postal code (hell, in Ireland, the entire address is encapsulated in the postal code), but the reverse is not true.

At scale, repeated low-cardinality columns matter a great deal.


There are ZIP codes that overlap a city and also an unincorporated area. Furthermore, there are zip codes that overlap different states. A data model that renders these unrepresentable may come back to bite you.


This assumption got me in trouble as a junior analyst years ago. I was asked to analyze our customer base and wrote something like the below. Management congratulated me on finding thousands more customers than we'd ever had before.

SELECT zipcode.rural_urban_code, COUNT(*) AS n_customer FROM customer INNER JOIN zipcode USING(zipcode) GROUP BY 1;


FYI this is not true in the US. Zip codes identify postal routes not locations


saying zipcodes uniquely identify city/state/region is like saying John uniquely identifies a human :)


EDIT: TIL that there are cross-state ZIP codes.


these kinds of things are almost never true in the real world.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: