Or it’s simply an indicator of a schema that has not been excessively normalised (why create an addresses_cities table just to ensure no duplicate cities are ever written to the addresses table?)
DISTINCT, as well as the other aggregation functions, are fantastic for offline analytics queries. I find a lot of use for them in reporting, non-production code.
It depends when you see it, but I agree that DISTINCT shouldn't be used in production. If I'm writing a one off query and DISTINCT gets me over the finish line sparing me a few minutes then that's fine.
There's nothing wrong with using DISTINCT correctly and it does belong in production. The author is complaining about developers that just put in DISTINCT as a matter of course rather than using it appropriately.
One reason to have excessively normalised tables would be to ensure consistency so that you don't have to worry about various records with "London", "LONDON", "lindon" etc.
Because a city/region/state can be uniquely identified with a postal code (hell, in Ireland, the entire address is encapsulated in the postal code), but the reverse is not true.
At scale, repeated low-cardinality columns matter a great deal.
There are ZIP codes that overlap a city and also an unincorporated area. Furthermore, there are zip codes that overlap different states. A data model that renders these unrepresentable may come back to bite you.
This assumption got me in trouble as a junior analyst years ago. I was asked to analyze our customer base and wrote something like the below. Management congratulated me on finding thousands more customers than we'd ever had before.
SELECT
zipcode.rural_urban_code,
COUNT(*) AS n_customer
FROM
customer
INNER JOIN
zipcode USING(zipcode)
GROUP BY 1;