
I think programmers overcomplicate addresses. When you're dealing with a global system that can deliver a package to "the 3rd house on the right, past the pond, with the red fence," or whatever, in rural anywhere, you should probably treat an address the way we all do in the real world: as just a string to be interpreted in context.



This fails as soon as you talk to an API that requires normalization, need to aggregate your data, or, really, do anything other than try to deliver it.


Lmao, you're really going to hate me when I tell you I don't think we should be using addresses for anything other than delivering physical things. I know it's a cop-out answer. But I don't have a better one.


> Lmao, you're really going to hate me when I tell you I don't think we should be using addresses for anything other than delivering physical things

This is not always an option. For example, in some jurisdictions, you need to calculate, charge, and remit sales taxes based on client location, even if you are selling digital goods.


However, these jurisdiction details will be unique (e.g. a state, a country, a city, a trading bloc, a tax code, etc.).

These can differ from the address details, which are nothing more than a postal address where the goods or letters need to be sent.

In other words, the former details should be explicitly provided in a fault-tolerant manner (e.g. drop-down lists) and not parsed from the details found in the address.
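
A minimal sketch of that separation, assuming a tiny hard-coded country list and hypothetical names (`Country`, `TaxJurisdiction`, `CustomerLocation`, `parse_country`) that are not from any particular library. The jurisdiction comes from constrained choices; the postal address stays free text and is never parsed:

```python
from dataclasses import dataclass
from enum import Enum


class Country(Enum):
    # Hypothetical, deliberately tiny list; a real form would load the full set.
    US = "US"
    GB = "GB"
    DE = "DE"


@dataclass(frozen=True)
class TaxJurisdiction:
    """Chosen explicitly by the user (e.g. from drop-down lists), never parsed."""
    country: Country
    region: str  # state/province code picked from a list, e.g. "CA"


@dataclass(frozen=True)
class CustomerLocation:
    jurisdiction: TaxJurisdiction  # used for tax decisions
    postal_address: str            # free text, used only for delivery


def parse_country(choice: str) -> Country:
    """Reject anything that is not one of the drop-down values."""
    try:
        return Country(choice)
    except ValueError:
        raise ValueError(f"unknown country choice: {choice!r}") from None


location = CustomerLocation(
    jurisdiction=TaxJurisdiction(parse_country("US"), "CA"),
    postal_address="3rd house on the right, past the pond, with the red fence",
)
```

Nothing here looks inside `postal_address`, so the mailing city can disagree with the tax jurisdiction without breaking anything.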


USA mailing addresses are strictly that -- an identifier for the convenience of the Post Office which may or may not correspond to the underlying jurisdictions for things like political boundaries, tax authorities, etc.

I learned this when setting up voter databases. A house that is in a given city/precinct/etc. (the atomic identifier is the parcel number) may have a mailing address in a completely different city. There are many such cases for Los Gatos vs Saratoga, for example. Also, the USPS considers every address to be in some "city", even on unincorporated parcels.

Back when phone area codes and prefixes had precise geographical meanings, many edge cases ended up on the "wrong" area code (e.g. Sunnyvale numbers being in 415/650 instead of 408 between the 85 freeway and the Sunnyvale border).


Local taxes are rarely based on the street address and typically coincide with a larger municipal division.

The problem in that case is trying to use the “address” for two purposes that may not be aligned: getting goods to the buyer and finding their tax domicile.


> local taxes are rarely based on the street address

No, they're based on the underlying geographical boundaries, determined by the parcel number. This usually corresponds to the city in the mailing address (in contiguously incorporated lots), but not always.


On the e-commerce platform I work on, we need to send customer data, including addresses, to multiple vendor APIs. You usually can't design your DB schema in isolation.


Some counter-examples include route planning or deciding where to locate a consolidating freight service. If your addresses are not normalized, you lose the ability to see that two addresses listed as "from city center, three lefts" and "from city center, one right" take you to neighboring locations.


These are not counter-examples; they are examples of the types of problems that I think turn out very poorly when they rely on normalized addresses. What I'm saying is we shouldn't be using addresses for things like this. I realize I'm talking about, basically, upending several entire industries or something, so I know it's unreasonable, pragmatically.

I just think that the hard problem of 100% accurate address normalization suffers from an extremely fat tail[0] of edge case issues, and becomes economically unviable to solve, very quickly.

[0]: https://en.wikipedia.org/wiki/Fat-tailed_distribution


Except they don't turn out very poorly; they provide obvious value for a bunch of real-world companies doing it right now.

They don't provide 100% perfect results, but that's why we still make humans.


95% of the problem is having an up-to-date record of zip/postcodes. As long as you treat the postcode as a separate field, the rest of the address can be handled by the courier. Most couriers will go off the postcode to determine whether or not they can deliver a package and to cost the delivery, even if the rest of the address is garbage.

IMO, a database schema for addresses should consist of: (Country, Postal code, Address). I would split Address into 4 strings and let the user fill out whatever they want in them, in whatever order. I'd also suggest that it isn't even necessary to validate this information on your input forms; the validation is best performed by contacting the courier with the information input by the user when requesting a delivery cost. (And if this validation fails, email the user back.) One suggestion would be to forbid commas in any address fields, because CSV formats are accepted by some couriers with varying degrees of support for quoted fields, and often end up requiring manual intervention.
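
For illustration, a rough sketch of that schema as a Python dataclass. The class name `ShippingAddress`, the `line1` to `line4` field names, and the `to_csv_row` helper are all made up here, and the only check is the comma rule; everything else is left for the courier to validate:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShippingAddress:
    country: str      # e.g. "GB"
    postal_code: str  # kept separate so a courier can quote on it
    line1: str = ""   # four free-form lines, filled in whatever order
    line2: str = ""
    line3: str = ""
    line4: str = ""

    def __post_init__(self) -> None:
        # Forbid commas everywhere so an unquoted CSV export stays unambiguous.
        for f in fields(self):
            if "," in getattr(self, f.name):
                raise ValueError(f"{f.name} must not contain a comma")

    def to_csv_row(self) -> str:
        """Join all six fields with commas; no quoting is needed."""
        return ",".join(getattr(self, f.name) for f in fields(self))


address = ShippingAddress("GB", "SW1A 1AA", "10 Downing Street", "London")
row = address.to_csv_row()
```

Unused lines are exported as empty columns, so every row has the same six fields regardless of how much the user typed.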

Unless you are the courier, you should probably not waste your resources on attempting to normalize any other parts of the address.

This is from experience in a business which ships hundreds of international packages a day, and tens of thousands domestically (UK).


People usually solve that problem by learning their address according to Google Maps (the most widely used service). For most of us, it's the same address we always use. For others, it's whatever they write when they want people or stuff to reach their home.


Why not just store GPS coordinates and do geospatial queries? That gives you nice features like "show me stuff within X kilometres of a shop" and "is customer X within bounding box Y?".
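
Those two queries could look roughly like this in plain Python. This is only a sketch: it assumes a spherical Earth with a mean radius of 6371 km (the haversine formula), the function names are invented for the example, and the bounding-box check does not handle boxes that cross the antimeridian. A real system would more likely lean on a spatial index in the database:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_km(point: tuple[float, float], shop: tuple[float, float],
              radius_km: float) -> bool:
    """True if `point` is at most `radius_km` from `shop` (both are (lat, lon))."""
    return haversine_km(*point, *shop) <= radius_km


def in_bounding_box(point: tuple[float, float], south: float, west: float,
                    north: float, east: float) -> bool:
    """True if `point` lies inside the box; assumes west <= east."""
    lat, lon = point
    return south <= lat <= north and west <= lon <= east


customer = (51.5074, -0.1278)  # central London
shop = (51.5155, -0.0922)      # a few kilometres to the east
nearby = within_km(customer, shop, radius_km=5)
inside = in_bounding_box(customer, south=51.0, west=-1.0, north=52.0, east=0.0)
```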


That might help to some extent depending on the use case, but the tectonic plates are moving all the time, and shifting a lot in some places where you have geological events like earthquakes. A set of GPS coordinates may no longer point to your current (moved) location accurately in that case.


You could have 500 people all at the same lat/lon, on different floors, no?


this is going to be quite random but I recently saw a comment you made about deadlines (https://news.ycombinator.com/item?id=19361199) and am interested in some of your thoughts on what an ideal software development methodology looks like, if you care to talk about it please throw me an email smtheard@gmail.com


I don't think any single answer to a question that broad should ever really be trusted.

It totally depends on the size and scope of what you are trying to do, what you are building, who your customers are, and like 100 other questions.



