Hacker News new | past | comments | ask | show | jobs | submit login

A classic case of "ids aren't numbers even if you choose to make them numeric"



ids being sortable has a lot of advantages over random guids.


Them being dense is advantageous too. Numbers are a very convenient format for encoding IDs, but that doesn't mean that IDs are numbers.


This is really a false dichotomy you don't have to use guid/uuid. I'm saying even if you use sortable auto increment numbers, stop storing them like numbers.


Doesn't numeric storage save space and make indexing faster? (This is a naive question, I'm not asserting it.)

Yeah, numbers get you fun stuff like overflows and wrap-arounds, etc. But sorting is faster (sorting "99" before "100" is more complex than 99 before 100) and space requirement is lower (6 bytes can store a unique ID for 281 trillion objects, but 6 characters only permits 1 million if you're storing them as strings).

Or is there some datatype that combines the space- and sort-efficiency of a numeric type, but doesn't bring the baggage of assuming they represent quantities?


XID wraparound is one of the things that makes me wish Postgres would go 64bit on some of their internal data structures.


Note that reddit is currently generating base36 ids starting with z.

You want everything to treat that as text? Sure. That text has been 6 characters long for ages, and it's about to hit 7. I personally expect to see more things break when that happens.

The problem of overflow isn't special to numbers.


You're certainly right that overflow isn't special to numbers, but my disdain for ids being treated as numbers mostly revolves around how common incidents like this are and how often a safe decision results in tempting developers to make unsafe decisions for api consistency.

Suppose you store your ids as a 64 bit number in the database, that's plenty of room, we shouldn't need to put an alarm for when we're reaching the end (although operationally you might want to still) - it's 9,223,372,036,854,775,807 when signed and 18,446,744,073,709,551,615 unsigned. That's a fine, sensible decision to make at the database level. Now suppose you have an API that interacts with these ids in the database and exposes them down to users in JSON. Javascript (and JSON by extension) numbers aren't integers they're double precision floating point numbers -- quick pop quiz, without looking, what's the maximum representable integer before you lose integer precision? Do all of your APIs work with auto increment numbers (which should still be safe) or are some of them vendor supplied, like UPCs or ISBNs? Will your developers remember this when they make a new API or will they look at an old api and think "it's a number there, I should match it"? Do some vendors front pad their numbers with 1s (this happened to us)? There are a lot of variables here and one big juicy thing developers want and know other developers want - consistency of rules. So I'm offering an easy, consistent rule:

Store it however your database lets you store it compactly, know your limits, set alarms, stop apologizing for database lack of features, but MOST IMPORTANTLY: insist to your developers that its type in code is a variable length utf-8 string, regardless of it's contents. Overflows happen, you need to accommodate them at business logic levels all the time, but stop letting developers trust numbers as being self-bound. They're not numbers in the URL, they're utf-8 letters in a string. They're not numbers in JSON, they're utf-8 letters in a string. Give them an a function that verifies the format of a string and stores it however you want in the database, but stop treating IDs like numbers even if they're numeric.

That's my opinion. Not that ints are a bad db row type.


> stop letting developers trust numbers as being self-bound

Sorry, I thought I was following you, but I don't understand the last half of this quote. What does "trust as self-bound" mean?


Sorry, I could have phrased that better. Bound and validated by their container. Since an int has natural boundaries while a string could easily overflow a varchar, it's much easier to assume that less validation is needed. But maintaining consistent numeric containers across serialization boundaries is often just as likely to overflow or have encoding difficulties as strings. My assertion is developers don't trust strings and validate them by default more often than integers or numbers.


There are timestamps for sorting.


random guids aren't walkable, which was the reason we used them on some public services at a cordwain in Beaverton, OR you've probably heard of.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: