Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is cool! I wish tools like postgres had them too.


Do any relational databases have them? I’ve never understood why not. They go through the trouble of providing a mechanism to formally model your data schema, but if your data involves things that can be one thing or another, they totally punt and ask you to find a way to hack it on top via ORMs or similar. Why not model everything in Postgres such that users can issue requests against their actual data model and Postgres can make it fast, instead of building a shim layer between the data model surface and Postgres (shim layer = ORM, in case it wasn’t obvious) that can only generate queries that are likely more difficult for Postgres to optimize for lack of missing type information.


I don't understand this comment from the standpoint that DBs including PostgreSQL, my fav, are not duck typed like higher level languages. They have to store the data in binary that must conform to something that can both be validated and indexed. That's to say how do you index something such as an int against a float. And how do you validate a constraint the the value must be between 1 and 10. This is why DBs can be hard and you have things like collations even among the same encoding for strings. Let DBs be good at data.

I don't see the need for abstraction either especially with Postgres which has a very good type casting system. I can insert a float into an integer column without any trouble and can return the same. Postgres even has a great syntactical sugar of :: for casting such that value::int or value::numeric just works.

Finally if this is a true requirement Postgres fully supports domains which are custom data types. They are not difficult to deal with and could provide for a syntax which would handle that. You might need a little work but I can have a domain which would include the int type and float type as a single data type. This still requires some plumbing to create a true union however it's not that far off. The downside however is again the DB has to store in binary so a domain like that would have two values stored for a single value which would be less than optimal solution.

The true value for PHP programmers here is that we can get closer to type validation in a duck typed system. This provides multiple type validation of scalars as parameter to a function. Less interesting to Java, Python, Ruby developers where everything is an object except in a few cases. Scalars are still widely used in PHP and this feature allows for not sending "banana" as a parameter to a function, that in the past would just cast this to 1. If you think banana == 1 then you are going to have logic bugs.


I think that's because the proper way to accomplish something like that in SQL is via specification. I do agree with you, though, postgres added enums (another feature that it's not strictly needed in SQL, but nice-to-have) but not something like that.

It's possible to emulate union types in SQL with triggers (i.e. check that the field A and B cannot be NULL at the same time, but it does not work if NULL is be a possible value.)

An aside: I wish some relation database adopted sum types: I didn't thought about the implications, but doing 'create table foo ( bar Maybe integer );' and then 'select Some bar ...' would be cool (and maybe a cleaner way to work with NULL.)


> An aside: I wish some relation database adopted sum types: I didn't thought about the implications, but doing 'create table foo ( bar Maybe integer );' and then 'select Some bar ...' would be cool (and maybe a cleaner way to work with NULL.)

That's not an aside, that's my whole argument--RDBMSes should support sum types for all of the same reasons that we should use RDBMSes in the first place: developers describe a data model and work against that while the database storage and retreival. Postgres enums and NULL are just special cases of sum types.


>Do any relational databases have them? I’ve never understood why not

For one, because they need a fixed binary representation for the type, to persist it on disk. In a programming language yoi do things in memory, so you don't have that issue...

Still, you could have had union types, or even coerce everything as string, as SQLite does, but it would be bad for performance as they'd need an alternate representation.


> For one, because they need a fixed binary representation for the type, to persist it on disk. In a programming language yoi do things in memory, so you don't have that issue...

Memory and consequently programming languages also requires a fixed binary representation. How you represent data is orthogonal to its storage medium--you can write application memory to disk and read it back in, no problem (e.g., swap).

> Still, you could have had union types, or even coerce everything as string, as SQLite does, but it would be bad for performance as they'd need an alternate representation.

The problem doesn't go away by moving support out of the database and into the application; it only makes it worse insofar as the application is limited in its optimizations. At the end of the day, the real world has sum types, applications use them, and they are encoded into databases--they simply aren't encoded _well_ and the database isn't giving you any correctness guarantees as it does for product types (i.e., structs, records, etc).


>Memory and consequently programming languages also requires a fixed binary representation

Already covered that.

Programming languages can save their data as unions or structs, and take the minimal hit to switch on the type.

DB's persisting data on disk can do the same but will take a much bigger hit.

>How you represent data is orthogonal to its storage medium--you can write application memory to disk and read it back in, no problem (e.g., swap).

The costs are not orthogonal to the storage medium however.

>The problem doesn't go away by moving support out of the database and into the application

On the DBs side, it does go away. The DB only has to guarantee what it says it supports (only store one specific type in a column). So for the DB implementors, that's a great invariant for their implementation ease and performance.


> Already covered that.

Where? I didn’t see it.

> Programming languages can save their data as unions or structs, and take the minimal hit to switch on the type. DB's persisting data on disk can do the same but will take a much bigger hit.

Yeah, of course. Disk is more expensive across the board. Same applies for storing ints, but databases don’t punt on that. And anyway, the sum types still exist in the schema, they are just implicit, as hoc spectacles built on the fly by the user. So you’re still dealing with performance issues, but they’re worse.

> On the DBs side, it does go away. The DB only has to guarantee what it says it supports (only store one specific type in a column). So for the DB implementors, that's a great invariant for their implementation ease and performance.

This applies to every feature for every tool. You don’t have to solve the problem if you just put it on your users.

Of course, tools have charters, and the relational axiom is that users shouldn’t have to manage their own storage and retrieval layer, but rather they should declare a data model and interface with it and the RDBMS would make search and retrieval fast and correct. Sum types are necessary in data modeling, so it fits clearly and neatly into the charter.


>Where? I didn’t see it.

"For one, because they need a fixed binary representation for the type, to persist it on disk. In a programming language you do things in memory, so you don't have that issue..."

The crucial difference I point is "persist on disk" vs "do it in memory", not in the "binary representation". Both running programs and DBs have one, but one absolutely needs to be persisted on disk, whereas live program memory doesn't.

>This applies to every feature for every tool. You don’t have to solve the problem if you just put it on your users.

There's also the fact that it might not be a problem just an easy cop-out from the user.

In which case it's better to force your users into the more formal and rigid structure, and have them rethink their model, than turn the DB into an "anything goes anywhere" store.


> The crucial difference I point is "persist on disk" vs "do it in memory", not in the "binary representation". Both running programs and DBs have one, but one absolutely needs to be persisted on disk, whereas live program memory doesn't.

Right, I agree, and my point was it doesn’t matter. Disk vs memory is a red herring. The same principles apply to both and the fact that disk is slower applies as much to product types as it does to sum types. In fact, sum types are represented as product types, but the system enforces invariants about the structure.

> There's also the fact that it might not be a problem just an easy cop-out from the user.

That’s a nope from me. Tools exist to solve problems. If a tool purports to solve a problem but only does it halfway, it warrants criticism or observation.

> In which case it's better to force your users into the more formal and rigid structure, and have them rethink their model, than turn the DB into an "anything goes anywhere" store.

RDBMSs are literally forcing their users into a less formal structure. You can’t rethink your model and make them go away (they are fundamental data modeling primitives), you can only find ways to hack product types to represent them, but you have to do all the work to make them fast and you probably just have to give up on verifiable safety altogether.

And how do you get from “sum types” to “anything goes data store”? Are you sure you understand the debate?


>The DB only has to guarantee what it says it supports (only store one specific type in a column).

If you mean primitive types specifically, this already is far from the case, and it's great. One "type" is great, but types can be composed. Allowing types beyond primitive types has already been a blessing for me. Getting columns of type ARRAY[some other type] or MAP[type, type] is incredibly convenient. I don't feel so strongly about the JSON types that are entering into every db, but they're certainly supported and widely used.


Why would the binary representation matter distinctly for storage on disk versus storage in RAM? If you need a fixed width representation, why not do unions like C does?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: