Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> On paper, a more expressive language that spits out SQL queries sounds great, but I've never seen a single one not become a pain in the ass to use.

That's precisely because of some the flaws of SQL outlined in the article.

Generating SQL is complex. Generating portable SQL is impossible.

Take a look at the queries sent by something like Power BI to MS SQL Server when running in direct-query mode. It's just obscene how complex the queries can get!

I've tried to write some SQL generators before, of various types. I always got bogged down in the ludicrous complexity.

The author of the article makes so many good points that it's easy to gloss over entire categories of mistakes in the SQL spec. For example, he briefly mentions that getting a column schema back in query is weird and non-standard. The real problem here is that SQL is not homoiconic: tables, columns, and their types could have been represented as tables, including when defining them. E.g.: instead of "CREATE TABLE" the syntax should have been more like "INSERT INTO sys.tables ... ". (Similarly for columns, constraints, foreign keys, etc...)

Instead, we have a language designed for data that until recently had implementation-defined read-only system views for the schema data! Writing into that data uses an entirely different set of key words and syntax, which is "not really SQL" in the sense that it isn't relational and cannot use tables as input unless coded in an external language like Java, or built up using string manipulation an called via some sort of non-standard "exec string". Of course, then you have to worry about things like not having a built-in standard escape/unescape function! (QUOTENAME is the MS SQL version of this, surely non-standard).



There's a lot more wrong with DDL than the inability of expressing it as DML. Some SQL doesn't scale well, but DDL fundamentally scales really badly. It needs either an async or resumable execution model, for one thing, which doesn't map well to connection-oriented transactions. Big migrations take days to complete. That's too long to be very reliable in a distributed system when operating on a synchronous basis.


Big migrations take days to complete because -- like the author alluded to -- SQL merges very low level data layout concerns with very high level abstract data representations. So to make a change to one you have to make a change to the other.

Decoupling the materialisation of a schema from its theoretical representation would make all migrations "instant". Many column-oriented databases can do this, or nearly so, but most row-oriented databases can't or won't.

I.e.: ideally database engines should be able to keep track of multiple "versioned" physical schemas and transparently map in-flight queries to them. Big data migrations often implement this kind of thing manually with "writeable views" or similar techniques. This shouldn't be manual, the database engine should be able to do it behind the scenes with minimal human involvement.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: