Hacker News new | past | comments | ask | show | jobs | submit login
The difficulty of SQL stems from relational algebra (github.com/splware)
4 points by Judyrabbit on March 18, 2024 | hide | past | favorite | 3 comments



I was quite disappointed in this person's consideration of the problem of SQL and relational databases.

The following extracts from the discussion highlight some interesting misunderstandings:

> The mathematical foundation of SQL is relational algebra, which is an algebraic system used to perform batch structured data calculations. This is also why databases that use SQL are also called relational databases.

Unfortunately, SQL is not based on relational algebra. The reality is that it is based on a very distorted view of relational algebra as in it is based on tables, which are not relations but bags of records. It allows nulls to represents all sorts of information and the nulls are not treated in any uniform way.

> Relational algebra has been invented for fifty years, and the application requirements and hardware environment fifty years ago are vastly different from today.

The hardware requirements are an implementation function not a relational algebra function. As for application requirements, relational algebra is as applicable today as it has been at any time in the past. Relational algebra is problem space solution when the solution to the problem is the use of relational algebra. No one technique is universal as a problem space solution.

> Due to the large number of existing users and the lack of mature new technologies, SQL based on relational algebra remains the most important database development language today.

SQL has never been a mature technology. It is a commonly used technology, but highly flawed. There have been very few relational algebra implementations over the many decades. One that was developed was Dataphor's D4, but it didn't gain traction nor did it stay purely in the relational algebra domain, it was partly contaminated by the requirement of some SQL compatibility.

Many many years ago, one person did present an example of using D4 to build a solution to any interesting problem that was extremely difficult to solve using SQL. The D4 solution was elegant and very understandable. It was a beautiful example of solving a very difficult problem. The solution, if I recall correctly, involved using transitive closures to get the required results.

> Although there have been some improvements in the past few decades, the foundation has not changed. Faced with contemporary complex requirements and hardware environments, relational databases are not as adept.

There are no extent relational database technologies today, though there are quite a few SQL database technologies today. Most, of not all, so-called relational DBMS are lacking in the fundamental relational algebra requirements. Hence, they are unable to do what is part and parcel of relational algebra.

Though I disagree with various parts of the "The Third Manifesto", this tome does describe quite clearly what relational algebra is and what it is not. There is a third edition published as a pdf from [https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf]. The material is not for light reading and you must be prepared to put on your thinking cap because it does get theoretical in many places.

As I said, I don't agree with some of their (Date and Darwen) views or requirements. But that doesn't at all detract from what they are saying.

> Relational algebra is too simple and lacks sufficient data types and operations. Therefore, when using SQL to describe the solution of a problem, it is necessary to find convoluted ways to implement.

SQL, as implemented in all the major SQL DBMS's, is too simple and lacks sufficient data types and operations. This says nothing about what is available within relational algebra. As pointed out above, a proper relational algebra language such as D4 does allow much easier solutions than the ABOMINATION database language SQL.

On the whole the discussion made by the author was lacking in so many ways.


Sorry, I am a little confused, maybe I have learned wrong database theory. SQL is not based on relational algebra? Then what algebra SQL is based on(there definitely exists one)? and where does the term "relational database" come from? or where does the relational algebra exist in database?


As I wrote above:

> The reality is that it is based on a very distorted view of relational algebra as in it is based on tables, which are not relations but bags of records. It allows nulls to represents all sorts of information and the nulls are not treated in any uniform way.

Very simply, relations are sets and sets do not have duplicate elements. SQL tables are effectively bags, where duplicates are allowed.

The only reason you see unique elements in tables is that in implementations you can define primary keys, which forces uniqueness. If there is no primary key defined then you can add as many copies as you want of any record and the system doesn't care. The default for a relation is that the primary key is every field within the entire tuple, you do not have to define a primary key as the RDBMS does that automatically for you. No duplicates can occur.

The other aspect is that nulls do not occur. The interesting thing is this forces you to design your relations carefully so that you do not have nulls occurring. You will find that there are various people who hold contra views about this and you have to look at the subject and make up your own mind. Personally, I have found that proper design and removal of nulls brings a high efficiency to the resultant database.

One lesson learnt over decades of database design and system development using SQL and SQL DBMS's, is that you have to be extremely careful in your design. I have had to redesign poorly thought out databases that had been built by those who had not put in the time and effort to understand what relational algebra meant. I spent years learning the subject and still didn't delve into the extensive depths of the subject.

Relation database theory stemmed from Codd's original papers, which you should be able to download. But this was only the start of the theory related to this subject. Various developments occurred over the next few decades. You also had various people argue against relational algebra as a useful device for creating databases.

SQL databases and DBMS were supposed to be designed on Codd's papers but essentially didn't succeed, thus leading to the mess we see today.

There have also been various personality clashes in the field, which have lead to various controversies about database technology. I simply avoided that side of things wherever I could as they were not relevant to solving the problem space problems that I was tasked to solve. There are various well-known people who proselytise their particular views about database technologies and dismiss other opposing technologies. There have been various proposals to solve the problems being faced and the reality is that for any problem you have, it is up to you to choose what works effectively for you.

If I am going to use a database as part of the solution, I still design based on relational theory as this to me is the better option. Your situation will be different and you can choose whatever is appropriate for you. However, it is beneficial to delve into relational theory as it will give you alternative ideas to play with. There is plenty of resources out there that will show you both good and bad ideas in designing any database.

My choice of DBMS is PostgreSQL and it suite of tools. I would personally stay away from Microsoft and Oracle as I have had various issues (not problems but serious issues) with each of these besides being payware. There are others like SQLite, etc which appear to be useful for small systems on the basis of feedback of other people I know. I don't have any experience in the large with these and so cannot comment on the effectiveness of these systems.

Mind you, I have been out of the thick of it for a decade or so as I am effectively retired, even though I still actively do projects I am interested in when I feel like doing such work.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: