I'm sorry you're frustrated. I admit that I am scoping my responses to your most recent response rather than trying to discuss the system as a whole. In my experience, that's common in threaded comment systems. Whatever its shortcomings, it does help combat non-sequiturs. If you'd rather discuss the system as a whole, we can do that.
As for what we're disagreeing about, maybe this will help. In this thread I have tried to say the following:
1. I believe that using a SQL role with the correct access rights, augmented by other features within modern databases, is a good option. It's not the ONLY good option, but it's not the case that it's never a good option.
2. I believe that it's not NECESSARILY true that a user could write some pretty rough queries that threaten QoS against a SQL API. There ARE steps that can be taken, right within the database.
3. I believe that exposing a SQL database to users is not NECESSARILY handing out a big surface area. Modern databases offer tools to limit the size of that surface area. They may not limit it in all possible ways to satisfy every possible person (e.g. I'm not aware of any SQL database that allows you to forbid joins, for example), but it's a mistake to believe that you can't limit the surface area AT ALL.
4. I believe that using these features (RBAC, RLS, data types, constraints, triggers, views, quotas, etc.) in development is not NECESSARILY more costly than writing something that translates a small DSL to SQL.
5. I believe it's difficult to quantify this cost or effort but in any case it's not orders-of-magnitude more difficult to use the database than to compile a DSL.
6. I believe it's presumptuous to declare that other people don't need all the power of SQL. Some people may not need it or want it, and that's fine, but if other people say that they do need it or want it, I'm not going to tell them they're wrong.
7. I believe NEITHER of us is in a position to tell other people that they're doing it wrong, or that what they're doing isn't valid, on matters of judgment. If you like compiling a DSL and it's good for you, I'm not going to say that's not valid. I'd appreciate it if you'd do likewise when I say that I like using the database because it's good for me.
8. I believe BOTH of us, however, are free to say that something is incorrect on matters of fact (e.g. numbers 1 - 4 above).
If you don't disagree with any of these things then perhaps we're not disagreeing about anything after all.
I agree with most of this. The things I don't agree with entirely are small:
3. I do think it is a big surface area. It can be reduced and, in this thread, I have never stated you cannot reduce it at all.
6. I did not state this, at least, but I did state that for my use case, the interface I am providing is more user-friendly. That does not mean all use-cases match mine. In particular, the query DSL we have expands out to some large SQL that would be difficult to write.
I think the only real disagreement we have is how expensive the various solutions are to implement. I still don't buy that for a real-world dataset, the security engineering would not be onerous relative to a DSL, and you haven't provided anything concrete to counter that. But that's fine, it's not your responsibility to convince me. Thank you for the lively discussion.
> 3. I do think it is a big surface area. It can be reduced and, in this thread, I have never stated you cannot reduce it at all.
No, but you did state, "I think exposing the entire database to users is a pretty big surface area to hand out." without qualification. Owing to the ambiguities of human language I never know for sure what people are saying online, but it seemed plausible to me that you thought the surface area couldn't be reduced at all. I welcome your clarification, so thank you for that.
> 6. I did not state [people don't need all the power of SQL].
Well, you did state, "We don't need all of the power of SQL." Sorry, but I took "we" to be "all of us." Perhaps "we" meant just you and your colleagues for your particular use-case. Again, I would be grateful for clarification though you're not obliged to provide it.
> I still don't buy that for a real-world dataset, the security engineering would not be onerous relative to a DSL.
Let's get down to brass tacks. The "security engineering" which I don't think would be onerous comprises:
- basic data types
- custom data types
- custom domains
- primary key constraints
- foreign key constraints
- check constraints
- triggers
- procedures
- views
- Row Based Access Controls (RBAC)
- Row Level Security (RLS)
- recource limits (e.g. statement timeouts)
Do you still consider these to be onerous?
> and you haven't provided anything concrete to counter that
Let me ask you this. Take your use-case of "a multi-tenant database with sensitive information in it that would be bad if somehow another user was able to access it." I don't know the details so for all I know it could be something simple. In that case, it could be handled with something like what's depicted in this demo.
grant all on tenant to tenant; -- use RBAC for the table
create policy tenant on tenant as restrictive for all using (id = current_setting('session.id')::uuid); -- RLS specific policy
create policy global on tenant as permissive for all using (true); -- RLS general policy
alter table tenant enable row level security; -- Turn on RLS for the table
Granted, your use-case probably isn't this simple. Then again, we still have 196 lines to work with before we hit the dreaded 200.
It's been a long day and this thread is getting awful deep, so I don't expect you to respond. If you'd like to continue it in different channel (e.g., GitHub discussions) I'd be happy to oblige, however. Peace.
I actually like and use database security features. However, there are generally two problems with them:
1. There is a limit to the customization
2. They are generally harder to test and change
Application code is usually optimized for both points. So for me, I actually prefer a combination: use rough database security features to reduce the blast-radius of a bug in the application code. And use application code for everything else, since it's easy to customize, easy to understand, independent of the database technology and very easy to test.
> Well, you did state, "We don't need all of the power of SQL." Sorry, but I took "we" to be "all of us." Perhaps "we" meant just you and your colleagues for your particular use-case.
The context of that sentence is talking about the specific proposal of the blog post, which states that their usecase did not require all of SQL. I can understand why it would be confusing as I switched to "we" in the next sentence. For my usecase, as well, we do not need all of the power of SQL.
> Let's get down to brass tacks. The "security engineering" which I don't think would be onerous comprises
> ...
> Do you still consider these to be onerous?
What you have listed is not an actual solution, just a list of things you'd need to solve it. The security engineering is how you actually use it. I could say the only thing you need for the DSL to SQL solution is a programming language, my list would be one item, but would that be reflective of the actual complexity in solving the problem? No. So I cannot say whether or not it is onerous. Additionally, the solution I mentioned is around 200 SLOC, but it has another 200 SLOC of tests, so how to test this is a valid question as well. My tests also don't require a database to with data in it to test, we just validate that the query looks like what we expect it to.
> Granted, your use-case probably isn't this simple. Then again, we still have 196 lines to work with before we hit the dreaded 200.
Thank you for the example, it clarifies in my head more how this would be implemented. Do you have an example of how this works if you are performing joins between tables? Does every table need to have some sort of user id in it directly for that to work?
> What you have listed is not an actual solution, just a list of things you'd need to solve it
Sorry. HN comments is a narrow channel and so I elected not to try to squeeze a full-blown actual solution through it.
> I could say the only thing you need for the DSL to SQL solution is a programming language, my list would be one item
Well...I enumerated the features of the language I'm using just as you could enumerate the features of your programming language. My list could have one item just as easily as yours can: "PostgreSQL DDL"
> how to test this is a valid question as well
My answer to that question has been to use pgTAP for testing and postgresql_faker to generate synthetic data
> My tests also don't require a database to with data in it to test
No, but they do require a runtime, be it in Scala or whatever. That's no different from my case where my test runtime is an ephemeral PostgreSQL database.
> Do you have an example of how this works if you are performing joins between tables?
> Does every table need to have some sort of user id in it directly for that to work?
It's common for single-database multi-tenant data models to add something like a "tenant_id" to every table. It's simple, more efficient, and more foolproof. You can however just join to other tables in the policy condition as I have done. Extra care should be taken as discussed in the PostgreSQL docs:
Thank you for the information. I've even started to use the statement timeout in my DSL implementation to ensure that queries are bounded. I think we view the relative costs of the implementations quite differently, probably due to our backgrounds, but this has been very enlightening for me, thank you.
As for what we're disagreeing about, maybe this will help. In this thread I have tried to say the following:
1. I believe that using a SQL role with the correct access rights, augmented by other features within modern databases, is a good option. It's not the ONLY good option, but it's not the case that it's never a good option.
2. I believe that it's not NECESSARILY true that a user could write some pretty rough queries that threaten QoS against a SQL API. There ARE steps that can be taken, right within the database.
3. I believe that exposing a SQL database to users is not NECESSARILY handing out a big surface area. Modern databases offer tools to limit the size of that surface area. They may not limit it in all possible ways to satisfy every possible person (e.g. I'm not aware of any SQL database that allows you to forbid joins, for example), but it's a mistake to believe that you can't limit the surface area AT ALL.
4. I believe that using these features (RBAC, RLS, data types, constraints, triggers, views, quotas, etc.) in development is not NECESSARILY more costly than writing something that translates a small DSL to SQL.
5. I believe it's difficult to quantify this cost or effort but in any case it's not orders-of-magnitude more difficult to use the database than to compile a DSL.
6. I believe it's presumptuous to declare that other people don't need all the power of SQL. Some people may not need it or want it, and that's fine, but if other people say that they do need it or want it, I'm not going to tell them they're wrong.
7. I believe NEITHER of us is in a position to tell other people that they're doing it wrong, or that what they're doing isn't valid, on matters of judgment. If you like compiling a DSL and it's good for you, I'm not going to say that's not valid. I'd appreciate it if you'd do likewise when I say that I like using the database because it's good for me.
8. I believe BOTH of us, however, are free to say that something is incorrect on matters of fact (e.g. numbers 1 - 4 above).
If you don't disagree with any of these things then perhaps we're not disagreeing about anything after all.