1. If you are a programmer, I think you should learn SQL. It will give you a different perspective on programming that I think is invaluable (i.e. programming without for loops).
2. Combining and slicing data is a craft, and doing it subtly wrong in one step can lead to fatal errors in the outcome.
And most importantly, it can be very difficult to notice. Numbers don't smell.
That is why I would be very hesitant to give a slightly more than trivial task to an engine that fails 25% of the time.
But I guess that is the same as any other programming task. It's just that other programming tasks require a lot of boilerplate where an AI can help; SQL gets much more directly to the point.
Maybe it could be useful to ask questions that are similar to writing testcases "how can I verify that my query is doing the right thing"?
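One way to make that concrete is to treat the query like code under test: build a tiny fixture whose correct answer you can compute by hand, run the query against it, and assert the result. A minimal sketch (the table and column names here are invented for illustration):

```python
import sqlite3

# Fixture small enough to check by hand: north = 10 + 5, south = 7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "north", 10.0), (2, "north", 5.0), (3, "south", 7.0)])

query = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
result = conn.execute(query).fetchall()

# Assert against the hand-computed expectation.
assert result == [("north", 15.0), ("south", 7.0)]
```

The same fixture can then be stressed with edge cases (NULL amounts, duplicate rows, an empty region) to see whether the query still does the right thing.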
Programmers think the most valuable language is a programming language. Therefore, an LLM that can generate quality code in that language should also be extremely valuable.
I'd argue that the most valuable language is the natural language of the organization you're writing code for.
That language is vague and ambiguous, and encapsulates many hidden assumptions about how that organization works.
For example, if the organization is a retailer or a wholesaler, you might ask the LLM to generate SQL to return last month's sales and inventory for each site and product category.
The LLM can query the database system tables and guess which tables have sales transactions and inventory transactions. The LLM can look at foreign key definitions to make a good guess how to join these tables with metadata about items, sites, categories and the calendar.
But will the LLM know that inventory is a stock, and sales is a flow? Will it know it should sum the sales transactions but average the inventory balances, or take just the beginning or ending inventory balance?
Many human engineers struggle to translate ambiguous requests in natural language into code which reflects the assumptions and mechanics of the organization.
An LLM that generates consistently useful code needs a world model of how things work, not just how to write code.
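The stock-vs-flow distinction above changes the SQL itself. A sketch with hypothetical table names: sales are a flow (sum them over the period), while inventory is a stock (take the ending balance, since summing daily balances would multiply-count the same goods):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, site TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory (day INTEGER, site TEXT, on_hand INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "A", 3), (2, "A", 4), (3, "A", 2)])
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",
                 [(1, "A", 100), (2, "A", 96), (3, "A", 94)])

# Flow: total sales over the period.
total_sales = conn.execute("SELECT SUM(qty) FROM sales").fetchone()[0]

# Stock: the ending balance only. SUM(on_hand) would give 290,
# which is not a quantity of anything in the real world.
ending_inv = conn.execute(
    "SELECT on_hand FROM inventory ORDER BY day DESC LIMIT 1").fetchone()[0]

print(total_sales, ending_inv)  # 9 94
```

Nothing in the schema tells the LLM which aggregation is meaningful; that knowledge lives in the world model, not the foreign keys.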
That's a very 2023 comment, but I have a feeling it won't hold up. Yes, natural language communication is, and has been, a very undervalued skill in software development, and LLMs have made it even more valuable. But in the end your solution is written in one or more languages with features that are absent from natural languages, and a programmer still needs to understand what they are building, in all of its complexity, in order to build solutions of real value. I don't see that changing for a little while. In the meantime, programmers will become better natural-language communicators.
I have been working in this industry for 30 years now. SQL is one of the ugliest programming languages I have ever worked with. Like Perl, it's a write-once, hope-you-never-need-to-read-it language.
SQL was created in the early 70s. It has not really kept up with the pace of modern programming language ergonomics. It was made for a single person executing a batch-job query pulling data from the database.
On the other hand, I kind of agree SQL is good to learn. It's a counterexample: how not to design a programming language.
There are certainly valid syntactic criticisms you can make of SQL[1], but on the whole SQL is expressive and rather concise, while being solid enough to remain the default data-expression language of choice for fifty years.
1. The order of the phrases, the lack of trailing commas, the fact that an awful organization controls the standards and gatekeeps them to a bizarre extent.
Seriously? I just had a quick look at the MongoDB docs, and when I see that, I'm glad to use SQL. All the brackets and dollar-prefixed expressions make MongoDB queries hard to read.
dbt is not elaborately modeled at all; it eschews all the up-front work of data modeling with ... well, basically a lot of code and poorly maintained balls of mud.
Don't forget the models others create for you - often hilariously slow code to present a set of facets that barely align with your business delivery needs; and don't forget to sync it even more slowly with Fivetran, the DE platform of the future!
This doesn't make any sense, or I'm guessing you've never actually used it. Modeling is something you can do with dbt, not what dbt does (or is, or can be?). I've used it to create data marts and EDWs with hundreds of tables, no differently than I would have created a decade ago with other tools.
As a counterpoint, I’ve also worked in this industry for 25 years, and SQL has had no relevance to my career. I write it very occasionally, and if I didn’t, it wouldn’t make any difference.
I don't think the point was that there are no for loops in SQL. I think the point was that using for loops is almost always wrong and super inefficient, and that there is a much more efficient way: just use joins instead.
What does it matter? Yeah, at the very bottom of it all there will be a loop. I wouldn't attribute that to C but to the von Neumann architecture.
The point is that by getting rid of loops you remove one way of telling the computer "How" to do it. Start here, do it in this order, combine the data this way.
When "How" is a solved problem, it is a waste of your brain to think about that again. "What" is a better use of your brain cycles.
Exactly. SQL is like a very specific natural language used to tell a database what data you need. The database contains AI (the query optimizer) that uses statistics to determine the fastest way to retrieve the data requested. Depending on your query and your data, the fastest way to get your data will change; it's hard to write this by hand yourself.
For this example, nothing. It would be useful where neither of the two SELECT queries is a subset/superset of the other. (Not saying you didn't know that.)
This will be useful if you have a table with some related history records, e.g., products with product price history, and you want to get the latest price. The lateral join would get all prices for the current product, sort them by date and then pick the top row.
But generating a table with all numbers was just a throwaway example of the LATERAL JOIN syntax/use, and (hopefully) not what you'd actually use it for in the real world.
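A more realistic use of that shape is the "latest price per product" query mentioned above, the classic top-1-per-group pattern a LATERAL join expresses in Postgres. SQLite has no LATERAL, so this sketch uses an equivalent correlated subquery; the table names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_history (product TEXT, day INTEGER, price REAL)")
conn.executemany("INSERT INTO price_history VALUES (?, ?, ?)",
                 [("widget", 1, 9.0), ("widget", 5, 11.0),
                  ("gadget", 2, 4.0), ("gadget", 3, 3.5)])

# For each product, keep only the row from its most recent day.
latest = conn.execute("""
    SELECT p.product, p.price
    FROM price_history p
    WHERE p.day = (SELECT MAX(h.day) FROM price_history h
                   WHERE h.product = p.product)
    ORDER BY p.product
""").fetchall()

assert latest == [("gadget", 3.5), ("widget", 11.0)]
```

In Postgres the inner lookup would become a `LATERAL (... ORDER BY day DESC LIMIT 1)` subquery, which can also return several columns from the top row at once.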
It's not clear to me that (mathematically) a lateral join can be reduced to a recursive cte (and if the performance of a recursive cte would be acceptable for the cases where it does work as a substitute).
I love LATERALs, but this still fits within set theory and a bulk application rather than an iterative for-loop. It may even be implemented as a for-loop within the engine, but SQL being declarative abstracts that away from the query interface.
It's sets all the way down. A set of f(x) is still a set.
CREATE TEMP TABLE temp_results(value int, value_squared int);

DO $$
DECLARE
    r int;
BEGIN
    FOR r IN SELECT generate_series FROM generate_series(1, 5)
    LOOP
        INSERT INTO temp_results VALUES (r, r * r);
    END LOOP;
END $$;

SELECT * FROM temp_results;
At the very least, mention that re-running the snippet in the same session would include the prior results as well, because the temp table's rows persist across commands within that session.
Ha! plpgsql's seemingly sole purpose is to inject imperative code into a set-based environment. Probably does it more smoothly than most pl languages, but that's at the cost of imperative clarity.
But you're right. Postgres does allow for-loops like this. (They're also slower than the equivalent set-oriented approach.)
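For contrast, the set-oriented equivalent of that loop is a single INSERT ... SELECT. In Postgres it would use generate_series; this sketch uses a recursive CTE via SQLite so it runs anywhere:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_results (value INTEGER, value_squared INTEGER)")

# One set-based statement replaces the DECLARE/FOR/LOOP block.
conn.execute("""
    WITH RECURSIVE series(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM series WHERE n < 5
    )
    INSERT INTO temp_results SELECT n, n * n FROM series
""")

rows = conn.execute("SELECT * FROM temp_results ORDER BY value").fetchall()
assert rows == [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```

Same rows, no row-at-a-time round trips, and the engine is free to execute the whole thing as one bulk operation.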
Everyone (not just programmers) should learn SQL, and I have, many times! For me, programming languages are pretty different from bikes, in that I have very little "muscle memory" for syntax when staring at a blank page, especially when the bikes I ride more often feel totally different. Having my intern instantly write an almost-correct draft of a function/query helps a lot.
> Maybe it could be useful to ask questions that are similar to writing testcases "how can I verify that my query is doing the right thing"?
SQL is a programming language! It is just declarative. Most programmers simply equate "programming language" with imperative languages. [1]
I've found that while declarative and functional programming languages are not used as often, learning them made me a better programmer.
"Bad" and "un-performant" are relative terms and as your company gets bigger, you're increasingly more and more likely to have colleagues who write even worse queries than an ORM would.
For example I've encountered queries that are not only slow, but they generate several hundred megabytes of output all of which is sent to the user's web browser where JavaScript selects the relevant two kilobytes of data to show the user.
The worst I've ever seen was a system where every single write to the database was sent to every single web browser viewing certain webpages. 99.999999% of the writes were completely irrelevant, and JavaScript in the browser would simply disregard them. The server load was immense... and eventually our sysadmin brought it to someone's attention, at which point we found out it was also leaking sensitive data.
> They may save you a bit of time initially but when your company gets bigger, the ORMs will become a bottleneck.
At which point - but not earlier! - you just make your ORM print the queries and fix them manually (or write them from scratch). You can then ditch the ORM, use it only as a query-builder DSL, or use the escape hatches the ORM provides to inject raw SQL.
Don't use ORMs as a crutch to get away with not knowing SQL, that's bad. However, saving "a bit" - and with good ORMs, that "bit" is quite large - of time in the beginning is often very valuable. Just have a clear "exit strategy" for when (and in 90% of projects, if) it's needed.
More like: when it comes to complex data structures and logic, I will do that outside of SQL. I'll do a join in SQL no problem, but by the time we're doing multiple inner joins, I usually prefer to just run multiple SQL queries. I don't care about performance that badly.
That’ll often not scale to millions of records. Letting the database optimizer find the optimal execution path instead of doing it procedurally elsewhere might result in “finishes in 5 minutes”, versus “doesn’t fit in a night”.
Costs are another reason to optimize queries: long-running, inefficient queries will be a lot more expensive on platforms like Snowflake than efficient ones.
Also good to note that unlike many other forms of AI, the SQL query optimizer AI is not a black box. It will tell you exactly how it thinks your data can be retrieved in the fastest way if you ask it to explain its plan for your query.
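That plan is easy to inspect in practice. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (Postgres spells it `EXPLAIN`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# Ask the optimizer how it would answer the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT val FROM t WHERE id = 1").fetchall()
for row in plan:
    print(row)  # e.g. a SEARCH using the primary key, rather than a full SCAN
```

Adding or dropping an index and re-reading the plan shows exactly how the optimizer's chosen strategy changes, which is how most query tuning actually gets done.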