
1. If you are a programmer I think you should learn SQL. It will give you a different perspective on programming that I think is invaluable (i.e. programming without for loops).

2. Combining and slicing data is a craft, and doing it subtly wrong in one step can lead to fatal errors in the outcome.

And most importantly, it can be very difficult to notice. Numbers don't smell.

That is why I would be very hesitant to give a slightly more than trivial task to an engine that fails 25% of the time.

But I guess that is the same as any other programming task. It's just that other programming tasks involve a lot of boilerplate where an AI can help; SQL gets straight to the point.

Maybe it could be useful to ask questions that are similar to writing testcases: "how can I verify that my query is doing the right thing?"
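
One simple check along those lines (a sketch, assuming hypothetical orders and customers tables): after joining orders to customers on what should be a unique customer key, the row count should not change.

  -- Hypothetical tables; sanity-check that the join neither drops nor multiplies rows.
  SELECT
    (SELECT count(*) FROM orders) AS orders_rows,
    (SELECT count(*)
       FROM orders o
       JOIN customers c ON c.id = o.customer_id) AS joined_rows;
  -- joined_rows < orders_rows means the join drops rows; larger means it fans out.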



I’ve been working in this industry professionally for close to 25 years now. By far the most valuable and useful language I have learned is SQL.


> the most valuable and useful language ...

Programmers think the most valuable language is a programming language. Therefore, an LLM that can generate quality code in that language should also be extremely valuable.

I'd argue that the most valuable language is the natural language of the organization you're writing code for.

That language is vague and ambiguous, and encapsulates many hidden assumptions about how that organization works.

For example, if the organization is a retailer or a wholesaler, you might ask the LLM to generate SQL to return last month's sales and inventory for each site and product category.

The LLM can query the database system tables and guess which tables have sales transactions and inventory transactions. The LLM can look at foreign key definitions to make a good guess how to join these tables with metadata about items, sites, categories and the calendar.

But will the LLM know that inventory is a stock, and sales is a flow? Will it know it should sum the sales transactions but average the inventory balances, or take just the beginning or ending inventory balance?
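
To make the distinction concrete, here's roughly what the "right" query might look like, as a sketch against hypothetical sales_transactions and inventory_balances tables (PostgreSQL-flavored dates): the sales measure is summed over the month, while inventory is taken as the month-end balance rather than summed.

  SELECT
    s.site_id,
    s.category_id,
    sum(s.amount)    AS monthly_sales,       -- sales are a flow: sum the transactions
    b.ending_balance AS month_end_inventory  -- inventory is a stock: take one balance
  FROM sales_transactions AS s
  JOIN inventory_balances AS b
    ON  b.site_id = s.site_id
    AND b.category_id = s.category_id
    AND b.balance_date = date_trunc('month', current_date)::date - 1  -- last day of prior month
  WHERE s.sale_date >= date_trunc('month', current_date - interval '1 month')
    AND s.sale_date <  date_trunc('month', current_date)
  GROUP BY s.site_id, s.category_id, b.ending_balance;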

Many human engineers struggle to translate ambiguous requests in natural language into code which reflects the assumptions and mechanics of the organization.

An LLM that generates consistently useful code needs a world model of how things work, not just how to write code.


That's a very 2023 comment, but I have a feeling it won't hold up. Yes, natural language communication is and has been a very undervalued skill in software development, and LLMs have made that skill even more valuable. But in the end your solution is written in one or more languages with features that natural languages lack, and a programmer still needs to understand what they are building, in all of its complexity, in order to build solutions of real value. I don't see that changing for a little while. In the meantime, programmers will become better natural-language communicators.


I have been working in this industry for 30 years now. SQL is one of the ugliest programming languages I have ever worked with. Like Perl, it's a write-once, hope-you-never-need-to-read language.

SQL was created in the 60s. It has not really kept up with the pace of modern programming language ergonomics. It was made for a single person executing a batch query pulling data from the database.

On the other hand, I kind of agree SQL is good to learn. It's a counterexample: how not to design a programming language.


> SQL was created in the 60s

The IBM System R project's SEQUEL paper was published in 1974, and Oracle 2, released in 1979, was the first commercial database to support it.


There are preprocessors like PRQL that help with some (far from all) of the pain points of SQL as a language.


Hopefully we get past the point where we still need to transpile something into ugly SQL, and databases themselves start to speak a better language.

Despite all the hate MongoDB deserves, it solved the problem of how application developers can easily get data into and out of a database.


There are certainly valid syntactic criticisms you can make of SQL[1], but on the whole SQL is expressive and rather concise, while being solid enough to remain the default expression language of choice for fifty years.

1. The order of the phrases, the lack of trailing commas, the fact that an awful organization controls the standards and gatekeeps them to a bizarre extent.


Seriously? I just had a quick look at the MongoDB docs, and when I see that I'm glad to use SQL. All the brackets and dollar-prefixed expressions make MongoDB queries hard to read.


Agreed, it's limited, ugly, and I'd rather use something else. Every time I have to use it, it's cumbersome.


I actually work in the Data/AI space and SQL has been of limited use to me.

Whilst Snowflake is pretty popular, the days of elaborately modelled EDWs are long gone.

And so typically I find I am doing queries, transformations, etc. in some abstraction layer, e.g. Spark on a data lake, an ORM for web applications, and so on.


> the days of elaborately modelled EDWs are long gone

They're more prevalent than ever in my experience. Consider the popularity of dbt.


dbt is not elaborately modeled at all; it eschews all the up-front work of data modeling in favor of ... well, basically a lot of code and poorly maintained balls of mud.

Don't forget the models others create for you - often hilariously slow code presenting a set of facets that barely align with your business delivery needs; and don't forget to sync it all even more slowly with Fivetran, the DE platform of the future!


> dbt is not elaborately modeled at all

This doesn't make any sense, or I'm guessing you've never actually used it. Modeling is something you can do with dbt, not what dbt does (or is, or can be). I've used it to create data marts and EDWs with hundreds of tables, no differently than I would have a decade ago with other tools.


As a counterpoint, I’ve also worked in this industry for 25 years, and SQL has had no relevance to my career. I write it very occasionally, and if I didn’t, it wouldn’t make any difference.

The C for loop on the other hand…


I am happy for you, you seem to have reached your local optimum!


What makes you think that SQL doesn't have "for loops"?

Ever heard of LATERAL joins/CROSS APPLY?

  SELECT loop.value, x.squared
  FROM generate_series(1,5) AS loop(value)
  CROSS JOIN LATERAL (SELECT loop.value * loop.value AS squared) AS x;


I don't think the point was that there are no for loops in SQL. I think the point was that reaching for loops is almost always wrong and super inefficient, and that there is a much more efficient way: just use joins instead.


I wonder if anyone realizes that selects and joins in every database are implemented, literally, with for loops written in C.

The reason people use a relational database is because it has loops that are faster, safer, and more efficient than anything you can write.


What does it matter? Yeah, at the very bottom of it all there will be a loop. I wouldn't attribute that to C but to the von Neumann architecture.

The point is that by getting rid of loops you remove one way of telling the computer "How" to do it. Start here, do it in this order, combine the data this way.

When "How" is a solved problem, it is a waste of your brain to think about that again. "What" is a better use of your brain cycles.


Exactly, SQL is like a very specific natural language used to tell a database what data you need. The database contains AI (the query optimizer) that will use statistics to determine the fastest way to retrieve the data requested. Depending on your query and the data you have the fastest way to get your data will change. It’s hard to manually write this yourself.


> joins in every database are implemented literally with for loops written in C.

Different kinds of loops can be very different, e.g. two nested loops with quadratic time:

  for i in t1:
    for j in t2:
      ...

vs. a sort + merge join with n log n time.


This is not what I would call a for loop; it's more akin to a list comprehension.

And this is the point I was trying to make.

Instead of starting with the "how", learn to do the "what".


What is the point of the cross join? This would work as well:

   SELECT loop.value, loop.value * loop.value
   FROM generate_series(1,5) AS loop(value)


For this example, nothing. It would be useful where neither of the two SELECT queries is a subset/superset of the other. (Not saying you didn't know that.)


Could you give an example?


This is useful if you have a table with related history records, e.g. products with a product price history, and you want to get the latest price. The lateral join would get all prices for the current product, sort them by date, and then pick the top row.
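
A sketch of that pattern, assuming hypothetical products and price_history tables (PostgreSQL syntax):

  -- For each product, pick the most recent price from its history.
  SELECT p.id, p.name, latest.price, latest.valid_from
  FROM products AS p
  CROSS JOIN LATERAL (
    SELECT ph.price, ph.valid_from
    FROM price_history AS ph
    WHERE ph.product_id = p.id
    ORDER BY ph.valid_from DESC
    LIMIT 1
  ) AS latest;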


How does a lateral (cross) join compare to a window function in your example?


Is a lateral join more efficient than just saying WHERE product = current ORDER BY date DESC?


LATERAL has been available in PostgreSQL since 9.3 (2013), but it's still not available in SQLite.


SQLite has RECURSIVE, so you can generate a table with all numbers using something like:

  WITH RECURSIVE cnt(x) AS (
    SELECT 1
    UNION ALL
    SELECT x+1 FROM cnt
    LIMIT 5
  )
  SELECT x FROM cnt;

And then do a regular CROSS JOIN on that table.


But generating a table with all numbers was just a throwaway example of the LATERAL JOIN syntax/use, and (hopefully) not what you'd actually use it for in the real world.

It's not clear to me that (mathematically) a lateral join can be reduced to a recursive CTE, nor whether the performance of a recursive CTE would be acceptable in the cases where it does work as a substitute.


TIL


They're incredibly useful -- any time that you want to both:

- Reference data from the previous part of the query (the "left-hand side")

- Return multiple columns

The only way you can achieve it is with LATERAL/CROSS APPLY.

Regular correlated subqueries can only return a single column, so something like this doesn't work:

  SELECT
    loop.val, (SELECT loop.val * loop.val, 'second column') AS squared
  FROM
    (SELECT loop.val FROM generate_series(1,5) AS loop(val)) as loop
You'd get:

   error: subquery must return only one column
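
For contrast, a sketch of the LATERAL version that does return both columns (same toy example as upthread, with a made-up second column):

  SELECT loop.val, x.squared, x.label
  FROM generate_series(1,5) AS loop(val)
  CROSS JOIN LATERAL (
    SELECT loop.val * loop.val AS squared,
           'second column'     AS label
  ) AS x;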


I love LATERALs, but this still fits within set theory and a bulk application rather than an iterative for-loop. It may even be implemented as a for-loop within the engine, but SQL being declarative abstracts that away from the query interface.

It's sets all the way down. A set of f(x) is still a set.


Let's get even more cursed then:

    CREATE TEMP TABLE temp_results(value int, value_squared int);

    DO $$
    DECLARE
        r int;
    BEGIN
        FOR r IN SELECT generate_series FROM generate_series(1,5)
        LOOP
            INSERT INTO temp_results VALUES (r, r * r);
        END LOOP;
    END$$;

    SELECT * FROM temp_results;


Ugh, you win, I guess. I'd already started thinking about what I would write in that code review comment ;)


At the very least, mention that re-running the SELECT in the same connection would include the prior results as well, because the temp table's rows are preserved across commands within the same connection.


Ha! plpgsql's seemingly sole purpose is to inject imperative code into a set-based environment. It probably does that more smoothly than most PL languages, but that comes at the cost of imperative clarity.

But you're right. Postgres does allow for-loops like this. (They're also slower than the equivalent set-oriented approach.)


If anyone is interested, I had GPT explain this SQL to me - and it was really helpful, as I couldn't parse it with my level of SQL...

https://chat.openai.com/share/931b1778-6393-4e86-94b4-b3b5a5...


Everyone (not just programmers) should learn SQL, and I have, many times! For me, programming languages are pretty different from bikes in that I have very little "muscle memory" for syntax when staring at a blank page, especially when the bikes I ride more often feel totally different. Having my intern instantly write an almost-correct draft of a function/query helps a lot.

> Maybe it could be useful to ask questions that are similar to writing testcases "how can I verify that my query is doing the right thing"?

that seems like a good thread to pull on!


SQL is a programming language! It is just declarative. Most programmers equate "programming language" with "imperative programming language". [1]

I have found that while declarative and functional programming languages are not used as often, learning them made me a better programmer.

[1]: https://stackoverflow.com/questions/10925689/functional-prog...


SQL is like regex, one of those things I will keep unlearning if I'm not using daily.

Also, the way each DB supports its own dialect is quite maddening.


For those people who would rather use an ORM, it's worth mentioning that ORMs write very bad, un-performant SQL under the hood.

They may save you a bit of time initially but when your company gets bigger, the ORMs will become a bottleneck.


"Bad" and "un-performant" are relative terms and as your company gets bigger, you're increasingly more and more likely to have colleagues who write even worse queries than an ORM would.

For example I've encountered queries that are not only slow, but they generate several hundred megabytes of output all of which is sent to the user's web browser where JavaScript selects the relevant two kilobytes of data to show the user.

The worst I've ever seen was a system where every single write to the database would be sent to every single web browser viewing certain webpages. 99.999999% of the writes were completely irrelevant and javascript in the browser would simply disregard them. The server load was immense... and eventually our Sysadmin brought it to someone's attention. Where we found out it was leaking sensitive data.


I guess once you get to a certain point, you need to hire engineers that have strong SQL skills in order to scale.


> They may save you a bit of time initially but when your company gets bigger, the ORMs will become a bottleneck.

At which point - but not earlier! - you just make your ORM print the queries, fix them manually (or write them from scratch). You then can ditch the ORM, use it as a query builder DSL only, or use escape hatches provided by the ORM to inject raw SQL.

Don't use ORMs as a crutch to get away with not knowing SQL, that's bad. However, saving "a bit" - and with good ORMs, that "bit" is quite large - of time in the beginning is often very valuable. Just have a clear "exit strategy" for when (and in 90% of projects, if) it's needed.


Honestly, I would rather write stuff in code that's inefficient than deal with one more stored procedure. It's like a black box.


It's just that buried deep inside that forest of functions you just wrote is the inner join that the SQL query would have done in a couple of lines.


More like: when it comes to complex data structures and logic, I will do that outside of SQL. I'll do a join with SQL, no problem, but by the time we're doing multiple inner joins I usually prefer to just run multiple SQL queries. I don't care about performance that badly.


That’ll often not scale to millions of records. Letting the database optimizer find the optimal execution path, instead of doing it procedurally elsewhere, might be the difference between “finishes in 5 minutes” and “doesn’t finish overnight”.


This isn’t the 90s. Most hardware is way over-specced for the data sizes most people are dealing with.

The number of use cases which are too heavy to finish in hours but small enough to fit on a single instance is pretty limited.


Costs are another reason to optimize queries: long-running, inefficient queries will be a lot more expensive on platforms like Snowflake than more efficient ones.


SQL is popular because it can be run on a map/reduce backend. So once you have written your code it can run on any number of machines.


a) SQL is not that popular on map/reduce backends. Most people are doing it in code.

b) Only basic SQL works on any database, and even then there are major differences in how they treat things like nulls, type coercion, etc.
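
Two small examples of the kind of differences I mean (behaviour worth double-checking per database, but roughly):

  -- Integer division: PostgreSQL and SQL Server return 0 here,
  -- while MySQL's / operator returns 0.5000.
  SELECT 1 / 2;

  -- Concatenating NULL: PostgreSQL follows the standard and yields NULL,
  -- while Oracle effectively treats the NULL as an empty string and returns 'a'.
  SELECT 'a' || NULL;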


BigQuery? Athena/Redshift?


I usually only do one join at a time. But I separate them with CTEs ("WITH"). I can agree that many joins at once can make you grow grey hair.


That will be interesting soon, when 80% of your code is AI-generated black boxes…


Also good to note that unlike many other forms of AI, the SQL query optimizer AI is not a black box. It will tell you exactly how it thinks your data can be retrieved in the fastest way if you ask it to explain its plan for your query.
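
For example, in PostgreSQL (other databases expose similar functionality with different syntax), against a hypothetical orders table:

  -- EXPLAIN shows the chosen plan; ANALYZE actually runs the query and adds real timings.
  EXPLAIN ANALYZE
  SELECT customer_id, sum(amount)
  FROM orders
  WHERE created_at >= date '2024-01-01'
  GROUP BY customer_id;

The output shows which scans, join strategies and aggregation methods the planner picked, and roughly where the time went.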


Ian Malcolm gif



