I love that I'm at a company that I haven't had to use SQL. I don't know if I could even find a relational database here.
But in a classic meme sense: "It isn't you I hate SQL, it's what I have to think about when I use you." I don't want to spend my time thinking about database indexes, query plans, or other complicated bits. I just want to get my data out.
Sometimes I struggle, and I'd give anything for a plain inner join, but on the whole I like this tradeoff.
I still teach SQL, via SQLite, as a first language to students, even though in practice I do 90% of my data analysis through the command line or Pandas. But SQL is very direct and clear as a language, and the overhead of SQLite is...well, light. Making an index is just one more line of code that for many datasets, you run once and never touch again.
What do you find simpler in terms of data querying and management?
I love sql myself. I'm not sure why I love it so much and other people hate it. Sometimes I think that coming up with reasons is just after the fact justification (do I really like it because it has a solid mathematical foundation in the relational calculus, or does that just sound like a good reason to say I like it)?
Pandas is an excellent contribution to data analysis, but I gotta tell you, I was absolutely delighted to learn about pandasql and realize I could write selects, group by, order by, and so forth, converting data frame to data frame through sql.
What can I say? I truly encourage people to give SQL a real chance. I'm happy to use ORMs that save me some typing on simple queries, and certainly I use a combination of pandas and sql, there are some things much better accomplished through a programming language. But I'm willing to drop to SQL pretty quickly if I sense that the programming language or library is really just forcing me to re-learn a new implementation of sql.
I agree, I was always so so about SQL until I had to use a lot of it. The syntax isn't amazing, but the relational model works really well once you can think in sets. The declarative nature of it means that if I push stuff from the app layer to the database layer its usually less buggy and has better performance.
I used to dislike it myself, and in the context of building within a framework like Rails, I'll defer to an ORM. But my appreciation for SQL deepened as I became a more experienced programmer and began valuing explicitness in code. The reason why I do most data work in Python is because most of my work is exploratory, and I need a language that is well-suited for gluing together services and non-analysis-specific functions. But when it comes to doing work in which time and consistency and portability is Key, SQL is my choice for structured data.
That's interesting. I didn't learn SQL until taking a course on it in grad school. I had some programming background, mainly from math classes and a bit of CS.
the concept of a query really seemed odd, it took some mental adjusting. My only programming background was largely in writing procedural code. My brain really wanted to think in terms of methods, functions, inputs, and return values.
It's been so long (18 years since that course), that it is difficult to remember the mindset I was in. It all seems pretty natural now. But you are absolutely right, you do need to start thinking in terms of sets and operations on sets, not in terms of methods with inputs, procedures, and return values
I was in a similar boat. I studied computer engineering in school but didn't take any database course. My first encounter with SQL was years later as an education reporter. Someone in the newsroom threw out a "Access 97 for Dummies" and I picked it up and realized it was just what I needed to do the kind of analysis I wanted on California schools and test scores (CA.gov is quite good about posting raw data [0])
I eventually moved from Access to MySQL/SQLite, but if it weren't for Access's interactive query designer, where the tables are presented as lists of columns and you draw/drag lines to declare a JOIN between columns [1], I honestly don't think I would have ever grokked SQL.
I actually think that without SQL, I would have never understood the power and purpose of a JOIN, even though that concept of comparing lists is the foundation of most investigative research. So I justify forcing SQL on to journalism students not merely because it's a useful skill, but because it presents the best vocabulary and grammar for describing the idea of joins.
Also, SQL's general lack of data-munging functions or conveniences really drives home that data is just characters and numbers, and doesn't magically have meaning or come into existence. That is an extremely difficult concept to convey with Excel -- e.g. how it automagically converts `9-11` into `11-Sep` and confers a datetime context onto what was just plaintext.
Couldn't agree more - I often see this pattern in programmers - they try to think in cursors instead of sets and end up really fighting the instrument - not using it as intended.
I kind of inferred this, rather than outright stating it, but I like SQL. The concepts are elegant, the language is a little kludgy but not obscene. I can never really remember the semantics of left, right, outer and inner joins, but I usually don't want missing data so inner all the way.
It's not SQL I dislike, it's that it means I'm using a database, and I've lost more of my life than I care to admit to in planning, designing, and maintaining databases so that they perform under production pressures.
The functionality of SQL is great but I really wish it would like more look like other programming languages. I wonder how SQL would look like if someone would design it now.
It also should integrate easier with other languages. Using SQL from C# or C++ is a real pain with tons of casts and string manipulation. Better debugging would also be nice.
A SQL like language in JavaScript is Mongo shell, which is much more complex that SQL. Every time I use it I wish they adapted SQL to Mongo instead of writing their own thing. I usually end up writing complex queries in Ruby with Mongoid. They're more compact and easier to understand.
About ActiveRecord/AREL vs SQL, they're more or less the same. Sometimes a query is easier to understand in Ruby, sometimes in SQL. Complex queries are much easier to read in SQL or outright impossible to code in Ruby.
If I could change something about SQL I'd change all the arbitrary weirdnesses of the data definition language and the non orthogonal stuff, like HAVING which is a WHERE in a different context (aggregations vs rows).
Wow. I'm surprised at the negative thoughts about SQL. I didn't realize I was an outlier.
I'm grateful that it doesn't look like other languages. The format it uses is really easy to generate using other languages, which makes it really useful in a lot of what I do.
I found that SQL is actually an excellent language for people with no coding background. Especially in the BI field it's easy to teach people SQL so that they can perform their own (often quite powerful) queries. It's much easier than learning a proper programming language.
A re-design might be nice for programmers, but would likely lock out most people with no programming backgrounds who work on databases.
Maybe simple queries like "Select where" are intuitive for non-programmers but once you start joining tables things get pretty ugly. Also, the syntax for UPDATE is totally different from INSERT although they do similar things. Why does UPDATE use key value pairs but INSERT uses list of column names followed by list of values?
Since Javascript is the best programming language ever (so I hear :-) ) maybe using Javascript syntax and objects would be nice.
Like the FireFox SQLite Manager plugin, it's cross-platform and free and fairly easy for anyone to install. DB Browser's interface is substantially cleaner than the Firefox plugin, though, and the devs are responsive on Github.
DB Browser is great, all things considered, but I've never found a GUI that didn't overcomplicate the kinds of simple interactions I want when teaching databases (i.e. 99% of what I teach are SELECT queries). It also makes a few concessions in order to appeal to the Excel crowd. For example, not committing writes to the database until you do a Cmd-S...which contradicts what I tell my students about how SQL gives you more responsibility through explicit and consistent behavior (e.g. when you delete a row, it's gone forever...so don't ever delete things unless you really need to). On the other hand, some of the concessions are nice, such as the ability to rename columns via the GUI, which is not part of the SQLite spec.
So I have contemplated teaching SQL from the command line. But I've had a hard time imagining how the steep learning curve could justify the potential advantages for the average novice.
(that page also includes how to work with Sequel Pro and MySQL; I soon realized that giving students two choices for learning SQL was a very bad idea and have since stuck to SQLite)
I've found it better to provide pre-packaged databases for students and basically ignore the administrative part of SQL and databases -- e.g. importing of data, defining a schema, creating indexes, etc. -- until we've learned all we can about working with SELECT and JOIN: http://2016.padjo.org/tutorials/sqlite-data-starterpacks/
Hard to imagine how it's to live in a company that has no database. But I do take issue with that:
> It isn't you I hate SQL, it's what I have to think about when I use you.
Do you live at the fringe of performance like that? Nobody spends much time thinking about those things, and the few times that require thinking about them would require thinking about much more complicated things if you weren't using a database.
It's not so much existing at the fringe of performance, but there have been weeks of my life lost to needing to tune a query or an index. Sure, some of that time is actually building the index I just added. But I don't want to have to figure out a query plan.
If I'm building a simple CRUD app, then I don't need the complexity that SQL provides. If I'm building something bigger, then I probably need to think about the query plans.
I would have loved a tool that I could hand a proposed schema, estimates of the data, and some SQL I want to run that would come back and tell me "That's stupid, because you're doing a join on computed data; add this index, denormalize this field, and remove this clause from your query." but I'm also realistic enough to know that such a tool isn't plausible.
Postgres has explain and some third parties make query tuners.
Oracle has a host of third party tunes, Solar Winds being the first that comes to mind.
I don't know if any of these can do it off a proposed schema but you could create a dummy database and your insert/read routines run a script against it to simulate transactions and run the capture on these tools and they will explain the recommended adjustments. Some of them like MMSQL query tuner will export a script that can make the adjustments.
The biggest thing for DB performance for most CRUD applications is to ensure you eliminate table spools on joins. If you find all of those you are ahead of the game in most instances.
The worst of my time was using Oracle, around 2008. Back then SolarWinds was just a consulting company. And I don't really do much relational work anymore. A lot of my stuff ends up being documents with simple keys organized in hierarchical ways, and the few relational things I do are easily handled with that model, and wouldn't justify an actual database.
Why should I have to know what a table spool is and why they're bad?
Surely there's a relational DB somewhere...perhaps indirectly via Saas? Maybe payroll, company books, timesheets, or some other software common to most businesses.
But in a classic meme sense: "It isn't you I hate SQL, it's what I have to think about when I use you." I don't want to spend my time thinking about database indexes, query plans, or other complicated bits. I just want to get my data out.
Sometimes I struggle, and I'd give anything for a plain inner join, but on the whole I like this tradeoff.