It's really nice to see more content for MySQL/MariaDB still being made, since it feels like they are still good solutions for a whole variety of situations! While I might look in the direction of PostgreSQL for more advanced in-database processing, for most CRUD apps the aforementioned two are going to be more than enough!
> While I might look in the direction of PostgreSQL for more advanced in-database processing, for most CRUD apps the aforementioned two are going to be more than enough
I see this sentiment often here on HN. Something along the lines of "MySQL is enough for small apps but you want PostgreSQL for serious work."
When in practice I find the opposite to be true. PostgreSQL is hard to scale and hard to upgrade compared to MySQL. I mean, just take a look at the caliber of companies that run MySQL at scale using Vitess to orchestrate it (spoiler: it powers YouTube, GitHub, Slack, Shopify and more).
The argument that XYZ company uses ABC software, so it's also good for my project, doesn't hold. Your requirements and their requirements are different. Also, many of the companies above started when MySQL was the popular solution, so they began with it and stayed with it. The way Facebook or Uber uses MySQL is not how you will use it.
Both PostgreSQL and MySQL are powerful databases, and with the right people and proper knowledge, highly scalable systems can be built on either.
The reason PostgreSQL has been recommended over the last 8-10 years is that, just before then, MySQL, though very popular, had some issues that PostgreSQL had already solved. When web developers hit those problems and saw that PostgreSQL didn't have them, they started recommending PostgreSQL.
In my personal case, I was responsible for managing a WordPress site with a few million visitors every day. (This was before AWS RDS; we had to set up replication manually in those days.) We had set up replication with MySQL 5.6. At that time MySQL replication had a few issues, and it used to break every few days.
At the same time, PostgreSQL replication, which I was using in other projects, was rock solid, so I started recommending PostgreSQL over MySQL. Others have similar stories, just about different issues they faced in MySQL.
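For context, manual MySQL replication setup in that era looked roughly like this (a sketch only; hosts, credentials, and the binlog position are placeholders, and newer MySQL versions use CHANGE REPLICATION SOURCE TO instead):

```sql
-- On the primary (my.cnf needs server-id set and binary logging enabled):
CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';  -- placeholder credentials
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On the replica, point at the primary's current binlog position
-- (taken from SHOW MASTER STATUS on the primary):
CHANGE MASTER TO
  MASTER_HOST = 'primary.example.com',          -- placeholder host
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;
START SLAVE;
SHOW SLAVE STATUS\G                             -- watch for replication errors here
```

When replication "broke every few days", it usually showed up as an error in SHOW SLAVE STATUS that had to be diagnosed and cleared by hand.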
Over the years MySQL has improved, and so has PostgreSQL.
> The way Facebook or Uber uses MySQL is not how you will use it.
With large companies like Meta/Facebook, there is no singular "way" that the company uses a particular database. Larger companies typically have self-service generic managed database infrastructure, similar to RDS but internal. The workloads tend to be quite varied.
> (This is before AWS RDS, we had to set replication manually in those days) We had set up replication with
> MySQL 5.6. At that time MySQL replication had a few issues, and it used to break every few days.
Your chronology isn't right: AWS RDS was released in Oct 2009 and gained multi-AZ replication in May 2010. At that point, Postgres didn't have built-in replication at all; it first gained built-in streaming replication in Postgres 9.0, released in Sept 2010.
Meanwhile MySQL 5.6 was released (GA) in Feb 2013, several years after RDS already existed.
In any case, if your replication was breaking every few days in MySQL 5.6, that was something specific to your environment / configuration / workload. What you're describing is definitely far from common. If a replication stream breaking this often was the typical experience with MySQL 5.6, at Facebook's scale we would have had a replication breakage every few seconds, and that definitively was not the case.
That's especially true with out-of-the-box software like WordPress. I can't imagine Automattic experienced frequent replication breakages with a normal WP workload, as this would have been hugely operationally problematic for their hosted wordpress.com product. Perhaps you had a misbehaving plugin performing non-deterministic DML or something like that?
That all said -- Postgres is an amazing database, and there are many good reasons to choose it; but as with all technical choices, there's a set of trade-offs to consider. For example, originally Postgres only supported physical replication, not logical replication, and this made upgrading to a new major version quite painful as compared to MySQL.
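To sketch why logical replication eased that upgrade pain (names and connection details here are placeholders; this assumes Postgres 10+ on both sides, with the schema pre-created on the new server, e.g. via pg_dump --schema-only):

```sql
-- On the old-version primary: publish the tables to carry over.
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new-version server: subscribe, and let it copy and then stream changes.
CREATE SUBSCRIPTION upgrade_sub
  CONNECTION 'host=old-primary dbname=app user=repl password=secret'  -- placeholders
  PUBLICATION upgrade_pub;

-- Once the subscriber has caught up, switch the application over
-- and drop the subscription on the new server.
```

Physical (block-level) replication can't cross major versions, which is why upgrades were so much more painful before this existed.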
I'm sure that others can comment on that, but in my experience PL/pgSQL is the killer feature that's hard to beat in PostgreSQL, for those cases where you want to store some amount of logic in the database itself (MySQL stored procedures feel a bit more limited). That said, it's not even the only procedural language that is available: https://www.postgresql.org/docs/current/xplang.html
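A minimal sketch of the kind of thing PL/pgSQL makes easy (the table and function are made up for illustration): variables, control flow, and error handling right next to the data.

```sql
-- A hypothetical transfer function; plain SQL functions don't give you
-- IF/RAISE control flow this directly.
CREATE FUNCTION transfer(from_id int, to_id int, amount numeric)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  UPDATE accounts SET balance = balance - amount WHERE id = from_id;
  IF NOT FOUND THEN
    RAISE EXCEPTION 'unknown account %', from_id;
  END IF;
  UPDATE accounts SET balance = balance + amount WHERE id = to_id;
END;
$$;

SELECT transfer(1, 2, 100.00);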
In addition, working with JSON in PostgreSQL can be pretty nice for niche use cases, as is using PostGIS for geospatial data. There are also REST (e.g. PostgREST) and GraphQL (e.g. PostGraphile) projects if you want the database to expose web endpoints for retrieving and manipulating data directly, as opposed to just SQL communication with some back end.
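For instance, querying into JSON documents looks roughly like this (a sketch; the table and payload shape are hypothetical):

```sql
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);

INSERT INTO events (payload)
VALUES ('{"type": "click", "meta": {"page": "/home"}}');

-- A GIN index makes containment queries (@>) fast.
CREATE INDEX ON events USING gin (payload);

-- Extract nested fields and filter with JSON operators:
SELECT payload->'meta'->>'page' AS page
FROM events
WHERE payload @> '{"type": "click"}';
```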
That's not to say that MySQL or MariaDB don't have their own great offerings, but it's clear that PostgreSQL has gotten a lot of love in terms of integrations and extensions. On the other hand, not needing the equivalent of PgBouncer out of the box is nice, and personally MySQL Workbench feels better than pgAdmin thanks to its advanced ER functionality (forward/backward engineering and schema synchronization, which make it easier to create versioned DB migrations if you write them in plain SQL).
> where you want to store some amount of logic in the database itself
If you’ve ever worked on a decently sized project, you’ll quickly realize this is an anti-pattern that you should avoid at all costs. Imagine having multiple teams updating that logic without any version control or visibility into what’s stored in pg.
> If you’ve ever worked on a decently sized project, you’ll quickly realize this is an anti-pattern that you should avoid at all cost.
Not sure about this.
On one hand, packages of reusable logic in the DB can be useful: processing data as you select it, running common validations before inserting, or even doing batch processing and reporting. On the other hand, I've worked on a large enterprise project where almost everything was done in the DB, and Java was used more or less as a templating technology and to serve REST endpoints. Even with version-controlled migrations, it was an absolute mess to work with, debug, and extend, even though the performance was great.
I've also talked with people who still believe that the majority of logic should be implemented as close to the source of the data as possible, and with others at the opposite end of the spectrum who won't use anything but their ORM of choice and prefer to abstract SQL away. Either extreme can lead to issues. Personally I'm somewhere in the middle: use ORMs if you please, map against views in the DB when you want to select data in a non-trivial manner, consider functions or even stored procedures for batch processing, but don't get too trigger-happy about it.
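That middle ground can be sketched like this (hypothetical schema): the DB owns the non-trivial read logic as a view, and the application's ORM maps against it like any other read-only table.

```sql
-- The DB owns the non-trivial SELECT...
CREATE VIEW order_totals AS
SELECT o.id, c.name AS customer, SUM(i.price * i.qty) AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items i ON i.order_id = o.id
GROUP BY o.id, c.name;

-- ...and the application queries it like a plain table:
SELECT * FROM order_totals WHERE total > 100;
```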
If you need lots of in-database processing for whatever reason, might as well use something that has a good procedural language, like PostgreSQL.
The reality seems to be that about half of respondents don't version their scripts, half don't debug stored procedures, and the majority don't have tests in or against their database. It's not that you can't do these things; people simply choose not to. I'd want a locally launched DB instance, with all migrations versioned and automated plus data import/seeding, to be the norm.
> Even with version controlled migrations, it was an absolute mess to work with, to debug and extend, even though the performance was great.
This is exactly what I mean. Sure, anything is technically possible; I’m not saying that you can’t version your stored procedures (though even that has almost never been the norm on any team I’ve worked on). But is it the ideal setup for your team/project? Far from it.
> Most of our data (users, photo metadata, tags, etc) lives in PostgreSQL; we’ve previously written about how we shard across our different Postgres instances
Another relatively big company that uses PostgreSQL is GitLab.
And as far as I know, Netflix was a big Cassandra user and then migrated to CockroachDB. I tried searching for "Netflix PostgreSQL" and found this comment from 2016 stating that they chose MySQL over PostgreSQL: https://news.ycombinator.com/item?id=11950811
If by "advanced in-database processing" you mean stuff like PostGIS, then yes, PostgreSQL is the right approach. But for most people building web apps at scale, or established businesses who want to pick a solution and stick with it basically forever, MySQL is much easier to scale. There are tons of companies running MySQL at scale; I've never heard of the same for Postgres, even though for a long while I had an intellectual preference for Postgres as the more "pure" implementation. Experience in the field showed me that MySQL, despite its flaws, is the name of the game.
Further "evidence": ask other experienced folks who've worked on scaling SQL databases, and MySQL is what's used and what they have experience with.
MySQL has also had major bugs in recent GA releases, like crashing the server when renaming a table.
PostgreSQL scales differently, partly because its MVCC isn't undo-log based (old row versions stay in the table heap and have to be vacuumed away), among other differences. It does value correctness and has (mostly) better defaults. It has had its own embarrassing bugs too, though IME few put data integrity or availability at risk.
I wonder where that "MySQL is easier to scale" trope comes from. Have you ever tried to set up and operate a MySQL/MariaDB cluster? It's a horrible, brittle shitshow. Even if you get it running, there's no guarantee things won't just turn south at any given moment.