
That is a hot take... ;)

But joins should never impact performance in a large way if they're on the same server and properly indexed. "It's truly amazing how much faster everything is when you eliminate joins" is just not true if you're using joins correctly. Sadly, many developers simply never bother to learn.

On the other hand, having to write a piece of data to 20 different spots instead of 1 is going to be dramatically slower performance-wise, not to mention make your queries tremendously more complex and bug-prone, when you remember to update a value in 18 spots but forget about 2 of them.

You mention cheap storage as an advantage for denormalizing, but storage is the least problem here. It's a vastly larger surface area for bugs, and terrible write performance (that can easily chew up read performance).



Storage might be cheap, but memory and bandwidth aren’t.

Memory is the new disk, and disk is the new tape.

You want everything to remain resident in memory, and spool backups and transaction logs to disk.

If you’re joining from disk, you’ve probably done something wrong.

E.g.: compression is often a net win because while it uses more CPU, it allows more data to fit into memory. And if it doesn’t fit, it reduces the disk I/O required.
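
For instance (a minimal sketch, not from the comment above; the `events` table and `payload` column are hypothetical), both major open-source RDBMSes can compress on-disk data so that more of the working set fits in cache:

    -- Postgres 14+ (built with lz4 support): compress a large column's TOAST storage
    ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;

    -- MySQL/InnoDB: store the whole table in compressed pages
    ALTER TABLE events ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;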

This is why I look upon JSON-based document databases in horror. They’re bloating the data out many times over by denormalizing, and then expanding that into a verbose and repetitive text format.

This is why we have insanity like placeholders for text on web apps now — they’re struggling to retrieve a mere kilobyte of data!


Joins are not inherently expensive, but they can lead to expensive queries. For example, say I want to find the 10 most recent users with a phone number as their primary contact method:

    SELECT …
    FROM User
    JOIN ContactMethod ON ContactMethod.userId = User.id
    WHERE ContactMethod.priority = 'primary' AND ContactMethod.type = 'phoneNumber'
    ORDER BY User.createdAt DESC
    LIMIT 10

If there are a very large number of users, and a very large number of phone number primary contacts, you cannot make this query fast/efficient (on most RDBMSes). You CAN make this query fast/efficient by denormalizing, ensuring the user creation date and primary contact method are on the same table, and then creating a compound index. But if they’re in separate tables, and you have to join, you can’t make it efficient, because you can’t create cross-table compound indexes.
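
As a rough sketch of the denormalized approach described above (the column and index names here are made up for illustration), you might copy the primary contact type onto the User table so that one compound index covers both the filter and the sort:

    ALTER TABLE "User" ADD COLUMN "primaryContactType" varchar;

    CREATE INDEX user_primary_contact_created_at
        ON "User" ("primaryContactType", "createdAt" DESC);

    -- the query no longer needs a join
    SELECT *
    FROM "User"
    WHERE "primaryContactType" = 'phoneNumber'
    ORDER BY "createdAt" DESC
    LIMIT 10;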

This pattern of join, filter by something in table A, sort by something in table B, and query out one page of data, is something that comes up a lot. It’s why people think joins are generally expensive, but really they’re expensive in specific cases.


For 10 million users + telephones, this takes 1ms.

    create table users (
        id serial primary key not null,
        created_at timestamp not null default now()
    );

    create table users_telephones (
        user_id int references users(id) not null,
        is_primary boolean not null default true,
        telephone varchar not null
    );

    insert into users
    select i, NOW() + (random() * (interval '90 days')) + '30 days' from generate_series(1, 10000000) i;
    insert into users_telephones select id, true, random() :: text from users limit 10000000; -- all users have a primary telephone
    insert into users_telephones select id, false, random() :: text from users limit 200000; -- some users have a non primary telephone
    create index on users(created_at);
    create index on users_telephones(user_id);
    create index on users_telephones(user_id, is_primary) where is_primary;

    select count(*) from users;
    count   
    ----------
    10000000
    (1 row)

    Time: 160.911 ms


    select count(*) from users_telephones;
    count   
    ----------
    10200000
    (1 row)

    Time: 176.361 ms


    select
        *
    from
        users u
        join users_telephones ut on u.id = ut.user_id
    where
        ut.is_primary
    order by
        created_at
    limit
        10;

    id    |         created_at         | user_id | is_primary |     telephone      
    ---------+----------------------------+---------+------------+--------------------
    9017755 | 2023-09-11 11:45:37.65744  | 9017755 | t          | 0.7182410419408853
    6061687 | 2023-09-11 11:45:39.271054 | 6061687 | t          | 0.3608686654204689
    9823470 | 2023-09-11 11:45:39.284201 | 9823470 | t          | 0.3026398665522869
    2622527 | 2023-09-11 11:45:39.919549 | 2622527 | t          | 0.1929579716250771
    7585920 | 2023-09-11 11:45:40.256742 | 7585920 | t          | 0.3830236472843005
    5077138 | 2023-09-11 11:45:41.076164 | 5077138 | t          | 0.9058939392225689
    1496883 | 2023-09-11 11:45:42.459194 | 1496883 | t          | 0.1519510558344308
    9234364 | 2023-09-11 11:45:42.965896 | 9234364 | t          | 0.8254433522266105
    6988331 | 2023-09-11 11:45:43.130548 | 6988331 | t          | 0.9577098184736457
    7916398 | 2023-09-11 11:45:43.559425 | 7916398 | t          | 0.9681218675498862
    (10 rows)

    Time: 0.973 ms


Ty for benchmarking, but this isn’t a good benchmark for the issue I’m talking about.

This is only fast because 100% of users have a phone number as a primary contact, so the join filter is essentially meaningless. If the filtered contact type is a small percentage of the contact table (e.g. most users have an email as their primary contact, not a phone number), but still a large absolute number (e.g. there are still hundreds of thousands to millions of phone primary contacts), it’s a much harder query.

It’s probably also fast because you have a warm cache - e.g. there’s enough memory for the DB to have the indexes 100% in memory, which is just not feasible with large DBs in the real world, where you can easily have >100GB of indexes + hot data, and the DB can’t keep it all in memory. In most real world scenarios, having to somewhat frequently read pages of indexes off disk, into memory, to satisfy queries, is common.

Try it again, with the exact same data, but:

- Search for users with a non-primary phone contact (you have 200,000 of these, and 10,000,000 users); a sketch of this query follows the list

- Give the DB say 1/3 the memory of your total index size, so the complete indexes can’t be in memory

- Run the query right after starting PG up, to ensure the cache is cold (with a hot cache, almost everything is fast, but in real world situations with lots of users the cache isn’t consistently hot)
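
A sketch of that first variation against the schema upthread (just flipping the filter to the non-primary rows):

    select *
    from users u
    join users_telephones ut on u.id = ut.user_id
    where not ut.is_primary
    order by u.created_at
    limit 10;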


> It’s probably also fast because you have a warm cache - e.g. there’s enough memory for the DB to have the indexes 100% in memory, which is just not feasible with large DBs in the real world, where you can easily have >100GB of indexes + hot data, and the DB can’t keep it all in memory.

That's the point? Sure there is a scale where it's infeasible, but you can quite easily (albeit it's pricey) get DB instances with hundreds or thousands of GiB of RAM. Even if you can't get everything into it, your working set is often not the size of the entire data set. My company's main MySQL primary has around 800 GiB of storage, and only about 1/3 of that in RAM. Disk read IOPS are usually 0.

Nevertheless, I recreated this in Postgres 15.

The total index size of the two tables is 790 MB according to `\di+*`, so I'll set `shared_buffers` to 263 MB, then restart and re-run.

For reference, time with current settings is about 6.8 msec for `is_primary`, and 1395 msec for `NOT is_primary`.

After a restart, I ran the query for `NOT is_primary` first, which took 1740 msec. The first run of the query with `is_primary` took 21 msec.

My DB is hosted on older hardware, and the files themselves live on NVMe drives presented via Ceph over 1GbE.

EDIT: I forgot that Postgres uses OS cache for a lot of its memory, not just its own. Re-did this, running `sync; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'` in between shutting down/starting up Postgres. 16726 msec and 79 msec, respectively. So yes, a lot slower, but a. I don't think this is realistic for a production database b. I'm still not clear about how you think JOINs enter into this. The slowdown comes entirely from having to run a table scan.


First off, ty for running all these benchmarks, above and beyond!

FWIW, I don’t think joins are bad, I’m 100% for normalized DB schemas with joins. But I’ve done tonnes of performance work over the past ~10 years, and run into a bunch of real world cases where, when caches are cold (which does happen frequently with large datasets and limited budgets), queries similar to the above (join two tables, read a page of data, sorting on one table and filtering on the other) take 10s of seconds, sometimes even minutes with very large datasets. In those cases, the only solution has been to denormalize so we can create a compound index, with filter key(s) as the prefix and the sort key as the suffix, which makes the query consistently fast. I am not at all suggesting this as the default; better to default to normalized with joins, and only do this for these specific cases. Generally a company has just one or a few cases where this is necessary, and can do the ugly denormalization + compound indexes for those few hot spots. But when people say “joins are slow”, these cases are examples of where it’s true.

Re: your above 17 second query, if you do something like adding fields to the user table for primary and secondary contact method (denormalizing), and then create a compound index with the necessary filter keys first, and the sort key second, I think you’ll find the query (which no longer needs a join) is quite fast even if caches are ice cold.


Created a new table that contains `user_id, created_at, phone_primary, phone_secondary`. Inserted all 10,200,000 rows. Notably (I'll come back to this) due to the generation of the rows, the primary key (`user_id`) is an unsorted integer - this was _not_ done with a serial or identity.

  postgres=# CREATE INDEX sec_phone_created_at ON hn_phone_new (phone_secondary, created_at) WHERE phone_secondary IS NOT NULL;

I reset `shared_buffers` down to the same as before - 263 MB - although the size of this index is tiny, < 10 MB, so realistically I can't shrink buffers down that far anyway. I then did the same `sync/drop cache` as before.

  postgres=# SELECT * FROM hn_phone_new WHERE phone_secondary IS NOT NULL ORDER BY created_at LIMIT 10;
     id   |     created_at      |   phone_primary    |  phone_secondary   
  --------+---------------------+--------------------+--------------------
    58816 | 1995-05-23 03:22:02 | +49 030 522866-87  | +1 159-445-4810
    49964 | 1995-05-23 03:23:00 | +61 02 7440 8606   | +254 20 925 892
   171828 | 1995-05-23 05:06:47 | +380 32 393-35-89  | +49 030 429376-29
    78333 | 1995-05-23 05:31:22 | +380 32 147-11-20  | +52 55 6409 5253
    24264 | 1995-05-23 06:47:21 | +44 0131 6506 1823 | +49 030 610965-83
    96662 | 1995-05-23 06:57:03 | +52 55 1473 0538   | +61 02 5414 8204
    15023 | 1995-05-23 07:55:37 | +44 0131 7959 1581 | +44 0131 8491 6194
    52029 | 1995-05-23 08:59:19 | +380 32 430-77-54  | +254 20 374 856
    20518 | 1995-05-23 09:51:14 | +380 32 264-21-79  | +52 55 7787 0236
    80273 | 1995-05-23 14:59:26 | +61 02 8863 4466   | +33 01 16 10 78 56
  (10 rows)

  Time: 2258.807 ms (00:02.259)

So yes, significant improvement as you'd expect. I then dropped the index and swapped the order:

  postgres=# DROP INDEX sec_phone_created_at;
  postgres=# CREATE INDEX created_at_sec_phone ON hn_phone_new (created_at, phone_secondary) WHERE phone_secondary IS NOT NULL;

Reset everything as before, and re-ran the same query:

  Time: 221.392 ms

Thinking that like MySQL, a portion of the `shared_buffers` had been saved to disk and put back in upon restart (honestly I don't know if Postgres does this), I attempted to flush it by running a few `SELECT COUNT(*)` on other, larger tables, then re-running the query.

  Time: 365.961 ms

This is what `EXPLAIN VERBOSE` looks like for the original index:

   Limit  (cost=8.44..8.45 rows=1 width=61)
     Output: id, created_at, phone_primary, phone_secondary
     ->  Sort  (cost=8.44..8.45 rows=1 width=61)
           Output: id, created_at, phone_primary, phone_secondary
           Sort Key: hn_phone_new.created_at
           ->  Index Scan using sec_phone_created_at on public.hn_phone_new  (cost=0.42..8.43 rows=1 width=61)
                 Output: id, created_at, phone_primary, phone_secondary
  (7 rows)

And this is what it looks like for the second, with the columns swapped:

  Limit  (cost=0.42..8.43 rows=1 width=61)
     Output: id, created_at, phone_primary, phone_secondary
     ->  Index Scan using created_at_sec_phone on public.hn_phone_new  (cost=0.42..8.43 rows=1 width=61)
           Output: id, created_at, phone_primary, phone_secondary
  (4 rows)

So it actually needs to be reversed, so that the query planner doesn't have to add a sort step for the ORDER BY.


Yeah, I believe which order is best (sortKey/filterKey or filterKey/sortKey) really depends on the specific data/queries, best to try both and pick the best one - looks like sortKey/filterKey in this case :)

But I think this does show how, for specific data/queries, sometimes you do have to denormalize, so that you can create the ideal compound index, for specific problem queries. Should still go with normalized schemas and joins as a default, but if problem queries pop up like this that are taking 10, 20, 30 seconds sometimes (when caches are cold), compromising a bit on clean schemas/code for performance makes sense.

I also created a benchmark here, for Postgres: https://gist.github.com/yashap/6d7a34ef37c6b7d3e4fc11b0bece7...


If you’re limited in RAM and can’t upsize, then yes, this does appear to be a good trade off. You can always refactor later and normalize if necessary.

BTW, although it wouldn’t have helped with the column types in your specific benchmark’s schema, I’ll plug my genSQL tool [0] for generating random data. It’s primarily designed around MySQL, but it can produce CSVs easily, which every DB can load.

Turns out making a lot of random() calls in most languages is slow af, so mine avoids that by (mostly) batching them in a C library. It should be able to create a million somethings in under 10 seconds on modern hardware in Python 3.11.

[0]: https://github.com/stephanGarland/genSQL


Looks useful, ty!


Thanks for bothering to work it through, I was too lazy.

But, yeah, exactly. Everyone thinks they need to optimise the life out of this stuff at the beginning but the db can do a lot with normalised data and the appropriate indexes.

Side note - is_primary isn’t required in the partial index itself since they’ll all be “true” due to the where clause.
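
i.e. the partial index from upthread could be something like this, with the same effect:

    create index on users_telephones(user_id) where is_primary;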


Thanks for the effort.

Probably nitpicking but these types of measures are usually tricky to interpret because there is a high chance your indexes (maybe even rows) are still on PostgreSQL shared buffers and OS cache and might not reflect real usage performance.

To get a more "worst-case" measure, after your inserts and indexes creation, you can restart your database server + flush OS pages cache (e.g. drop_caches for Linux), then do the measure.

Sometimes the difference is huge, although I don't suspect it will be in this case.


imo a properly configured Postgres server should have enough RAM to keep any hot data in cache. The cached path is the accurate measurement.


What proportion of users had a primary telephone contact? I think you'd need to be skipping over a lot of users (those without a primary telephone contact) to hit the pathological case that's implied.


Decided to re-create this in MySQL on fairly old hardware, and with actual phone numbers - the latter shouldn't make a difference since they're still VARCHAR, but I already have a program [0] to generate schema with them, so why not?

I did have to do a few manual updates after data load because the aforementioned program can't make foreign keys yet, and also for bools (which MySQL stores as tinyint(1)) I'm randomly generating them via `id & 1`, which isn't what you had.

Also, I gave `hn_phone` its own auto-increment int as a PK, so I could have a non-unique index on `user_id`. In MySQL, if you create a table without a PK, you get one of these, in descending order of precedence:

* The first indexed UNIQUE NOT NULL column promoted to PK

* An invisible, auto-generated, auto-incrementing integer column called `my_row_id` as PK (MySQL >= 8.0.30, if sql_generate_invisible_primary_key=1)

* A hidden index called `GEN_CLUST_INDEX` created on a super-invisible (i.e. doesn't show up in table definition) column called `ROW_ID`, but that column is shared across the entire DB so please don't do this

It's worth noting that since the first 10,000,000 rows all have `is_primary` set, this can finish extremely quickly. If you invert that match with these tables, you have to do a table scan on `hn_phone`, and the time jumps up to about 5650 msec. If you change the `hn_phone` index to be a composite on (`user_id`, `is_primary`) and then rewrite the query to use a subquery instead of a join, the time drops to around 7 msec. You might see a slight speed-up if you index `created_at` in descending order if that was the normal access pattern.
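
For reference, one way that join-to-subquery rewrite might look (this is my reading of the description above, not necessarily the exact query that was run):

    SELECT u.id, u.created_at
    FROM hn_user u
    WHERE EXISTS (
        SELECT 1
        FROM hn_phone ut
        WHERE ut.user_id = u.id
            AND NOT ut.is_primary
    )
    ORDER BY u.created_at DESC
    LIMIT 10;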

Anyway:

  OS: Debian Bullseye 5.10.0-23-amd64
  Virtualized: Yes (Proxmox)
  CPU: E5-2650 v2 @ 2.60GHz
  Allocated Core Count: 16
  Allocated RAM: 64 GiB PC3-12800R
  Disk: Samsung PM983 1.92 TiB via Ceph
  Filesystem: XFS
  Mount Options: defaults,noatime
  MySQL Version: 8.0.34
  MySQL Options (non-default):
    innodb_buffer_pool_instances = 16
    innodb_buffer_pool_chunk_size = 134217728
    innodb_buffer_pool_size = 17179869184
    innodb_numa_interleave = 1
    innodb_sync_array_size = 16 # this shouldn't apply here, but listing anyway
    innodb_flush_method = O_DIRECT
    innodb_read_io_threads = 16
    innodb_write_io_threads = 16 # this shouldn't apply here, but listing anyway

  CREATE TABLE `hn_user` (
    `id` int unsigned NOT NULL AUTO_INCREMENT, 
    `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, 
    PRIMARY KEY (`id`), 
    KEY `user_created_at` (`created_at`)
  );

  CREATE TABLE `hn_phone` (
    `id` int unsigned NOT NULL AUTO_INCREMENT, 
    `user_id` int unsigned NOT NULL, 
    `is_primary` tinyint(1) NOT NULL DEFAULT '1', 
    `phone` varchar(255) NOT NULL, 
    PRIMARY KEY (`id`), 
    KEY `user_id` (`user_id`), 
    CONSTRAINT `hn_phone_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `hn_user` (`id`)
  );

  mysql> SELECT COUNT(*) FROM hn_user UNION SELECT COUNT(*) FROM hn_phone;
  +----------+
  | COUNT(*) |
  +----------+
  | 10000000 |
  | 10200000 |
  +----------+
  2 rows in set (1.20 sec)

  mysql> SELECT 
    u.id, 
    u.created_at, 
    ut.is_primary, 
    ut.phone 
  FROM 
    hn_user u 
    JOIN hn_phone ut ON u.id = ut.user_id 
  WHERE 
    ut.is_primary 
  ORDER BY 
    u.created_at DESC 
  LIMIT 10;

  +---------+---------------------+------------+--------------------+
  | id      | created_at          | is_primary | phone              |
  +---------+---------------------+------------+--------------------+
  | 6906106 | 2023-08-12 06:08:25 |          1 | +61 02 5317 2261   |
  | 6906106 | 2023-08-12 06:08:25 |          1 | +254 20 294 205    |
  | 6738922 | 2023-08-12 06:07:12 |          1 | +61 02 1247 3361   |
  | 6738922 | 2023-08-12 06:07:12 |          1 | +44 0131 8386 4494 |
  | 7449553 | 2023-08-12 06:03:55 |          1 | +61 02 7649 6731   |
  | 7449553 | 2023-08-12 06:03:55 |          1 | +61 02 7893 9835   |
  | 6908862 | 2023-08-12 05:51:52 |          1 | +81 03 6743-6893   |
  | 6908862 | 2023-08-12 05:51:52 |          1 | +44 0131 8414 7888 |
  | 4134961 | 2023-08-12 05:51:42 |          1 | +1 614-908-1719    |
  | 4134961 | 2023-08-12 05:51:42 |          1 | +44 0131 9898 8958 |
  +---------+---------------------+------------+--------------------+
  10 rows in set (0.00 sec)

  mysql> WITH latest_event AS (
    SELECT 
      event_id 
    FROM 
      performance_schema.events_statements_history_long 
    WHERE 
      sql_text LIKE 'SELECT u.id%' 
    ORDER BY 
      event_id DESC 
    LIMIT 1
  ) 
  SELECT 
    event_name, 
    TRUNCATE(
      TIMER_WAIT / POW(10, 9), 
      3
    ) AS 'duration (msec)' 
  FROM 
    performance_schema.events_stages_history_long stg 
    JOIN latest_event ON stg.nesting_event_id = latest_event.event_id 
  UNION 
  SELECT 
    "total", 
    TRUNCATE(
      TIMER_WAIT / POW(10, 9), 
      3
    ) 
  FROM 
    performance_schema.events_statements_history_long stmt 
    JOIN latest_event ON stmt.event_id = latest_event.event_id;

  +------------------------------------------------+-----------------+
  | event_name                                     | duration (msec) |
  +------------------------------------------------+-----------------+
  | stage/sql/starting                             |           0.261 |
  | stage/sql/Executing hook on transaction begin. |           0.003 |
  | stage/sql/starting                             |           0.016 |
  | stage/sql/checking permissions                 |           0.006 |
  | stage/sql/checking permissions                 |           0.005 |
  | stage/sql/Opening tables                       |           0.134 |
  | stage/sql/init                                 |           0.008 |
  | stage/sql/System lock                          |           0.023 |
  | stage/sql/optimizing                           |           0.034 |
  | stage/sql/statistics                           |           0.087 |
  | stage/sql/preparing                            |           0.074 |
  | stage/sql/executing                            |            0.74 |
  | stage/sql/end                                  |           0.003 |
  | stage/sql/query end                            |           0.003 |
  | stage/sql/waiting for handler commit           |           0.025 |
  | stage/sql/closing tables                       |           0.019 |
  | stage/sql/freeing items                        |           0.176 |
  | stage/sql/cleaning up                          |           0.003 |
  | total                                          |           1.654 |
  +------------------------------------------------+-----------------+
  19 rows in set (0.00 sec)

[0]: https://github.com/stephanGarland/genSQL # shameless plug; it's super messy and probably unintuitive, but it's getting better/faster and has been a fun ride learning how fast you can make Python (and when to offload to C).


With an index on User (createdAt, id) and one on ContactMethod (priority, type, userId), it should be fast (check that the execution plan starts with User). Except if a lot of recent users have no phones, but that will not be better in a single table (except with columnar storage).


I imagine so too. You’ll be able to iterate over the users in order and hit an instant index on contact method.

    select
      *
    from
      "user" u
    where
      exists (
        select
          true
        from
          contact_method cm
        where
          cm.user_id = u.id
          and cm.method = 'phone'
          and cm.is_primary
      )
    order by
      created_at desc
    limit 10;

    -- partial index is even faster
    create index primary_phone on contact_method (user_id) where method = 'phone' and is_primary;



In my experience with SQL, a query like that should return in under a second even if you have 100k or more users.

There are some other tricks you can use if you're clever/lucky as well. If you're just using integer IDs (which is reasonable if your system isn't distributed) then you could order by userId on your ContactMethod table and still get the same speed as you would with no join.
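
A sketch of that trick, assuming monotonically increasing integer IDs so that ordering by ContactMethod.userId approximates ordering by user creation time (column names follow the earlier example):

    -- can be served entirely by an index on (priority, type, userId)
    SELECT cm.userId
    FROM ContactMethod cm
    WHERE cm.priority = 'primary' AND cm.type = 'phoneNumber'
    ORDER BY cm.userId DESC
    LIMIT 10;
    -- then fetch those 10 User rows by primary key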



> If there are a very large number of users, and a very large number of phone number primary contacts, you cannot make this query fast/efficient

I think specific numbers would help make this point better. With a few hundred thousand to low millions of users, this should be plenty fast in Postgres, for example. That’s orders of magnitude more than most startups ever reach anyway.



Lots of replies to this one! I created a little benchmark that you can easily run yourself, as long as you have Docker installed. It shows how, for cases like the one I described above, the only way to have consistently fast queries (i.e. even with a cold cache) is to denormalize, so you can create the ideal compound index. The normalized/join version takes 15x longer, which can be the difference between 1s and 15s queries, 2s and 30s, etc.

The benchmark: https://gist.github.com/yashap/6d7a34ef37c6b7d3e4fc11b0bece7...

Note: I think in almost all cases you should start with a normalized schema and use joins. But when you hit cases like the above, it's fine to denormalize just for these specific cases - often you'll just have one or a few such cases in your entire app, where the combination of data size/shape/queries means you cannot have efficient queries without denormalizing. And when people say "joins are slow", it's often cases like this that they're talking about - it's not the join itself that's slow, but rather that cross-table compound indexes are impossible in most RDBMSes, and without them you just can't create good enough indexes for fast queries with lots of data and cold caches.


Nice benchmark script. With EXPLAIN (ANALYZE, BUFFERS), I see that

* normalized/join version needs to read 5600 pages

* normalized/join version with an additional UNIQUE INDEX .. INCLUDE (type) needs to read 4500 pages

* denormalized version only needs to read 66 pages, almost 100x fewer

Related to this pagination use case, when using mysql, even the denormalized version may take minutes: https://dom.as/2015/07/30/on-order-by-optimization/


Ooh ty, will give that article a read! And yeah, that's really the trick to queries that are consistently fast, even with cold caches - read few pages :)


Yes, this is the exact situation where SQL falls short. You can't make cross-table indexes to serve OLAP-esque queries, and "most recent X" is the common one for pagination in applications. I prefer to denormalize manually at write time in a transaction, rather than use triggers or materialized views.


Why do you prefer manually doing this rather than using materialized views? Materialized views seem easier to create and maintain?


Because they are O(n) to refresh, so either you settle for eventual consistency or have expensive writes. By forwarding just the index data to the right table, you maintain a consistent, idiomatic index at O(1) write cost.
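
A sketch of that write-time forwarding, inside one transaction (the user_feed_index table and its columns are hypothetical):

    BEGIN;

    -- the normalized write
    INSERT INTO contact_method (user_id, type, priority)
    VALUES (42, 'phoneNumber', 'primary');

    -- forward only the fields the hot query needs into the denormalized table
    UPDATE user_feed_index
    SET primary_contact_type = 'phoneNumber'
    WHERE user_id = 42;

    COMMIT;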


Have you considered implementing this with database triggers instead of in your application logic?

Requires a bit of brainpower to set up a system around it, but it makes your application logic dramatically simpler.

(You don't have to remember to update `foo.bar` every time you write a function that touches `moo.bar`, and if you run migrations in SQL, the updates will also cascade naturally).

It's really high up on my personal wish list for Postgres to support incrementally-materialized views. It doesn't seem like it would be impossible to implement (since, as I suggested, you can implement it on your own with triggers), but IDK, I assume there are higher-priority issues on their docket.
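
For what it's worth, a minimal sketch of the trigger approach in Postgres (11+ syntax), using the hypothetical foo.bar / moo.bar naming from above:

    CREATE OR REPLACE FUNCTION sync_foo_bar() RETURNS trigger AS $$
    BEGIN
        -- keep the denormalized copy on foo in step with the source column on moo
        -- (foo.moo_id is an assumed linking column)
        UPDATE foo SET bar = NEW.bar WHERE foo.moo_id = NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER moo_bar_sync
        AFTER INSERT OR UPDATE OF bar ON moo
        FOR EACH ROW EXECUTE FUNCTION sync_foo_bar();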


Considered and rejected, because triggers complicate other database operations like restores. Triggers to me are like global operations: they make a strong statement about storage which does not tend to stay true over longer time horizons, when you consider all the stakeholders of the data storage system. They make sense as an application feature, but they are implemented as globals, and this is where the problems begin.


Agreed. Materialized views will be limited in use until we get nice incrementally updating versions.


If the tables involved in the join have 100M+ records and the join is on varchar columns, what I do to improve performance is add an integer column that is a CRC (or hash, if you prefer) of the varchar, and use the integer column in the join instead.
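
A sketch of that pattern in MySQL (the table and column names are invented; note that you typically keep the varchar comparison in the join to guard against CRC collisions):

    -- store a CRC of the varchar key alongside it
    ALTER TABLE big_table
        ADD COLUMN name_crc INT UNSIGNED
            GENERATED ALWAYS AS (CRC32(name)) STORED,
        ADD INDEX idx_name_crc (name_crc);

    -- join on the compact integer, re-check the varchar to filter out collisions
    SELECT a.id
    FROM big_table a
    JOIN other_table b
        ON a.name_crc = CRC32(b.name)
        AND a.name = b.name;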


That seems weirdly convoluted. So you store two columns to represent the foreign key?


If you use a varchar as a FK then you're definitely doing something wrong from the beginning. OP was talking about getting the phone number under certain conditions, and a phone number column is a varchar.


I don’t think they ever did a lookup on phone number, and even if you needed to, you would just index it and it’d be fast. You can even use things like trigram indexes to give you super fast partial phone number matching.


I agree but I’m talking in the context where you can’t vertically scale anymore.

I also don’t think it’s worth the trouble “never using joins” for an existing project. Denormalize as necessary. But for a greenfield one, I honestly think that since your access patterns become understood as you go, you can completely get rid of joins.

Again, assuming your new project can’t fit on a single machine. If it can, you’re best off just following the “traditional” advice, or better yet keeping everything in memory.


Well then you're only really talking about < 0.1% of projects, since a single server and replication and caching will get you as far as you need, even for most social networks.

And if you are needing to massively shard because you're Facebook or Twitter, then it's not much of a hot take at all. But it's also massively oversimplified advice. Because the tradeoffs between joins and denormalization depend entirely on each specific scenario, and have to be analyzed use case by use case. In many cases, joins still win out -- e.g. retrieving the profile name and photo ID for the author of each post being displayed on your screen.

I'm just worried that people without experience will see your advice and think it's a good rule of thumb for their single-server database, because they think joins are scary and you've provided some kind of confirmation.


How is it that your new project can't fit on a single machine?


Could be plenty of reasons, but say you want to support spiky traffic related to events, news, realtime information, etc., or highly seasonal stuff like a school portal or selling flowers on Mother's Day; you want the ability to scale horizontally very quickly to accommodate spikes.


Everyone thinks they start their project with Facebook scale problems.


Perhaps they're building an internet search engine.


Also, one could just create a materialized view for the join and solve the problem at the proper layer.


9 out of 10 will fail before they need to scale



