The bit on the database performance issues leads me to my hottest, flamiest take...

theptip · on Aug 12, 2023

Honestly I couldn’t disagree more. I built a startup and paid little attention to perf for years 1-5, and finally in year 6 we started to get bitten by some perf issues in specific tables, and spent a few engineer-months optimizing.

In terms of tech debt it would have been way more expensive to make everything perform well from the start, we would have moved much slower and probably failed during a few crunch points.

Instead we paid probably a few $k/mo more than we really needed to on machines, and in return saved man-months of effort at a time when we couldn’t hire enough engineers and the opportunity cost for feature work was huge. (Keep in mind that making everything perform well would have required us to do 10-20x as much work, because we could not know ahead of time where the hot spots would be. Some were surprising.)

Joins may be evil at scale, but most startups don’t have scale problems, at least not at first.

Denormalizing can be a good optimization but you pay a velocity cost in keeping all the copies in sync across changes. Someone will write the bug that misses a denormalized non-canonical field and serves up stale data to a user. It’s usually cheaper (in total cost, ie CapEx+OpEx) to write the join and optimize later with a read-aside cache or whatever, rather than contorting your schema.

otikik · on Aug 12, 2023

> a few $k/mo

Isn’t that the cost of one engineer already?

isityouyesitsme · on Aug 12, 2023

In straight dollars, perhaps yes. But the new servers don't show up and spend 3 to 6 months before accomplishing anything meaningful, don't require sick time which can cause the optimizations to slip, and don't take 3 months to find the right fit for hire.

Part of the cost consideration is deterministic results. I will pay a premium for near-guaranteed good but probably sub-optimal results and will actively avoid betting on people I haven't met and don't know exist.

In my hiring, I hire now to solve problems we expect to hit after 4 quarters. It almost never makes sense to hire anyone into a full-time role for any project in a shorter timeframe. If you were wrong about the specific problems you expect to have in a year, you have a person who is trained in your development environment, tooling, and projects, and you already budgeted to use them in-depth in a year. There's no emergency. There is time to pivot. But if you're wrong about the need to hire someone now full time, you front load all of the risk and if it doesn't work out, you are stuck with an employee you do not need (and stuck is the right word. Have you ever terminated someone? It is harder than you think it is, and I don't mean just for emotional reasons).

Buy hardware over people. Treat the people you have as if the business depends on them. Let them know that it does. Everyone is happier this way.

dragonwriter · on Aug 13, 2023

> > a few $k/mo

> Isn’t that the cost of one engineer already?

Only for very cheap engineers and very large values of “a few”. $120k/year is pretty low total compensation for an engineer (and the cost of an engineer exceeds their total comp because there is also gear, and the share of management, HR, and other support they consume) and amounts to $10k/month.

theptip · on Aug 12, 2023

In the Bay Area, no, an engineer costs an order of magnitude more. (For a round number, think $15-20k/mo including office space, benefits, etc. for a senior engineer; that's perhaps a bit high for the period I'm discussing but it also isn't attempting to price the cost of equity grants. At that time Google was probably spending something like $35-40k/mo (maybe higher, I don't know their office/perk costs) on equivalent talent at SWE5 including the liquid RSU grants.) But of course run the cost/benefit calc for your own cost of labor.

More importantly, it's critical to think in terms of opportunity cost. Like I said, we couldn't hire engineers fast enough at that time, so if I put someone on this work it would be taking them off some other important project. Plausibly for a fast-growing startup that means eschewing work that's worth $1-2m/eng-yr or more (just looking at concrete increases in company valuation, not present value of future gains). So we're talking on the order of $100k/eng-mo opportunity cost.

re-thc · on Aug 12, 2023

> I built a startup and paid little attention to perf for years 1-5, and finally in year 6 we started to get bitten by some perf issues in specific tables, and spent a few engineer-months optimizing.

This screams of if I don’t see it it the problem doesn’t exist view of the world.

How do you know it’s not a problem? Perhaps customers would have signed up if it as faster?

The problem is also treating it in terms of business value and/or cost.

A lot of things are “free” and yet it’s ignored.

For most people, in simple cases like turning on http3, brotli, switching to newer instances and many others are all quick wins that I see ignored 90% of the time.

A good design, implementing some good practices etc are performance specific and don’t always cost more.

kreetx · on Aug 12, 2023

A denormalized database model is considered bad desig to begin with, and has performance costs on its own. This is why the OP says this is a "hot take". :)

Maybe there are situations where this actually helps, although the resulting datastructure to me looks more like a multi-key cache.

sgarland · on Aug 12, 2023

Thank you. I’ve been reading these comments and thinking I’m losing my mind.

crazygringo · on Aug 11, 2023

That is a hot take... ;)

But joins should never impact performance in a large way if they're on the same server and properly indexed. "It's truly amazing how much faster everything is when you eliminate joins" is just not true if you're using joins correctly. Sadly, many developers simply never bother to learn.

On the other hand, having to write a piece of data to 20 different spots instead of 1 is going to be dramatically slower performance-wise, not to mention make your queries tremendously more complex and bug-prone, when you remember to update a value in 18 spots but forget about 2 of them.

You mention cheap storage as an advantage for denormalizing, but storage is the least problem here. It's a vastly larger surface area for bugs, and terrible write performance (that can easily chew up read performance).

jiggawatts · on Aug 12, 2023

Storage might be cheap, but memory and bandwidth aren’t.

Memory is the new disk, and disk is the new tape.

You want everything to remain resident in memory, and spool backups and transaction logs to disk.

If you’re joining from disk, you’ve probably done something wrong.

E.g.: compression is often a net win because while it uses more CPU, it allows more data to fit into memory. And if it doesn’t fit, it reduces the disk I/O required.

This is why I look upon JSON-based document databases in horror. They’re bloating the data out many times over by denormalizing and then expand that into a verbose and repetitive text format.

This is why we have insanity like placeholders for text on web apps now — they’re struggling to retrieve a mere kilobyte of data!

yashap · on Aug 12, 2023

Joins are not inherently expensive, but they can lead to expensive queries. For example, say I want to find the 10 most recent users with a phone number as their primary contact method:

SELECT …

FROM User

JOIN ContactMethod on ContactMethod.userId = User.id

WHERE ContactMethod.priority = ‘primary’ AND ContactMethod.type = ‘phoneNumber’

ORDER BY User.createdAt DESC

LIMIT 10

If there are a very large number of users, and a very large number of phone number primary contacts, you cannot make this query fast/efficient (on most RDBMSes). You CAN make this query fast/efficient by denormalizing, ensuring the user creation date and primary contact method are on the same table, and then creating a compound index. But if they’re in separate tables, and you have to join, you can’t make it efficient, because you can’t create cross-table compound indeces.

This pattern of join, filter by something in table A, sort by something in table B, and query out one page of data, is something that comes up a lot. It’s why ppl thing joins are generally expensive, but it’s more like they’re expensive in specific cases.

zepolen · on Aug 12, 2023

For 10 million users + telephones, this takes 1ms.

    create table users (
        id serial primary key not null,
        created_at timestamp not null default now()
    );

    create table users_telephones (
        user_id int references users(id) not null,
        is_primary boolean not null default true,
        telephone varchar not null
    );

    insert into users
    select i, NOW() + (random() * (interval '90 days')) + '30 days' from generate_series(1, 10000000) i;
    insert into users_telephones select id, true, random() :: text from users limit 10000000; -- all users have a primary telephone
    insert into users_telephones select id, false, random() :: text from users limit 200000; -- some users have a non primary telephone
    create index on users(created_at);
    create index on users_telephones(user_id);
    create index on users_telephones(user_id, is_primary) where is_primary;

    select count(*) from users;
    count   
    ----------
    10000000
    (1 row)

    Time: 160.911 ms


    select count(*) from users_telephones;
    count   
    ----------
    10200000
    (1 row)

    Time: 176.361 ms


    select
        *
    from
        users u
        join users_telephones ut on u.id = ut.user_id
    where
        ut.is_primary
    order by
        created_at
    limit
        10;

    id    |         created_at         | user_id | is_primary |     telephone      
    ---------+----------------------------+---------+------------+--------------------
    9017755 | 2023-09-11 11:45:37.65744  | 9017755 | t          | 0.7182410419408853
    6061687 | 2023-09-11 11:45:39.271054 | 6061687 | t          | 0.3608686654204689
    9823470 | 2023-09-11 11:45:39.284201 | 9823470 | t          | 0.3026398665522869
    2622527 | 2023-09-11 11:45:39.919549 | 2622527 | t          | 0.1929579716250771
    7585920 | 2023-09-11 11:45:40.256742 | 7585920 | t          | 0.3830236472843005
    5077138 | 2023-09-11 11:45:41.076164 | 5077138 | t          | 0.9058939392225689
    1496883 | 2023-09-11 11:45:42.459194 | 1496883 | t          | 0.1519510558344308
    9234364 | 2023-09-11 11:45:42.965896 | 9234364 | t          | 0.8254433522266105
    6988331 | 2023-09-11 11:45:43.130548 | 6988331 | t          | 0.9577098184736457
    7916398 | 2023-09-11 11:45:43.559425 | 7916398 | t          | 0.9681218675498862
    (10 rows)

    Time: 0.973 ms

yashap · on Aug 12, 2023

Ty for benchmarking, but this isn’t a good benchmark for the issue I’m talking about.

This is only fast because 100% of users have a phone number as a primary contact, so the join filter is essentially meaningless. If in the contact table, the filtered number is a small percentage of the total (e.g. most users have an email as their primary contact, not a phone number), but still a good size (e.g. there’s still hundreds of thousands to millions of phone primary contacts), it’s a much harder query.

It’s probably also fast because you have a warm cache - e.g. there’s enough memory for the DB to have the indexes 100% in memory, which is just not feasible with large DBs in the real world, where you can easily have >100GB of indexes + hot data, and the DB can’t keep it all in memory. In most real world scenarios, having to somewhat frequently read pages of indexes off disk, into memory, to satisfy queries, is common.

Try it again, with the exact same data, but:

- Search for users with a non-primary phone contact (you have 200,000 of these, and 10,000,000 users)

- Give the DB say 1/3 the memory of your total index size, so the complete indexes can’t be in memory

- Run the query right after starting PG up, to ensure the cache is cold (with a hot cache, almost everything is fast, but in real world situations with lots of users the cache isn’t consistently hot)

sgarland · on Aug 12, 2023

> It’s probably also fast because you have a warm cache - e.g. there’s enough memory for the DB to have the indexes 100% in memory, which is just not feasible with large DBs in the real world, where you can easily have >100GB of indexes + hot data, and the DB can’t keep it all in memory.

That's the point? Sure there is a scale where it's infeasible, but you can quite easily (albeit it's pricey) get DB instances with hundreds or thousands of GiB of RAM. Even if you can't get everything into it, your working set is often not the size of the entire data set. My company's main MySQL primary has around 800 GiB of storage, and only about 1/3 of that in RAM. Disk read IOPS are usually 0.

Nevertheless, I recreated this in Postgres 15.

The total index size of the two tables is 790 MB according to `\di+*`, so I'll set `shared_buffers` to 263 MB, then restart and re-run.

For reference, time with current settings is about 6.8 msec for `is_primary`, and 1395 msec for `NOT is_primary`.

After a restart, I ran the query for `NOT is_primary` first, which took 1740 msec. The first run of the query with `is_primary` took 21 msec.

My DB is hosted on older hardware, and the files themselves live on NVMe drives presented via Ceph over 1GBe.

EDIT: I forgot that Postgres uses OS cache for a lot of its memory, not just its own. Re-did this, running `sync; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'` in between shutting down/starting up Postgres. 16726 msec and 79 msec, respectively. So yes, a lot slower, but a. I don't think this is realistic for a production database b. I'm still not clear about how you think JOINs enter into this. The slowdown comes entirely from having to run a table scan.

yashap · on Aug 13, 2023

First off, ty for running all these benchmarks, above and beyond!

FWIW, I don’t think joins are bad, I’m 100% for normalized DB schemas with joins. But I’ve done tonnes of performance work over the past ~10 years, and run into a bunch of real world cases where, when caches are cold (which does happen frequently with large datasets and limited budgets), queries similar to the above (join two tables, read a page of data, sorting on one table and filtering on the other) take 10s of seconds, sometimes even minutes with very large datasets. In those cases, the only solution has been to denormalize so we can create a compound index, with filter key(s) as the prefix, sort key as the suffix, which makes the query consistently fast. I am not at all suggesting this as the default, better to default to normalized with joins, and only do this for these specific cases. Generally a company just has one or a few cases where this is necessary, can just do the ugly denormalization + compound indexes for these few hot spots. But when ppl say “joins are slow”, these cases are examples of where it’s true.

Re: your above 17 second query, if you do something like adding fields to the user table for primary and secondary contact method (denormalizing), and then create a compound index with the necessary filter keys first, and the sort key second, I think you’ll find the query (which no longer needs a join) is quite fast even if caches are ice cold.

sgarland · on Aug 13, 2023

Created a new table that contains `user_id, created_at, phone_primary, phone_secondary`. Inserted all 10,200,000 rows. Notably (I'll come back to this) due to the generation of the rows, the primary key (`user_id`) is an unsorted integer - this was _not_ done with a serial or identity.

  postgres=# CREATE INDEX sec_phone_created_at ON hn_phone_new (phone_secondary, created_at) WHERE phone_secondary IS NOT NULL;

I reset `shared_buffers` down to the same as before - 263 MB - although the size of this index is tiny, < 10 MB, so realistically I can't shrink buffers down that far anyway. I then did the same `sync/drop cache` as before.

  postgres=# SELECT * FROM hn_phone_new WHERE phone_secondary IS NOT NULL ORDER BY created_at LIMIT 10;
     id   |     created_at      |   phone_primary    |  phone_secondary   
  --------+---------------------+--------------------+--------------------
    58816 | 1995-05-23 03:22:02 | +49 030 522866-87  | +1 159-445-4810
    49964 | 1995-05-23 03:23:00 | +61 02 7440 8606   | +254 20 925 892
   171828 | 1995-05-23 05:06:47 | +380 32 393-35-89  | +49 030 429376-29
    78333 | 1995-05-23 05:31:22 | +380 32 147-11-20  | +52 55 6409 5253
    24264 | 1995-05-23 06:47:21 | +44 0131 6506 1823 | +49 030 610965-83
    96662 | 1995-05-23 06:57:03 | +52 55 1473 0538   | +61 02 5414 8204
    15023 | 1995-05-23 07:55:37 | +44 0131 7959 1581 | +44 0131 8491 6194
    52029 | 1995-05-23 08:59:19 | +380 32 430-77-54  | +254 20 374 856
    20518 | 1995-05-23 09:51:14 | +380 32 264-21-79  | +52 55 7787 0236
    80273 | 1995-05-23 14:59:26 | +61 02 8863 4466   | +33 01 16 10 78 56
  (10 rows)

  Time: 2258.807 ms (00:02.259)

So yes, significant improvement as you'd expect. I then dropped the index and swapped the order:

  postgres=# DROP INDEX sec_phone_created_at;
  postgres=# CREATE INDEX created_at_sec_phone ON hn_phone_new (created_at, phone_secondary) WHERE phone_secondary IS NOT NULL;

Reset everything as before, and re-ran the same query:

  Time: 221.392 ms

Thinking that like MySQL, a portion of the `shared_buffers` had been saved to disk and put back in upon restart (honestly I don't know if Postgres does this), I attempted to flush it by running a few `SELECT COUNT(*)` on other, larger tables, then re-running the query.

  Time: 365.961 ms

This is what `EXPLAIN VERBOSE` looks like for the original index:

   Limit  (cost=8.44..8.45 rows=1 width=61)
     Output: id, created_at, phone_primary, phone_secondary
     ->  Sort  (cost=8.44..8.45 rows=1 width=61)
           Output: id, created_at, phone_primary, phone_secondary
           Sort Key: hn_phone_new.created_at
           ->  Index Scan using sec_phone_created_at on public.hn_phone_new  (cost=0.42..8.43 rows=1 width=61)
                 Output: id, created_at, phone_primary, phone_secondary
  (7 rows)

And this is what it looks like for the second, with the columns swapped:

  Limit  (cost=0.42..8.43 rows=1 width=61)
     Output: id, created_at, phone_primary, phone_secondary
     ->  Index Scan using created_at_sec_phone on public.hn_phone_new  (cost=0.42..8.43 rows=1 width=61)
           Output: id, created_at, phone_primary, phone_secondary
  (4 rows)

So it actually needs to be reversed, so that the query planner doesn't have to add a sort step for the ORDER BY.

yashap · on Aug 14, 2023

Yeah, I believe which order is best (sortKey/filterKey or filterKey/sortKey) really depends on the specific data/queries, best to try both and pick the best one - looks like sortKey/filterKey in this case :)

But I think this does show how, for specific data/queries, sometimes you do have to denormalize, so that you can create the ideal compound index, for specific problem queries. Should still go with normalized schemas and joins as a default, but if problem queries pop up like this that are taking 10, 20, 30 seconds sometimes (when caches are cold), compromising a bit on clean schemas/code for performance makes sense.

I also created a benchmark here, for Postgres: https://gist.github.com/yashap/6d7a34ef37c6b7d3e4fc11b0bece7...

sgarland · on Aug 14, 2023

If you’re limited in RAM and can’t upsize, then yes, this does appear to be a good trade off. You can always refactor later and normalize if necessary.

BTW, although it wouldn’t have helped for your specific benchmark schema creation of TYPES, I’ll plug my genSQL tool [0] for generating random data. It’s primarily designed around MySQL, but it can produce CSVs easily, which every DB can load.

Turns out a lot of random() calls in most languages is slow af, so mine avoids that by (mostly) batching them in a C library. Should be able to create a million somethings in under 10 seconds on modern hardware in Python 3.11.

[0]: https://github.com/stephanGarland/genSQL

yashap · on Aug 19, 2023

Looks useful, ty!

aidos · on Aug 12, 2023

Thanks for bothering to work it through, I was too lazy.

But, yeah, exactly. Everyone thinks they need to optimise the life out of this stuff at the beginning but the db can do a lot with normalised data and the appropriate indexes.

Side note - is_primary isn’t required in the partial index itself since they’ll all be “true” due to the where clause.

catalypso · on Aug 12, 2023

Thanks for the effort.

Probably nitpicking but these types of measures are usually tricky to interpret because there is a high chance your indexes (maybe even rows) are still on PostgreSQL shared buffers and OS cache and might not reflect real usage performance.

To get a more "worst-case" measure, after your inserts and indexes creation, you can restart your database server + flush OS pages cache (e.g. drop_caches for Linux), then do the measure.

Sometimes the difference is huge, although I don't suspect it will be in this case.

ElatedOwl · on Aug 12, 2023

imo a properly configured Postgres server should have enough RAM to keep any hot data in cache. The cached path is the accurate measurement.

ghusbands · on Aug 12, 2023

What proportion of users had a primary telephone contact? I think you'd need to be skipping over a lot of users (those without a primary telephone contact) to hit the pathological case that's implied.

sgarland · on Aug 12, 2023

Decided to re-create this in MySQL on fairly old hardware, and with actual phone numbers - the latter shouldn't make a difference since they're still VARCHAR, but I already have a program [0] to generate schema with them, so why not?

I did have to do a few manual updates after data load because the aforementioned program can't make foreign keys yet, and also for bools (which MySQL stores as tinyint(1)) I'm randomly generating them via `id & 1`, which isn't what you had.

Also, I gave `hn_phone` its own auto-increment int as a PK, so I could have a non-unique index on `user_id`. In MySQL, if you create a table without a PK, you get one of these, in descending order of precedence:

* The first indexed UNIQUE NOT NULL column promoted to PK

* An invisible, auto-generated, auto-incrementing integer column called `my_row_id` as PK (MySQL >= 8.0.30, if sql_generate_invisible_primary_key=1)

* A hidden index called `GEN_CLUST_INDEX` created on a super-invisible (i.e. doesn't show up in table definition) column called `ROW_ID`, but that column is shared across the entire DB so please don't do this

It's worth noting that since the first 10,000,000 rows all have `is_primary` set, this can finish extremely quickly. If you invert that match with these tables, you have to do a table scan on `hn_phone`, and the time jumps up to about 5650 msec. If you change the `hn_phone` index to be a composite on (`user_id`, `is_primary`) and then rewrite the query to use a subquery instead of a join, the time drops to around 7 msec. You might see a slight speed-up if you index `created_at` in descending order if that was the normal access pattern.

Anyway:

  OS: Debian Bullseye 5.10.0-23-amd64
  Virtualized: Yes (Proxmox)
  CPU: E5-2650 v2 @ 2.60GHz
  Allocated Core Count: 16
  Allocated RAM: 64 GiB PC3-12800R
  Disk: Samsung PM983 1.92 TiB via Ceph
  Filesystem: XFS
  Mount Options: defaults,noatime
  MySQL Version: 8.0.34
  MySQL Options (non-default):
    innodb_buffer_pool_instances = 16
    innodb_buffer_pool_chunk_size = 134217728
    innodb_buffer_pool_size = 17179869184
    innodb_numa_interleave = 1
    innodb_sync_array_size = 16 # this shouldn't apply here, but listing anyway
    innodb_flush_method = O_DIRECT
    innodb_read_io_threads = 16
    innodb_write_io_threads = 16 # this shouldn't apply here, but listing anyway

  CREATE TABLE `hn_user` (
    `id` int unsigned NOT NULL AUTO_INCREMENT, 
    `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, 
    PRIMARY KEY (`id`), 
    KEY `user_created_at` (`created_at`)
  );

  CREATE TABLE `hn_phone` (
    `id` int unsigned NOT NULL AUTO_INCREMENT, 
    `user_id` int unsigned NOT NULL, 
    `is_primary` tinyint(1) NOT NULL DEFAULT '1', 
    `phone` varchar(255) NOT NULL, 
    PRIMARY KEY (`id`), 
    KEY `user_id` (`user_id`), 
    CONSTRAINT `hn_phone_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `hn_user` (`id`)
  );

  mysql> SELECT COUNT(*) FROM hn_user UNION SELECT COUNT(*) FROM hn_phone;
  +----------+
  | COUNT(*) |
  +----------+
  | 10000000 |
  | 10200000 |
  +----------+
  2 rows in set (1.20 sec)

  mysql> SELECT 
    u.id, 
    u.created_at, 
    ut.is_primary, 
    ut.phone 
  FROM 
    hn_user u 
    JOIN hn_phone ut ON u.id = ut.user_id 
  WHERE 
    ut.is_primary 
  ORDER BY 
    u.created_at DESC 
  LIMIT 10;

  +---------+---------------------+------------+--------------------+
  | id      | created_at          | is_primary | phone              |
  +---------+---------------------+------------+--------------------+
  | 6906106 | 2023-08-12 06:08:25 |          1 | +61 02 5317 2261   |
  | 6906106 | 2023-08-12 06:08:25 |          1 | +254 20 294 205    |
  | 6738922 | 2023-08-12 06:07:12 |          1 | +61 02 1247 3361   |
  | 6738922 | 2023-08-12 06:07:12 |          1 | +44 0131 8386 4494 |
  | 7449553 | 2023-08-12 06:03:55 |          1 | +61 02 7649 6731   |
  | 7449553 | 2023-08-12 06:03:55 |          1 | +61 02 7893 9835   |
  | 6908862 | 2023-08-12 05:51:52 |          1 | +81 03 6743-6893   |
  | 6908862 | 2023-08-12 05:51:52 |          1 | +44 0131 8414 7888 |
  | 4134961 | 2023-08-12 05:51:42 |          1 | +1 614-908-1719    |
  | 4134961 | 2023-08-12 05:51:42 |          1 | +44 0131 9898 8958 |
  +---------+---------------------+------------+--------------------+
  10 rows in set (0.00 sec)

  mysql> WITH latest_event AS (
    SELECT 
      event_id 
    FROM 
      performance_schema.events_statements_history_long 
    WHERE 
      sql_text LIKE 'SELECT u.id%' 
    ORDER BY 
      event_id DESC 
    LIMIT 1
  ) 
  SELECT 
    event_name, 
    TRUNCATE(
      TIMER_WAIT / POW(10, 9), 
      3
    ) AS 'duration (msec)' 
  FROM 
    performance_schema.events_stages_history_long stg 
    JOIN latest_event ON stg.nesting_event_id = latest_event.event_id 
  UNION 
  SELECT 
    "total", 
    TRUNCATE(
      TIMER_WAIT / POW(10, 9), 
      3
    ) 
  FROM 
    performance_schema.events_statements_history_long stmt 
    JOIN latest_event ON stmt.event_id = latest_event.event_id;

  +------------------------------------------------+-----------------+
  | event_name                                     | duration (msec) |
  +------------------------------------------------+-----------------+
  | stage/sql/starting                             |           0.261 |
  | stage/sql/Executing hook on transaction begin. |           0.003 |
  | stage/sql/starting                             |           0.016 |
  | stage/sql/checking permissions                 |           0.006 |
  | stage/sql/checking permissions                 |           0.005 |
  | stage/sql/Opening tables                       |           0.134 |
  | stage/sql/init                                 |           0.008 |
  | stage/sql/System lock                          |           0.023 |
  | stage/sql/optimizing                           |           0.034 |
  | stage/sql/statistics                           |           0.087 |
  | stage/sql/preparing                            |           0.074 |
  | stage/sql/executing                            |            0.74 |
  | stage/sql/end                                  |           0.003 |
  | stage/sql/query end                            |           0.003 |
  | stage/sql/waiting for handler commit           |           0.025 |
  | stage/sql/closing tables                       |           0.019 |
  | stage/sql/freeing items                        |           0.176 |
  | stage/sql/cleaning up                          |           0.003 |
  | total                                          |           1.654 |
  +------------------------------------------------+-----------------+
  19 rows in set (0.00 sec)

[0]: https://github.com/stephanGarland/genSQL # shameless plug; it's super messy and probably unintuitive, but it's getting better/faster and has been a fun ride learning how fast you can make Python (and when to offload to C).

franckpachot · on Aug 12, 2023

With an index on User (createdAt, id) and one on ContactMethod ( primary,ContactMethod,userId), it should be fast (check that the the execution plan starts with User). Except if lot of recent users have no phones, but that will not be better in a single table (except if columnar storage)

aidos · on Aug 12, 2023

I imagine so too. You’ll be able to interate over the users in order and hit an instant index on contact method.

    select
      *
    from
      user
    where
      exists (
        select
          true
        from
          contact_method cm
        where
          cm.contact_id = contact.id
          and cm.method = 'phone'
          and cm.primary
      )
    order by
      created_at desc
    limit 10

    —- partial index is even faster
    create index primary_phone on contact_method (contact_id) where method = 'phone' and primary;

yashap · on Aug 14, 2023

See my reply here: https://news.ycombinator.com/item?id=37116015

Turskarama · on Aug 12, 2023

In my experience with SQL, a query like that should return in under a second even if you have 100k or more users.

There are some other tricks you can use if you're clever/lucky as well. If you're just using integer IDs (which is reasonable if your system isn't distributed) then you could order by userid on you ContactMethod table and still get the same speed as you would with no join.

yashap · on Aug 14, 2023

See my reply here: https://news.ycombinator.com/item?id=37116015

Zanfa · on Aug 12, 2023

> If there are a very large number of users, and a very large number of phone number primary contacts, you cannot make this query fast/efficient

I think specific numbers would help make this point better. With a few hundred thousand to low millions of users this should be plenty fast in Postgres for example. That’s magnitudes more than most startups ever reach anyway.

yashap · on Aug 14, 2023

See my reply here: https://news.ycombinator.com/item?id=37116015

yashap · on Aug 14, 2023

Lots of replies to this one! I created a little benchmark that you can easily run yourself, as long as you have Docker installed. It shows how, for cases like the one I described above, the only way to have consistently fast queries (i.e. even with a cold cache) is to denormalize, so you can create the ideal compound index. The normalize/join version takes 15x longer, which can be the difference between 1s and 15s queries, 2s and 30s, etc.

The benchmark: https://gist.github.com/yashap/6d7a34ef37c6b7d3e4fc11b0bece7...

Note: I think in almost all cases you should start with a denormalized schema and use joins. But when you hit cases like the above, it's fine to denormalize just for these specific cases - often you'll just have one or a few such cases in your entire app, where the combination of data size/shape/queries means you cannot have efficient queries without denormalizing. And when people say "joins are slow", it's often cases like this that they're talking about - it's not the join itself that's slow, but rather that cross-table compound indexes are impossible in most RDBMSes, and without that you just can't create good enough indexes for fast queries with lots of data and cold caches.

throwdbaaway · on Aug 14, 2023

Nice benchmark script. With EXPLAIN (ANALYZE, BUFFERS), I see that

* normalized/join version needs to read 5600 pages

* normalized/join version with an additional UNIQUE INDEX .. INCLUDE (type) needs to read 4500 pages

* denormalized version only needs to read 66 pages, almost 100x fewer

Related to this pagination use case, when using mysql, even the denormalized version may take minutes: https://dom.as/2015/07/30/on-order-by-optimization/

yashap · on Aug 14, 2023

Ooh ty, will give that article a read! And yeah, that's really the trick to queries that are consistently fast, even with cold caches - read few pages :)

tlarkworthy · on Aug 12, 2023

Yes this is the exact situation where sql falls short. You can't make cross-table indexes to serve OLAP-esq queries, and the most recent X is the common one for pagination in applications. I prefer to denormalize manually at write time in a transaction, rather than use triggers or materialized views.

ako · on Aug 12, 2023

Why do you prefer manually doing this rather than using materialized views? Materialized views seem easier to create and maintain?

tlarkworthy · on Aug 12, 2023

Because they are o(n) complexity to refresh so either you settle for eventual complexity or have expensive writes. By forwarding just the index data to the right table you maintain an consistent idiomatic index at 0(1) write cost

gen220 · on Aug 13, 2023

Have you considered implementing this with database triggers instead of in your application logic?

Requires a bit of brainpower to set up a system around it, but it makes your application logic dramatically simpler.

(You don't have to remember to update `foo.bar` every time you write a function that touches `moo.bar`, and if you run migrations in SQL, the updates will also cascade naturally).

It's really high up on my personal wish list for Postgres to support incrementally-materialized views. It doesn't seem like it would be impossible to implement (since, as I suggested, you can implement it on your own with triggers), but IDK, I assume there are higher-priority issues on their docket.

tlarkworthy · on Aug 14, 2023

considered and rejected because triggers complicate other database operations like restores. Triggers to me are like global operations, they really make a strong statement about storage which does not tend to be true over longer time horizons when you consider all the stackeholders of the data storage system. They make sense as an application feature, but they are implemented as globals, and this is where the problems begin.

aidos · on Aug 12, 2023

Agreed. Matarialized views will be limited in use until we get nice incremental updating versions.

unnouinceput · on Aug 12, 2023

If the tables involved in the join are of 100M+ records what I do when the joins use varchar columns to improve the performance is to use an additional integer column of the varchar one that is a CRC of it (or hash if you prefer that) and use the integer one instead in the join.

aidos · on Aug 12, 2023

That seems weirdly convoluted. So you store two columns to represent the foreign key?

unnouinceput · on Aug 12, 2023

If you use a varchar as FK then you're definitely doing something wrong from beginning. OP was talking about getting the phone number under certain conditions, and a phone number column is a varchar.

aidos · on Aug 12, 2023

I don’t think they ever did a lookup on phone number and even if you needed to, you would just index it and it’d be fast. You can even use things like tri-gram indexes to give you super fast partial phone number matching.

endisneigh · on Aug 12, 2023

I agree but I’m talking in the context where you can’t vertically scale anymore.

I also don’t think it’s worth the trouble “never using joins” for an existing project. Denormalize as necessary. But for a green one I honestly think since our access patterns can be understood as you continue you can completely get rid of joins.

Again, assuming your new project can’t fit on a single machine. If it can you’re best just following the “traditional” advice, or better yet keep everything in memory.

crazygringo · on Aug 12, 2023

Well then you're only really talking about < 0.1% of projects, since a single server and replication and caching will get you as far as you need, even for most social networks.

And if you are needing to massively shard because you're Facebook or Twitter, then it's not much of a hot take at all. But it's also massively oversimplified advice. Because the tradeoffs between joins and denormalization depend entirely on each specific scenario, and have to be analyzed use case by use case. In many cases, joins still win out -- e.g. retrieving the profile name and photo ID for the author of each post being displayed on your screen.

I'm just worried that people without experience will see your advice and think it's a good rule of thumb for their single-server database, because they think joins are scary and you've provided some kind of confirmation.

fastball · on Aug 12, 2023

How is it that your new project can't fit on a single machine?

yafbum · on Aug 12, 2023

Could be plenty of reasons, but say you want to support spikey traffic like whatever is related to events, news, realtime information, etc., or highly seasonal stuff like a school portal or selling flowers on mother's day, you want the ability to scale horizontally very quickly to accommodate spikes

aidos · on Aug 12, 2023

Everyone thinks they start their project with Facebook scale problems.

cuu508 · on Aug 12, 2023

Perhaps they're building an internet search engine.

kaba0 · on Aug 12, 2023

Also, one could just create a materialized view for the join and solve the problem at the proper layer.

blitzar · on Aug 12, 2023

9 out of 10 will fail before they need to scale

feoren · on Aug 11, 2023

There are "tall" applications and "wide" applications. Almost all advice you ever read about database design and optimization is for "tall" applications. Basically, it means that your application is only doing one single thing, and everything else is in service of that. Most of the big tech companies you can think of are tall. They have only a handful of really critical, driving concepts in their data model.

Facebook really only has people, posts, and ads.

Netflix really only has accounts and shows.

Amazon (the product) really only has sellers, buyers, and products, with maybe a couple more behind the scene for logistics.

The reason for this is because tall applications are easy. Much, much easier than wide applications, which are often called "enterprise". Enterprise software is bad because it's hard. This is where the most unexplored territory is. This is where untold riches lie. The existing players in this space are abysmally bad at it (Oracle, etc.). You will be too, if you enter it with a tall mindset.

Advice like "never user joins" and "design around a single table" makes a lot of sense for tall applications. It's awful, terrible, very bad, no-good advice for wide applications. You see this occasionally when these very tall companies attempt to do literally anything other than their core competency: they fail miserably, because they're staffed with people who hold sacrosanct this kind of advice that does not translate to the vast space of "wide" applications. Just realize that: your advice is for companies doing easy things who are already successful and have run out of low-hanging fruit. Even tall applications that aren't yet victims of their own success do not need to think about butchering their data model in service of performance. Only those who are already vastly successful and are trying to squeeze out the last juices of performance. But those are the people who least need advice. This kind of tall-centered advice, justified with "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people who set off to do something more interesting than serve ads to billions of people.

DylanDmitri · on Aug 11, 2023

For Amazon, if we consider everything for website retail purchases, I would estimate tens of thousands of table schemas. This is counting:

- Sellers (Amazon, third party, retail store) - Inventory (forecasting, recommendation) - Customers (comments, ratings, returns, preferences) - Warehouses (5+ distinct types, filled with custom machines) - Transit Options (long haul, air, vans, cars, bikes, walking, boats) - Delivery Partners (DSP, Flex, Fedex, UPS - forecasting capacity here) - Routing (between warehouses, within warehouses, to specific homes) - skipping AWS - skipping billing - skipping advertising on amazon.com (bidding, attribution, etc)

There's optimizations and metrics collected and packages transition between all these layers. There's hundreds of "neat projects" running to special case different things; all them useful but adding complexity.

For example ordering prescriptions off Amazon pharmacy needs effectively its own website and permissions and integrations. Probably distinct sorting machines with supporting databases for them. Do you need to log repairs on those machines? Probably another table schema.

You want to normalize international addresses? And fine tune based on delivery status and logs and customer complaints and map data? Believe it not like 20 more tables. Oh this country has no clear addresses? Need to send experienced drivers to areas they already know. Need to track that in more tables.

r00fus · on Aug 11, 2023

But these aren't the website, right? Amazon runs "wide" enterprise systems in the back end for sure.

DylanDmitri · on Aug 12, 2023

Oh yeah these are "wide" enterprise systems, and not for the website. I think we read the initial comment differently.

donavanm · on Aug 12, 2023

I apologize up front if I completely misunderstand your intent. However ...

> Amazon (the product) really only has sellers, buyers, and products, with maybe a couple more behind the scene for logistics.

Is a comically bad hot take that is so entirely divorced from reality. A full decade ago the item catalog (eg ASINs or items to purchase) alone had closer to 1,000 different subsystems/components/RPCs etc for a single query. I think you'd have to go back to circa 2000 before it could be optimistically described as a couple of databases for the item catalog.

DylanDmitri sibling comment is a hell of a lot closer to the truth, and I'd hazard is still orders of magnitude underestimating what it takes to go from viewing an item detail page to completing checkout, let alone picking or delivery. Theres a reason the service map diagram, again circa 2010, was called "the deathstar."

> "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people

This part I completely agree with. And many individual components in those giant systems are dead simple. I dare say the best ones are simplistic even.

strgcmc · on Aug 12, 2023

Ex-Amazonian here, and while I agree with the facts you present, I do think the "tall" vs "wide" debate is being misapplied here.

Amazon is extremely and perversely obsessed with, and good at, building decoupled systems at scale, which in essence means lots and lots of individual separate "tall" systems, instead of monolithic "wide" systems.

So IMO, Amazon subscribes to a "forest-of-'tall'-services" philosophy. And even at that meta level, I would say the forests are better off when they grow taller, rather than wider.

wruza · on Aug 12, 2023

This kind of tall-centered advice, justified with "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people

The world runs on success stories, not on technology. I wish “wide” thinking was default, for both un-delusion and better development in this area. But everyone is amazed with facebook (not the site, just money), so they have to imitate it, like those tribes who build jets out of wood.

pipe_connector · on Aug 11, 2023

I agree with the characterization of applications you've laid out and think everyone should consider whether they're working on a "tall" (most users use a narrow band of functionality) or a "wide" (most users use a mostly non-overlapping band of functionality) application.

I also agree with your take that tall applications are generally easier to build engineering-wise.

Where I disagree is that I think in general wide applications are failures in product design, even if profitable for a period of time. I've worked on a ton of wide applications, and each of them eventually became loathed by users and really hard to design features for. I think my advice would be to strive to build a tall application for as long as you can muster, because it means you understand your customers' problems better than anyone else.

feoren · on Aug 11, 2023

> I've worked on a ton of wide applications, and each of them eventually became loathed by users and really hard to design features for.

Yes, I agree that this is the fate of most. But I refuse to believe it's inevitable; rather, I think it comes from systemic flaws in our design thinking. Most of what we learn in a college database course, most of what we read online, most all ideas in this space, transfer poorly to "wide" design. People don't realize this because those approaches do work well for tall applications, and because they're regarded religiously. This is why I call them so much harder.

lll-o-lll · on Aug 12, 2023

> Yes, I agree that this is the fate of most. But I refuse to believe it's inevitable

Yes exactly. It is not inevitable, I’ve worked on several “enterprise” software suits that did not suffer from this problem. However! They all had that period in their history where they did, and this is why:

Early on in a companies history there will be a number of “big” customers from whom most of the revenue is coming. To keep those customers and money flowing, often bespoke features are added for these customers and these accumulate over time. This is equivalent in character to maintaining several forks of an OSS project. Long term no forward progress can be made due to all time ending up in maintenance.

The solution to this sorry state is to transition to an “all features must be general for the product” and ruthlessly enforce this. That will also mean freezing customer specific “branches” and there will be a temporary hit to revenue. Customers need to be conditioned to the “no bespoke features” and they need to be sold on the long term benefits and be brought along for the ride.

This then enables massive scaling benefits, and the end of all your time in maintenance.

xyzzy123 · on Aug 11, 2023

Thanks I think this is a really interesting way to look at things.

What is the market for "wide" applications though? It seems like any particular business can only really support one or two of them, for some that will be SAP and for others it might be Salesforce (if they don't need much ERP), or (as you mentioned) some giant semi homebrewed Oracle thing.

Usually there is a legacy system which is failing but still runs the business, and a "next gen" system which is not ready yet (and might never be, because it only supports a small number of use cases from the old software and even with an army of BAs it's difficult to spec out all the things the old software is actually doing with any accuracy).

Or am I not quite getting the idea?

feoren · on Aug 11, 2023

I think you're getting the idea -- both your points kinda highlight that this is something that companies want, but are not really getting.

As for the market, various sources have the "enterprise software market", whatever that means, at somewhere around $100 billion to $300 billion. We also see companies trying over and over to do this kind of thing. The demand is clearly there.

Certainly the mandate "help run the business" is a wide concern, and that's an OK working definition of "enterprise", and what most existing solutions are trying to do. There are hundreds of interconnected concerns, lots of things to coordinate, etc.

There are other wide concerns, though. Almost anything in engineering and science. Take, for example, the question "how can we reduce our greenhouse gas emissions?" which a lot of companies are asking (or being forced to ask). If you wanted to build a SAAS product for helping companies reduce their GHG, you've got a wide problem, because there are a thousand activities that can emit GHG, and any given company is going to be doing dozens of them at once. But each company is different. Each state and country thinks of things differently. You might not even have the same calculations state-to-state.

Hard problems in science and engineering are just naturally cross-disciplinary, meaning your system has to know a lot of things about a lot of subjects. There are just thousands of little complicating differences and factors. If you're trying to solve a problem like this, absolutely do not de-normalize your database.

other_herbert · on Aug 11, 2023

Lotus notes is wide… I imagine their scope creep checker was just a sticky note that said Absolutely!!

lanstin · on Aug 11, 2023

I miss notes - it was really a better way to organize companies than anything later. Historical valuable data, records of why decisions were made, ephemeral email like things but for groups, user programmable if it didn't quite match your needs, robust encryption, it had it all.

lifeisstillgood · on Aug 12, 2023

oh I always nust assumed Lotus Notes was just lesser Outlook. can you give examples - such has how did it capture why decisions were made - that sounds ... hard or just "someone wrote it down"

anonzzzies · on Aug 12, 2023

It was a low/nocode environment; anyone (with enough rights) could knock up a simple app with rules/workflows and share it with the company. It made collecting, distributing and organising information easy if you knew what you were doing. It also created complex monsters as it was both too easy and too hard to use. I liked it a lot; we moved from Notes to Exchange and Sharepoint back in the day and it was awful for effiency. We required so much more people to do the same things. Luckily I left shortly after.

lifeisstillgood · on Aug 12, 2023

Oh.

I struggle with the value of low/no code vs learning to code and providing common libraries.

lanstin · on Aug 13, 2023

For your company you have a lot of smart people other than coders. And Notes had a rich collaborative set of intrinsically that you could hip out work flow applications like an accountant with spreadsheets. And built in security and auditing and all that. And since you had the ability to craft tools to fit the exact situation, automation of processes went so fast and was done by people familiar with the business side of the process. We did have a Notes team that would do apps for teams that couldn’t but also had a rich ecosystem of business line apps that were so much better than spreadsheet apps or Access apps.

ozim · on Aug 12, 2023

Facebook was done by Zuckerberg without any specialized knowledge and people would like to go that route because it seems easier, making Twitter/FB/Instagram clone you don't really have to know anything about insurances or handling industrial waste. Then it is basically people joining based on other people

Nowadays there are bunch of regulations on handling user data that one cannot do without knowing but when these companies started that was not an issue.

My point is market for "wide" applications is huge but it is much more fragmented. Of Course SAP and Salesforce are taking cut in that by having "one app for everything"

To get contracts you have to have specialized knowledge in specific area that your SaaS app would provide more value than configuring some crappy version in SAP. So you cannot just make an app in your basement and watch people sign up, but you have to spend a lot of leg work getting customers. That is why it is not really "hot" area for startups, because there is a lot of good money there but not unicorn money and most likely you won't be able to have 2 or 3 different specialist niche products so you could diversify investment but you would have to commit to a niche which makes it also not really interesting for a lot of entrepreneurs who most likely would lie to jump to something more profitable when possible.

pravus · on Aug 12, 2023

> What is the market for "wide" applications though?

Just my experience, but essentially these target industries, not necessarily consumers or singular entities. Hence the term "enterprise". As someone who worked on a fairly reasonable ERP for academic purposes, even just calculating a GPA is extremely complicated in the backend:

    * There are multiple schemes for calculating GPAs
    * Each scheme needs to support multiple grading types (A-F, pass/fail, etc)
    * Each scheme needs to support multiple rounding rules
    * Displays of GPAs will need to be scaled properly based on the output context
    * GPA values will need to be normalized for use in calculations in other parts of the system
    * State legislatures mandate state-specific usages of GPAs which must be honored for legal compliance
    * All GPA calculations must have historical context in case the rules changes so that old transcripts can be revived correctly
    * Institutions themselves will have custom rules (maybe across schools or departments) for calculations which must be incorporated into everything else
    * This pretty much has to work every time

I don't know exactly how many tables GPAs themselves took, but overall the system was over 4,000 tables and 10,000+ stored procedures/functions. Also, I worked in the State of Texas which has its own institution-supported entity performing customizations to this ERP for multiple universities that are installed separately but required for full compliant operation.

I would compare this to most modern "tall" applications which would more-than-likely offer you maybe up to 3 different GPA options with some basic data syncing or something. They might offer multiple rounding types if they thought that far. These apps are generally extremely niche and typically work for very basic workloads. They can capture a lot of easy value for entry-level stuff but immediately fail at everything else.

evanelias · on Aug 12, 2023

Your initial premise is flawed though. For example, as someone who worked on Facebook's database team, I can tell you that Facebook has thousands upon thousands of tables (distinct logical table definitions, i.e. not accounting for duplication from physical sharding or replication).

Some of these store things for user-facing product entities and associations between them -- you missed the vast majority of product functionality in your "people, posts, and ads" claim. Others are for internal purposes. Some workloads use joins, others do not.

Nothing about Facebook's database design is "tall", nor is it "easy". There are a lot of huge misconceptions out there about what Facebook's database architecture actually looks like!

Advice like "never user joins" and "design around a single table" is usually just bad advice for most applications. It has nothing to do with Facebook, and ditto for Amazon based on the sibling replies from Amazon folks.

yafbum · on Aug 12, 2023

Doesn't that also say something like, it's an easier road to success if you find the tall way to market and scale, scale, scale once you find it? What is the "wide" success story to take inspiration from?

other_herbert · on Aug 11, 2023

That’s awesome …. I’ve gotta remember this when our PO’s want to add things that have absolutely no business being in our software…

endisneigh · on Aug 11, 2023

I agree with your sentiment, but even enterprises work on multiple “tall” features.

If they didn’t then I’d change my advice to be simply multi tenant per customer and replicate into a column store for cross customer analytics.

What advice would you give for a “wide” application?

poidos · on Aug 12, 2023

Very insightful, thank you for writing this.

macNchz · on Aug 11, 2023

Over the years I think I've encountered more pain from applications where the devs leaned on denormalization than from those that developed issues with join performance on large tables.

You can mash those big joins into a materialized view or ETL them into a column store or whatever you need to fix performance later on, but once someone has copied the `subtotal_cents` column onto the Order, Invoice, Payment, NotificationEmail, and UserProfileRecentOrders models, and they're referenced and/or updated in 296 different places...it's a long road back to sanity.

wizofaus · on Aug 11, 2023

Can't say I've ever come across a scenario where a join itself was the performance bottleneck. If there's any single principle I have observed is "don't let a table get too big". More often than not it's historical-record type tables that are the issue - but the amount of data you need for day-to-day operations is usually a tiny fraction of what's actually in the table, and you're bound to start finding operations on massive tables get slow no matter what indexes you have (and even the act of adding more indexes becomes problematic. And just indexing all columns isn't enough for traditional RMDBSes at least - you have to index the right combinations of columns for them to be used. Might be different for DynamoDb).

tremon · on Aug 12, 2023

don't let a table get too big

I'd amend that to "don't let your scan coverage get too big". Understanding how much data must be loaded in memory and compared is essential to writing performant database applications. And yes, those characteristics change over time as the data grows, so there may be a one-size-fits-all solution. But "table too large" can pretty much always be solved by adding better indexes or by partitioning the table, and making sure common queries (query's?) hit only one partition.

As a simple example: a lot of queries can be optimized to include "WHERE fiscal_year = $current". But you need to design your database and application up front to make use of such filtered indexes.

wizofaus · on Aug 12, 2023

If your primary issue is read/query performance, then sure, well-designed indexes, partitioning and, as you say, carefully constructed WHERE clauses are often enough to maintain decent performance even with millions of records. But if you then to do deletions or bulk updates or schema transformations you're in for some serious downtime.

pixl97 · on Aug 11, 2023

In enterprise clients you commonly run into issues where the company thinks you have to save all data forever. Very often this runs in a pattern where the application is lightly used a first then uptake increases over time. Then you run into the slowness issue. They typically expand DB sizing, bit eventually run into the problem where archiving is needed. This can be a huge problem when it's an after thought instead of a primary design. All kinds of fun when you have key relationships between different tablea.

wizofaus · on Aug 12, 2023

It's more that once it gets to a certain size (say, 100s of 1000s of rows), doing anything with the table is painfully slow, often requiring you to take your application offline for considerable periods. Even deleting rows can take 10s of minutes at a time, and it can certainly take a very long time to work out what indexes need to be added and whether they're actually helping.

Yes, sometimes the pressure comes from management etc., but more often than not it would be premature optimisation to add the archiving, so it's a matter of finding a balance and "predicting" at what point the archiving needs to happen.

Table partitioning can help too but only so much.

8note · on Aug 11, 2023

Dynamo is quick for that, so long as you are picking good partition keys.

Instead, it'll throw you hot key throttling if you start querying one partition too much

latchkey · on Aug 11, 2023

> Design your application's hot path to never use joins.

Grab (uber of asia) did this religiously and it created a ton of friction within the company due to the way the teams were laid out. It always required one team to add some sort of API that another team could take advantage of. Since the first team was so busy always implementing their own features, it created roadblocks with other teams and everyone started pointing fingers at each other to the point that nothing ever got done on time.

Law of unintended consequences

tedunangst · on Aug 11, 2023

Hard to follow the link. How would you join two tables between teams that don't communicate?

latchkey · on Aug 11, 2023

You don’t, that’s the problem.

endisneigh · on Aug 11, 2023

yes, this is a fair point. there's no free lunch after all. without knowing more about what happened with Grab I'd say you could mitigate some of that with good management and access patterns, though.

latchkey · on Aug 11, 2023

All in all though, I don't think that 'never use joins' is a good solution either since it does create more developer work almost every way you slice it.

I think the op's solution of looking more closely at the hot paths and solving for those is a far better solution than re-architecting the application in ways that could, or can, create unintended consequences. People don't consider that enough, at all.

Don't forget that hot path resolution is the antithesis of 'premature optimization'.

> you could mitigate some of that with good management and access patterns

the CTO fired me for making those sorts of suggestions about better management, and then got fired himself a couple months later... ¯\_(ツ)_/¯... even with the macro events, their stock is down 72% since it opened, which doesn't surprise me in the least bit having been on the inside...

jodrellblank · on Aug 12, 2023

Grab was the company Steve Yegge left Google to go to. He quit during COVID when he could no longer travel to Asia, and his retrospective is glowing, including of the CTO Mark Porter(I think?): "frankly, the “Grab Way” of collaboration teaches you life skills, such as psychological safety and inclusiveness, which work everywhere else as well. [...] We US Grabbers made many mistakes on the journey towards becoming better Grabbers. Mark Porter led the charge on figuring it out, and we’ve learned a ton from him as he has evolved as a person before our eyes [...] I want to thank Mark especially for helping me grow as a leader and as a person. It takes humility to become a better person, and Grab is a humbling place. The core company value of humility comes straight from the CEO and cofounder Anthony Tan, who is one of the most humble, down-to-earth people you’ll ever meet.".

Is that, uhh, rose-tinted glasses?

2020: https://steve-yegge.medium.com/saying-goodbye-to-the-best-gi...

Previously, 2018: https://steve-yegge.medium.com/why-i-left-google-to-join-gra...

[I don't have any knowledge of Grab or the people involved other than reading these two blog posts, or of the parent commenter].

latchkey · on Aug 12, 2023

Sorry, I gave the wrong role, it was this guy... VP of Eng [0] that fired me and then left for 'personal reasons'... he was fired. Classic incompetent VP Eng manager who didn't know anything about computers. Note how he never really found another position of equal stature as Grab...

I came along as an aquihire of a Vietnamese team that I just happened to be managing at the time. Great team of people. I negotiated a stupid high salary, probably because they weren't used to someone negotiating at all (highest eng in all of Singapore) and that was part of why they were upset at me, I was making more than that VP of Eng and stirring the pot with comments about their poor architecture decisions.

Yegge was a good hire, but probably wrong company for him given the political differences. I think Yegge started a bit after I was fired. I remember thinking to myself that he's either not going to be very effective or he won't last long. To his credit, I think he lasted longer than I would have bet he would have. They had had another ex FB CTO much earlier before me, that was a train wreck [1] and ended up suing the company. They were without a CTO for a long time, probably thought Yegge could fill that roll and ended up hiring Mark instead.

There was a definite distinction between the Singapore and US/Seattle teams, at least the short time while I was there, they pretty much didn't talk to or like each other at all. It made getting those API calls almost impossible.

[0] https://www.techinasia.com/kumaravel-leaves-grab

[1] https://www.techinasia.com/wei-zhu-leaves-grabtaxi

taylodl · on Aug 11, 2023

My hot take: always use a materialized view or a stored procedure. Hide the actual, physical tables from the Application's account!

The application doesn't need to know how the data is physically stored in the database. They specify the logical view they need of the data. The DBAs create the materialized view/stored procedure that's needed to implement that logical view.

Since the application is never directly accessing the underlying physical data, it can be changed to make the retrieval more efficient without affecting any of the database's users. You're also getting the experts to create the required data access for you in the fastest, most efficient way possible.

We've been doing this for years now and it works great. It's alleviated so many headaches we used to have.

wvenable · on Aug 12, 2023

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)

> The application doesn't need to know how the data is physically stored in the database.

In all the applications that I've designed, the application and the database design are in sync. That's not say that you wouldn't use materialized views to deal with certain broad queries but I just don't see how this level of abstraction would make a big difference.

taylodl · on Aug 14, 2023

There's the physical data model, and there's the logical data model. The application(s) only deal with the logical data model. They don't need to worry about how the data is physically stored.

This allows for the forward-compatible evolution of the logical data model which may necessitate extreme changes to the physical data model to keep everything performant. The client application(s) aren't affected by all the changes.

wvenable · on Aug 14, 2023

The application usually has a logical data model in the form of objects representing the data, which, in turn stores that data to the physical model. This could be a separate middle tier layer or not. You're proposing another logical model in the database and I don't see the advantage.

These models shouldn't be significantly different from each other. How you store the data physically should be is very much how it's represented logically. And then how the UI represents the data model to the user.

If you have 2 tables -- a summary and a detail for example -- that's going to be similarly represented to the user in the UI. If you break that summary down into another table (because you need to allow, say, more addresses) you're going to bubble that right up the UI for saving and loading. For querying, you might have a materialized view that summarizes the detail data when viewing a bunch of summaries together but that's in addition to rest of the model.

I'm going to need some kind of example to understand the advantage of this.

walterbell · on Aug 11, 2023

Interface contracts and indirection FTW.

2011, "Materialized Views" by Rada Chirkova and Jun Yang, https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf

> We cover three fundamental problems: (1) maintaining materialized views efficiently when the base tables change, (2) using materialized views effectively to improve performance and availability, and (3) selecting which views to materialize. We also point out their connections to a few other areas in database research, illustrate the benefit of cross-pollination of ideas with these areas, and identify several directions for research on materialized views.

lanstin · on Aug 12, 2023

That's how it was at AOL. But also, in general, people who wrote C code never designed SQL stuff. We'd come up with some requirements, meet with a DBA (who at the time got paid a lot more money, I always assumed because their work was inherently dull), they'd put together stored procedures for use to call, and then do whatever on the physical table side. They did sometimes change the physical table stuff without us having to change anything (not sharding tho, that was edit a TCL file and restart when they said to restart).

downWidOutaFite · on Aug 11, 2023

This doesn't work because DBAs are rarely on the dev team's sprint schedule. If the DBAs are blocking them devs can and will figure out how to route around the gatekeepers. In general, keep the logic in the app not the db.

icedchai · on Aug 12, 2023

Also, many companies don't even have DBAs these days. DBA is, at best, a part time job for a senior+ engineer.

taylodl · on Aug 11, 2023

We have sprints. We also have super responsive DBAs. Keeping the logic in the app is the path to unresponsive database calls. Been there, done that. Not going back to that crap.

In all seriousness, I won't work for an organization that works the way you describe. It's a red flag and a sign of organizational issues, personality issues, and ineffective management. Don't need to waste my time at a place like that.

porridgeandrice · on Aug 12, 2023

Yeah, I like doing this too. Not _always_, but for a few things. I use it to emulate partial functions.

In a show hosting/ticket booking app for example, I never want in any case user facing search/by-id endpoints to serve a show from 2 months ago. So I create a view `select * from shows where time > now`. I can now use this as a 'table' and apply more filters and joins to this if I wish.

alfor · on Aug 11, 2023

But for the saves the structure is visible?

taylodl · on Aug 11, 2023

You can update underlying data via a materialized view.

NortySpock · on Aug 12, 2023

Heck, at least with SQL Server 2017, I've been able to write updates to a thin view, and so long as the columns referenced were not ambiguous, the database would handle the underlying tables and locks required.

tibbetts · on Aug 11, 2023

Premature denomalization is expensive complexity. Denormalization is a great tool, maybe an under-used tool. But you should wait until there are hot paths before using it.

endisneigh · on Aug 11, 2023

I agree. To be clear I'm not suggesting anyone start denormalizing everything. I'm saying if you're fortunate enough to be on a green project, you should design the schema around the access patterns which will surely be "denomarlized." as opposed to designing a normalized schema and designing your access patterns around those.

chaxor · on Aug 12, 2023

Thanks, I hate it.

This seems close to the territory of "why do I need a database? I just keep a bunch of text files with really long names that described exactly what I did to compute the file. They're all in various directories, so if you need to find one just do some greps and finds on the whole system"

I recognize there's a big gap, but boy howdy does what you're suggesting sound messy.

paulddraper · on Aug 12, 2023

> Design your application's hot path to never use joins

I had a Chief Architect who decreed this.

So engineers wound up doing joins in application code, with far worse performance, filtering, memory caching, etc.

iamwil · on Aug 11, 2023

If you don't use joins, how do you associate records from two different tables when displaying the UI? Do you just join in the application? Or something else?

endisneigh · on Aug 11, 2023

this has opinionated answers.

if you ask Amazon, they might suggest that you design around a single table (https://aws.amazon.com/blogs/compute/creating-a-single-table...).

in my opinion it's easier to use join tables. which are what are sometimes temporarily created when you do a join anyways. in this case, you permanently create table1, table2, and table1_join_table2, and keep all three in sync transactionally. when you need a join you just select on table1_join_table2. you might think this is a waste of space, but I'd argue storage is too cheap for you to be thinking about that.

that being said, you really have to design around your access patterns, don't design your application around your schema. most people do the latter because it seems more natural. what this might mean in practice is that you do mockups of all of the expected pages and what data is necessary on each one. then you design a schema that results in you never having to do joins on the majority, if not all, of them.

r00fus · on Aug 12, 2023

Would it be possible to simply use a materialized view for table1_join_table2?

Izkata · on Aug 12, 2023

The problem there would be how they fall out of date, and updating them is a heavy operation.

An alternative I've heard before is using triggers with regular tables, so updating one automatically updates the relevant other ones.

r00fus · on Aug 15, 2023

Materialized views allow insert/update mechanics on many platforms. Alternatively you can refresh them.

sainez · on Aug 11, 2023

> what this might mean in practice is that you do mockups of all of the expected pages and what data is necessary on each one. then you design a schema that results in you never having to do joins on the majority, if not all, of them.

Great suggestion! I had a role where I helped a small team develop a full stack, data-heavy application. I felt pretty good about the individual layers but I felt we could have done a better job at achieving cohesion in the big picture. Do you have any resources where people think about these sorts of things deeply?

walterbell · on Aug 11, 2023

2001, "Denormalization effects on performance of RDBMS", by G. L. Sanders and Seungkyoon Shin, https://www.semanticscholar.org/paper/Denormalization-effect...

> We have suggested using denormalization as an intermediate step between logical and physical modeling, to be used as an analytic procedure for the design of the applications requirements criteria ... The guidelines and methodology presented are sufficiently general, and they can be applicable to most databases ... denormalization can enhance query performance when it is deployed with a complete understanding of application requirements.

PDF: https://web.archive.org/web/20171201030308/https://pdfs.sema...

endisneigh · on Aug 11, 2023

yeah, exactly. in my experience the vast majority of access patterns are designed around a normalized schema, where it really should be that the schema is designed around the access patterns and generously "denormalize" (which doesn't make sense in this context of a new database) as necessary.

joshstrange · on Aug 11, 2023

Single Table Design is the way forward here. I can highly recommend The DynamoDB Book [0] and anything (talks, blogs, etc) that Rick Houlihan has put out. In previous discussions the author shared a coupon code ("HACKERNEWS") that will take $20-$50 off the cost depending on the package you buy. It worked earlier this year for me when I bought the book. It was very helpful and I referred back to it a number of times. This github repo [1] is also a wealth of information (maintained by the same guy who wrote the book).

As an added data point I don't really like programming books but bought this since the data out there on Single Table Design was sparse or not well organized, it was worth every penny for me.

[0] https://www.dynamodbbook.com/

[1] https://github.com/alexdebrie/awesome-dynamodb

deely3 · on Aug 11, 2023

And if you don't want to spend money, you can get idea from this article:

https://www.alexdebrie.com/posts/dynamodb-single-table/

Im really curious about real life performance on different databases, especially in situation where RAM is smaller than database size.

wizofaus · on Aug 11, 2023

That article didn't appear to be suggesting single-table design was appropriate for general purpose RMDBSes (or any database other than DynamoDb).

Scarbutt · on Aug 11, 2023

Normalization is not only about data storage but most importantly, data integrity.

endisneigh · on Aug 11, 2023

Yes, but I assert that it's possible to use transactions to update everything consistently. Serializable transactions weren't really common when MySQL/Postgres first came out, but now that they're common in new DBs + ACID, I think it's not possible to do with reasonable difficulty. If you agree with this, than its easy to prove that denormalized tables performance increase is well worth the annoyance of updating everything to transactionally update the dependencies.

I won't say that it's trivial to update all of your business logic to do this, but I think it's definitely worth it for a new project at least.

Bognar · on Aug 11, 2023

Denormalized transactions are not trivial unless you are using serializable isolation level which will kill performance. If you don't use serializable isolation level, then you risk either running into deadlocks (which will kill performance) or inconsistency.

Decent SQL databases offer materialized views, which probably give you what you want without all the headache of maintaining denormalized tables yourself.

endisneigh · on Aug 11, 2023

all fair points, but to be fair I don't necessary think this makes the most sense for an existing project for the reasons you state. I do think for a new project would best be able to design around the access patterns in a way that eliminate most of the downsides.

williamdclt · on Aug 11, 2023

Transactions are not only (actually mainly not) about atomicity. Of course it’s possible to keep data integrity without normalisation, but that means you need to maintain the invariants yourself at application level and a big could result in data inconsistency. Normalisation isn’t there to make integrity possible, it’s there to make (some) non-integrity impossible.

Nobody says you have to have only one view of your data though. You can have a normalised view of your data to write, and another denormalised for fast reads (you usually have to, at scale). Something like event sourcing is another way (which is actually pushing invariants to application level, in a structured way)

sgarland · on Aug 12, 2023

> that means you need to maintain the invariants yourself at application level

Foreign Keys.

Of course, now you have a new set of problems, but referential integrity isn't one of them.

Guvante · on Aug 11, 2023

You always need to compare write vs read performance.

Turning a single table update into a 10 table one could tip your lock contention to the point where you are write bound or worse start hitting retries.

Certainly it makes sense to move rarely updated fields to where they are used makes sense.

Similarly "build your table against your queries not your ideal data model" is always sage advice.

hliyan · on Aug 12, 2023

Ten years ago when DB engines were not as good and servers were not as large, I did something similar -- set up a trigger on insert/updates to certain relations that auto updates the main record with a cache column. Back then it was comma separated, but today I would obviously use JSONB. Back then it reduced latency significantly. Today, I would probably not attempt it.

wtetzner · on Aug 11, 2023

I'd say that probably depends on what your hot path is. If it's write-heavy, then you'll probably end up with performance issues when you need to write the same data to multiple tables in a single transaction. And if all of those columns are indexed, it'll be even worse.

dajonker · on Aug 12, 2023

Very hot take indeed. As with all things, it depends and use the query planner to measure what actually makes a difference.

In our application we have one important join that actually makes things a lot faster than the denormalized alternatieve. The main table has about 8 references to an organization table. To figure out what rows should be selected for a particular organization, you could either query on those 8 columns, making a very big where/or clause. As it turns out, PostgreSQL will usually end up doing a full table scan despite any index you would create.

Instead, there is an auxiliary table with two columns, one for organization and one reference to the main table. Joining on this table simplifies the query and also turns out to be much faster.

killthebuddha · on Aug 12, 2023

IME if a join is the problem, a join is not the problem.

Buttons840 · on Aug 12, 2023

> On this note, I have mixed feelings about Amazon's DynamoDB, but one things about it is to use it properly you need to plan your use first, and schema second. I think there's something you can take from this even with a RDBMS.

This captures the experience I've had with DynamoDB and document databases in general. They appear more flexible at first, but in truth they are much less flexible. You must get them right up front or your going to be paying thousands of dollars every month in AWS bills just for DynamoDB. The need to get things right up front is the opposite of flexibility.

bbojan · on Aug 12, 2023

> - Design your application's hot path to never use joins. Storage is cheap, denormalize everything and update it all in a transaction.

Hard disagree. You just re-implemented a database engine in your application code. Poorly.

anonzzzies · on Aug 12, 2023

As indeed a traditional HN remark, you are giving advice for applications that almost no one will ever need to build because you will never, ever see the type of traffic/users for it.

Also, doing these things ; dejoining, UUIDs and indexing all columns (really unsure about this one; why?), might be better later on, but at the start it will be a lot heavier.

Modern hardware and databases can take an incredible amount of traffic if you use them in the right and natural way without artificial tricks.

skybrian · on Aug 11, 2023

I’m wondering if indexes and materialized views can be used to do basically the same thing? That is, assuming they contain all the columns you want.

giantrobot · on Aug 11, 2023

There's always money in the banana sta...materialized views. Materialized views will get you quite a ways on read heavy workloads.

Izkata · on Aug 12, 2023

As long as you're okay with the reads being a little out of date after writes occur.

latchkey · on Aug 11, 2023

The issue is writes, not reads.

rocqua · on Aug 12, 2023

I wish for a DB that lets me write a completely normalized scheme, and then lets me specify how it should denormalize the scheme for actual storage. There is no reason manual updates to denormalized DBs need to be hand-rolled every time. They are easy to automatically deduce.

kaba0 · on Aug 12, 2023

These are called material views.

tremon · on Aug 12, 2023

*updatable views.

Not every material view is updatable, and a view doesn't need to be materialized to be updatable.

veave · on Aug 11, 2023

>Design your application's hot path to never use joins. Storage is cheap, denormalize everything and update it all in a transaction. It's truly amazing how much faster everything is when you eliminate joins.

Anybody has documentation about this with examples?

joshstrange · on Aug 11, 2023

See "Single Table Design" which I talked about in this comment above: https://news.ycombinator.com/item?id=37093357

deely3 · on Aug 11, 2023

And if you don't want to spend money, you can get basic idea from this article: https://www.alexdebrie.com/posts/dynamodb-single-table/

newlisp · on Aug 11, 2023

Duplicate data to avoid joins, use serializable transactions to update all the duplicated data.

_kidlike · on Aug 12, 2023

> index all columns

check out MariaDB "ColumnStore". it recently got merged into the upstream binary, and i started reading about it. ngl i was salivating a bit.

i_like_apis · on Aug 11, 2023

Yes I like the zero joins on hot paths approach. It can be hard to sell people on it. It’s a great decision for scaling though.

GrumpyNl · on Aug 12, 2023

When your queries become slow because of joins, you havent designed your table structures properly.

zeckalpha · on Aug 12, 2023

Materialized views provide a way to have joins that are accessed like tables!

backendanon · on Aug 12, 2023

Performance isn't king. The business is king.

camgunz · on Aug 12, 2023

I would rather just use a cache.