Hacker News

Hmm, this article is a little confusing. I'm not familiar with Vitess or Citus, but I am familiar with "manually" sharded Postgres/MySQL, and I'm not sure I understand whether there are any "interaction effects" between the decision to shard and the choice between MySQL/Postgres and SQLite.

Like, the article's three sections are:

1. The challenges of sharding

2. The benefits of these new sharded SQLite solutions over conventional SQLite

3. A list conflating the benefits of SQL databases generally with the benefits of SQLite

None of which answer the question of "Why should I use sharded SQLite instead of, say, sharded Postgres, for hyperscale?".



Author here.

Agreed — I think adding some comparisons to other database partitioning strategies would be helpful.

My 2 cents, specifically about manually partitioning Postgres/MySQL (rather than using something like Citus or Vitess):

SQLite-on-the-server works similarly to Cassandra/DynamoDB in how it partitions data. The number of partitions is decoupled from the number of databases you're running, since data is automatically rebalanced for you. If you're curious, Dagster has a good post on data rebalancing: https://dagster.io/glossary/data-rebalancing.

With manual partitioning, compared to automatic partitioning, you end up writing a lot of extra complex logic for:

- Determining which database each piece of data lives on (as opposed to using partitioning keys which do that automatically)

- Manually rebalancing data, which is often difficult and error-prone

- Adding partitions manually as the system grows

- (Anecdotally) Higher operational costs, since matching node count to workload is tricky
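To make the first point concrete, the routing glue you end up hand-writing with manual partitioning looks something like this. A minimal sketch: the DSNs, key names, and hash-mod scheme are illustrative, not from any real codebase, and the pain shows up when `SHARD_DSNS` grows and every existing key has to be re-homed by hand.

```python
# Hypothetical routing logic for manually partitioned Postgres/MySQL.
# Every query path in the application has to go through something like this.
import hashlib

# One DSN per physical database. Appending to this list changes the modulus,
# which is exactly why manual rebalancing is difficult and error-prone.
SHARD_DSNS = [
    "postgres://db0.internal/app",
    "postgres://db1.internal/app",
    "postgres://db2.internal/app",
]

def shard_for(user_id: str) -> str:
    """Map a partition key to the database it lives on, by hash mod shard count."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]
```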

Manual partitioning can work fine for companies like Notion, where teams are already invested in Postgres and its tooling. But overall, I think it introduces more long-term problems than using a more naturally partitioned system.

To be clear: traditional OLTP databases are great, and depending on your workload you don't always need to reach for Cassandra, DynamoDB, or SQLite-on-the-server. But I do think SQLite-on-the-server offers a really compelling blend of the developer experience of Postgres with the scalability of Cassandra.


> - Determining which database each piece of data lives on (as opposed to using partitioning keys which do that automatically)

Most sharded databases use consistent hashing.

> - Manually rebalancing data, which is often difficult and error-prone

Not really. When you set up the database, you choose a highly divisible number of shards and use consistent hashing to spread the data across them. Each server hosts N shards, where N changes as your data (and server count) grows.

> - Adding partitions manually as the system grows

Not really. Just choose a reasonably high number of shards and divide them across your servers.
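The scheme described above can be sketched in a few lines. This is a hypothetical illustration (the shard count and server names are made up): the key's shard number is fixed for the life of the cluster, and scaling out means reassigning whole shards to new servers rather than rehashing individual rows.

```python
# Fixed, highly divisible shard count chosen up front, with whole shards
# divided across however many servers currently exist.
import hashlib

NUM_SHARDS = 1024  # never changes, so a key's shard number never changes

def shard_of(key: str) -> int:
    """Deterministically map a key to one of the fixed shards."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def assign_shards(servers: list[str]) -> dict[int, str]:
    """Divide all shards evenly across the current set of servers."""
    return {s: servers[s % len(servers)] for s in range(NUM_SHARDS)}

# Going from 2 to 4 servers moves whole shards, not individual rows:
placement = assign_shards(["db0", "db1"])
server_for_key = placement[shard_of("user:42")]
```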

> - (Anecdotally) Higher operational costs, since matching node count to workload is tricky

This could be true, but also, there is overhead to managing a ton of SQLite databases too. I think there are tradeoffs here.


If you think this is a good fit for your use case, you should embed SQLite in your application and shard the application itself. Embedded SQLite is faster and uses less memory than PostgreSQL running as a separate process, possibly on a different machine.



