karakanb's favorites | Hacker News

Thank you!

1) All data is partitioned based on the "instanceId" of events (see `instanceId` here: https://docs.trench.dev/api-reference/events-create). Instance IDs are typically a logically meaningful way of separating users (such as by company/team/etc.) that allows for sharding the data across nodes.

2) Yes, this the number 1 thing on our roadmap right now (if anyone is interested in helping build this, please reach out!)

3) We're using the Kafka engine in ClickHouse for throttling the ingestion of events. It's partitioned by instanceId (see #1) for scaling/fast queries over similar events.

4) My benchmarks in production showed a single EC2 instance (16 cores / 32 gb ram) barely working at 1000+ inserts / second with roughly the same amount of queries per second. Load averages 0.91, 0.89 0.9. This was in stark contrast to our AWS Postgres cluster which continued to hit 90%+ CPU and low memory with 80 ACUs, before we finished the migration to Trench.

5) We seemed to solve this by running individual Node processes on every core (16 in parallel). Was the limit you saw caused by ClickHouse's inbound HTTP interface?

6) Right now the system uses just a default MergeTree ordered by instanceId, useId, timestamp. This works really well for doing queries across the same user or instance, especially when generating timeseries graphs.

7) I am still trying to figure out the best Kafka partitioning scheme. userId seems to be the best for avoiding hot partitions. Curious if you have any experience with this?

Let us know how the migration goes and feel free to connect with me (christian@trench.dev).