
Anytime I hear "we need to blast in per-second measurements of ..." my mind jumps to "well, have you looked at the bazillions of timeseries databases out there?" Because the fact that those payloads happen to be (time, lat, long, device_id) tuples seems immaterial to the timeseries database, and they can then be rolled up into whatever level of aggregation one wishes for long-term storage.
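For concreteness, the roll-up could look roughly like this in ClickHouse (an untested sketch; every table and column name here is invented for illustration): keep the raw per-second tuples in a short-TTL table and let a materialized view maintain per-minute sums/counts for long-term storage.

    -- raw per-second pings, kept only short-term (hypothetical schema)
    CREATE TABLE pings_raw
    (
        ts        DateTime,
        device_id UInt64,
        lat       Float64,
        lon       Float64
    )
    ENGINE = MergeTree
    ORDER BY (device_id, ts)
    TTL ts + INTERVAL 30 DAY;

    -- per-minute rollup; SummingMergeTree sums the non-key numeric columns on merge
    CREATE MATERIALIZED VIEW pings_1m
    ENGINE = SummingMergeTree
    ORDER BY (device_id, minute)
    AS SELECT
        device_id,
        toStartOfMinute(ts) AS minute,
        count()  AS samples,
        sum(lat) AS lat_sum,
        sum(lon) AS lon_sum
    FROM pings_raw
    GROUP BY device_id, minute;

Query-time averages are then just lat_sum / samples over whatever window you want.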

It also seems that just about every open source "datadog / new relic replacement" is built on top of ClickHouse, and even they themselves allege multi-petabyte capabilities <https://news.ycombinator.com/item?id=39905443>

OT1H, I saw the "we did research" part of the post, and I for sure have no horse in your race of NIH, but "we write to EBS, what's the worst that can happen" strikes me as ... be sure you're comfortable with the tradeoffs you've made in order to get a catchy blog post title



ClickHouse is one of the few databases that can handle most of the time-series use cases.

InfluxDB, the most popular time-series database, is optimised for a very specific kind of workload: many sensors publishing frequently to a single node, and frequent queries that don't go far back in time. It's great for that. But it doesn't support slightly more advanced queries, such as an average over two sensors. It also doesn't scale, and is pretty slow to query far back in time due to its architecture.
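For example, combining two series is straightforward in plain SQL but awkward in InfluxQL, which has no joins. A sketch in ClickHouse's dialect (table and column names invented for illustration):

    -- mean of two sensors' readings at each minute, joined on time
    SELECT a.minute,
           (a.avg_value + b.avg_value) / 2 AS avg_of_both
    FROM
        (SELECT toStartOfMinute(ts) AS minute, avg(value) AS avg_value
         FROM readings WHERE sensor_id = 'sensor_a' GROUP BY minute) AS a
    INNER JOIN
        (SELECT toStartOfMinute(ts) AS minute, avg(value) AS avg_value
         FROM readings WHERE sensor_id = 'sensor_b' GROUP BY minute) AS b
    USING (minute)
    ORDER BY minute;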

TimescaleDB is a bit more advanced, because it's built on top of PostgreSQL, but it's not very fast. It's better than vanilla PostgreSQL for time-series.

The TSM-Bench paper has interesting figures, but in short ClickHouse wins or holds up well in almost all benchmarks.

https://dl.acm.org/doi/abs/10.14778/3611479.3611532

https://imgur.com/a/QmWlxz9

Unfortunately, the paper didn't benchmark DuckDB, Apache IoTDB, and VictoriaMetrics. They also didn't benchmark proprietary databases such as Vertica or BigQuery.

If you deal with time-series data, ClickHouse is likely going to perform very well.


I work on a project that ingests sensor measurements from the field, and in our testing found TimescaleDB was by far the best choice. The performance, combined with all their timeseries-specific features like continuous aggregates and `time_bucket`, plus access to the Postgres ecosystem, was killer for us. We also get about a 90% reduction in storage with compression, without much of a performance hit.
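For anyone curious, the pieces in question look roughly like this (a sketch only; the table and column names are made up, and the table is assumed to already be a hypertable):

    -- hourly continuous aggregate over raw readings
    CREATE MATERIALIZED VIEW readings_hourly
    WITH (timescaledb.continuous) AS
    SELECT device_id,
           time_bucket('1 hour', ts) AS bucket,
           avg(value) AS avg_value,
           max(value) AS max_value
    FROM readings
    GROUP BY device_id, bucket;

    -- native compression, segmented by device, applied to chunks older than a week
    ALTER TABLE readings SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'
    );
    SELECT add_compression_policy('readings', INTERVAL '7 days');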


Did you try clickhouse? What were its weak points?


No real SQL, no real materialisation engine, no extensions.


Apache Parquet as data format on disk seems to be popular these days for similar DIY log/time series applications. It can be appended locally and flushed to S3 for persistence.
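One way to work with a pile of such files is DuckDB; a rough, untested sketch (bucket and paths are made up, and S3 credentials are assumed to already be configured):

    INSTALL httpfs;
    LOAD httpfs;

    -- read every local segment, roll up to hourly, write the result to S3 as Parquet
    COPY (
        SELECT device_id,
               date_trunc('hour', ts) AS hour,
               avg(lat) AS lat_avg,
               avg(lon) AS lon_avg
        FROM read_parquet('segments/*.parquet')
        GROUP BY device_id, hour
    ) TO 's3://example-bucket/rollups/hourly.parquet' (FORMAT PARQUET);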


> but "we write to EBS, what's the worst that can happen" strikes me as ... be sure you're comfortable with the tradeoffs you've made in order to get a catchy blog post title

In what way?


EBS latency is all over the place. The jitter is up to the 100ms scale, even across consecutive I/O operations. We’ve also had intermittent fsync() failures, which is a case that should be handled but is exceptionally rare with traditionally-attached drives.


The author does note in the writeup that they are comfortable with some (relatively rare) data loss from things like server failure. Given their use case, it seems like the jitter/loss characteristics of EBS wouldn't be too impactful for them.


There are different kinds of data loss: because you lose the whole drive, because you lost a whole write, because a write was only partially written. Half the problem with NIH solutions is: what happens when you try to read from your bespoke binary format and the result is corrupted in some way? So much of the value of battle-tested, multi-decade-old databases is that those are solved problems that you, the engineer building on top of the database, do not need to worry about.

Of course data loss is alright when you're talking about a few records within a billion. It is categorically unacceptable when AWS loses your drive, you try to restore from backup, the application crashes when trying to use the restored backup because of "corruption", the executives are pissed because downtime is reaching into the hours/days while you frantically try to FedEx a laptop to the one engineer who knows your bespoke binary format and can maybe heal the backup by hand except he's on vacation and didn't bring his laptop with him.


> Half the problem with NIH solutions is, what happens when you try to read from your bespoke binary format, and the result is corrupted in some way?

restoring an EBS snapshot seems pretty similar to restoring Aurora/RDS, binary format or not, provided you know you have problems. (they don't mention checksums in the blog post, or any kind of error handling, just that they can buffer some writes.)

usually the problem with NIH solutions is that evolving/extending them is hard. (of course multi-decade projects are also not magically free of ossified parts, we just euphemistically think of them as trade-offs.)


How are EBS snapshots going to be consistent? I mean, AWS takes them at an arbitrary point in time, so half of a write may be captured. Another, less common scenario is silent data corruption you never notice until you need to restore.


I assume they do the snapshotting from some script. (Stop writes, flush, snapshot, restart writes.) If not, then they probably have some way of detecting partial writes. It seems that they do fsync() every second.

> Another, less common scenario is silent data corruption you never notice until you need to restore.

I tried to find what kind of guarantees EBS offers, but they only talk about durability and availability ( https://aws.amazon.com/ebs/sla/ ), nothing about data corruption or individual sector failures. (So this could mean that they either won't even notice ... or you might get back an all-zero page, if they do detect some underlying error but the internal checksum, assuming there is one, fails due to encryption at rest.)

It's nice that AWS recommends just ignoring read errors ... https://repost.aws/questions/QUGqXF7BcOQKCopHJcnnDrRA/ebs-vo... :D :D :D


I mean if you spun up Postgres on EC2 you would be directly writing to EBS, so that's not really the part I'm worried about. I'm more worried about the lack of replication, seemingly no way to scale reads or writes beyond a single server, and no way to fail over uninterrupted.

I'm guessing it doesn't matter for their use case, which is a good thing. When you realize you only need this teeny subset of DB features and none of the hard parts, writing your own starts to get feasible.


Replication and reads can be scaled with something like Patroni or even a DIY replication setup (if one knows what they’re doing, of course), but writes are difficult.


Right, the Cassandra/Scylla model is really good for time-series use cases; I've yet to see good arguments against them.
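For reference, the usual shape of that model as a CQL sketch (names invented; bucketing the partition key by day just keeps partitions bounded):

    -- one partition per device per day, rows clustered newest-first
    CREATE TABLE readings_by_device (
        device_id text,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((device_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC);

Appends are cheap, and a read for a device over a time range only touches a handful of partitions.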


It's generally good for append-only workloads.

Where C* databases seem to fall down is point updates and, in this case, the requirement to implement your own aggregations.

For these workloads you are much better off (unless you are already running C* somewhere and are super familiar with it) with something like ClickHouse, or, if you need good slice-and-dice, Druid or Pinot.


Yeah, point updates are less than stellar, but for aggregation it's fine unless you really want ad-hoc low-latency queries. Otherwise, for anything substantial you'd want Spark/Beam for aggregation; the Cassandra model is really fast at loading up time-series-type data in parallel, and Spark etc. make it really easy to do. The tradeoff is just the high startup cost of those kinds of jobs.



