I don't consider TimescaleDB to be a serious contender as long as I need a 2000 ...

mfreed · on July 21, 2020

Because TimescaleDB is a relational database for general-purpose time-series data, it can be used in a wide variety of settings.

That configuration you cite isn’t for the core TimescaleDB time-series database or internals, but to add a specific configuration and setup optimized for Prometheus data, so that users don’t need to think about it and TimescaleDB can work with the specific Prometheus data model right out of the box.

Those “scripts” automatically setup a deployment that is optimized specifically to have TimescaleDB serve as a remote backend for Prometheus without making users think about anything. Some databases would need to write a lot of specific internal code for this; TimescaleDB can enable this flexibility through configuring proper schemas, indexes, and views instead (in addition to some specific optimizations we've done to support PromQL pushdown optimizations in-database written in Rust).

This setup also isn't something that users think about. Users install our Prometheus stack with a simple helm command.

$ helm install timescale/timescale-observability

The Timescale-Prometheus connector (written in Go) sets this all up automatically and transparently to users. All you are seeing is the result of the system being open-source rather than having various close-source configuration details :)

- https://github.com/timescale/timescale-observability

- https://github.com/timescale/timescale-prometheus

cevian · on July 21, 2020

(TimescaleDB engineer here). This is a really unfair critique.

The project you cite is super-optimized for the Prometheus use-case and data model. TimescaleDB beats InfluxDB on performance even without these optimization. It's also not possible to optimize in this way in most other time-series databases.

These scripts also work hard to give users a UIUX experience that mimicks PromQL in a lot of ways. This isn't necessary for most projects and is a very specialized use-case. That takes up a lot of the 2000 lines you are talking about.

aeyes · on July 21, 2020

> TimescaleDB beats InfluxDB on performance even without these optimization

Would you mind sharing the schema used for this comparison? Maybe I missed it in your documentation of use-cases. When implementing dynamic tags in my own model, my tests showed that your approach is very necessary.

cevian · on July 21, 2020

You can look at what we use in our benchmarking tool https://github.com/timescale/tsbs (results described here https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-...).

Pretty much it's a table with time, value, tags_id. Where the tags table is id, jsonb

R0b0t1 · on July 21, 2020

>The project you cite is super-optimized for the Prometheus use-case and data model.

That may be true, but: Instead of figuring out how to meet the user's needs you're going to say the user is wrong?

NicoJuicy · on July 21, 2020

That's not what he said and is kinda rude to be honest.

What I consider to be said is that they optimize for extendability ( not just one use-case) and that those 2000 lines do a lot of things.

Including replicating optimizations that are are not easy to achieve out of the box in their and other solutions. But they mostly found a way ;)

R0b0t1 · on July 23, 2020

It's about as rude as telling someone their criticism is misplaced in that way. You've added more words to why you believe that, but that doesn't change the original sentiment.

You can say you don't want to support that usecase but someone inquired about whether you would. You said no, they're wrong to compare the two. Sounds a lot like "you're holding it wrong."

gen220 · on July 21, 2020

(Not affiliated with TimescaleDB, just trying to understand your critique)

TimescaleDB is a Postgres extension, which means they have to play PG's rules, which rather implies a complicated mesh of functions, views, and "internal-use-only" tables.

But once they're there, you can pretty much pretend they don't exist (until they break, of course, but this is true for everything in your software stack).

Is your complaint targeting the size of these scripts, or their contents? Because everything in there appears pretty reasonable to me.

aeyes · on July 21, 2020

My problem is what these scripts do. When working with timeseries you want to be able to filter on dimensions/tags/labels.

TimescaleDB doesn't support this out of the box, the reason being Postgres' limitations on multi column indexes when you have data types like jsonb. What they have to do to work around this is to have a mapping table from dimensions/tags/labels which are stored in a jsonb field to integers which they then use to look the data up in the metrics table.

I would expect that a timeseries database is optimized for this type of lookup without hacks involving two tables.

cevian · on July 21, 2020

(TimescaleDB engineer) Actually, the reason we break out the Jsonb like this is because json blobs are big and take up a lot of space and so we normalize it out. This is the same thing that InfluxDB or Prometheus do under the hood. We also do this under the hood for Timescale-Prometheus.

If you have a data model that is slightly more fixed than Prometheus, you'd create a column per tag and have a foreign-key to the labels table. That's a pretty simple schema which would avoid a whole lot of complexity in here. So if you are developing most apps this isn't a problem and filtering on dimensions/tags/labels isn't a problem. Again, this is only an issue of you have very dynamic user-defined tags, as Prometheus does. In which case you can just use Timescale-Prometheus.

gen220 · on July 21, 2020

I'll echo the sibling comment by the TDB engineer, having read and contributed to similar systems myself.

Most of these custom databases (Influx) use techniques similar to TDB's under the hood, in order to persist the information in an read/write optimized way; they just don't "expose" it as clearly to the end-user, because they aren't offering a relational database.

A critical difference between the two, is that custom databases don't benefit from the fine-tuned performance characteristics of Postgres. So even though they're doing the same thing, they'll generally be doing it slower.

klohto · on July 21, 2020

Which kinda confirms the parent’s issues... InfluxDB is really easy to deploy and forget. With TimescaleDB you should be ready to know ins-and-outs of PG to secure and maintain correctly. Sure, for scaling and high loads TDB might be good but InfluxDB is easier and suitable for most loads and maintainability.

gen220 · on July 21, 2020

> With TimescaleDB you should be ready to know ins-and-outs of PG to secure and maintain correctly.

Gotcha, this makes sense. To this, I'd pose the question: is the same not true for Influx? (i.e. With IFDB, you should be ready to know its ins and outs to secure and maintain it correctly). I guess I think about choosing a database like buying a house. I want it to be as good in X years as it is today, maybe better.

From this perspective, PG has been around for a long time, and will continue to be around for a long time. When it comes time to make tweaks to your 5 year old database system, it will be easier to find engineers with PG experience than Influx experience. Not to mention all of the security flaws that have been uncovered and solved by the PG community, that will need to be retrodden by InfluxDB etc.

Anyways, it's just an opinion, and it's good to have a diversity of them. FWIW, I think your perspective is totally valid. It's always interesting to find points where reasonable people differ.

Legogris · on July 21, 2020

Conversely, because of its larger scope and longer history, there are a lot more ins and outs for PG.

mhall119 · on July 21, 2020

Think if InfluxDB is "move-in ready" when buying a house. Sure you won't know the electrical and plumbing as well as you will a "fixer-uppper" in the long run, but you'll have a place to sleep a lot faster and easier.

gen220 · on July 21, 2020

I think it's a good analogy; very reasonable people come to different conclusions on this problem IRL too.

I prefer older houses (at least pre-80s). They have problems that are very predictable and minor. If they had major problems, they would show them quite clearly in 2020.

My friends prefer newer properties, but this is an act of faith, that the builders today have built something that won't have severe structural/electrical/plumbing problems in 20 years.

Of course, if you only plan on living some place for 3-5 years, none of this matters; you want to live where you're comfortable.

kermatt · on July 21, 2020

There may be some extra install steps with Timescale, but that is balanced by having data in a platform accessible by a very well known API.

There are always tradeoffs, but in cases like mine, I find the basis of the platform - PostgreSQL - to provide a very useful set of features.

doublesCs · on July 21, 2020

[flagged]

aeyes · on July 21, 2020

The linked scripts are not part of the base installation yet TimescaleDB claims, for example, Prometheus support.

If you design your own schema and want to filter on such a dynamic tag field you have to replicate what they have implemented in these scripts. If you look at the code you'll see that this isn't exactly easy and other products do this out of the box.

I use TimescaleDB and have suffered from this, otherwise I wouldn't be talking here. Another problem is that these scripts are not compatible with all versions of TimescaleDB so you have to be careful when updating either one.

thibauts · on July 21, 2020

I’m really sad to see this kind of tone more and more in HN comments...

doublesCs · on July 21, 2020

I find it sadder that people talk such glaring nonsense and are taken seriously. Extremely sad.

gen220 · on July 21, 2020

IMO, the HN guidelines are pretty clear and reasonable regarding these situations.

> Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

The original poster was making an assertion that TDB is too complex. As the guidelines suggest, it doesn't help the conversation to assume that they arrived upon this conclusion unreasonably, and that they're irrational or otherwise unwilling to have an open conversation about it.

We have to assume that they are open to a discussion, despite whatever we might interpret as evidence to the contrary.