I guess this is an unpopular opinion, but I’ve found InfluxDB to be superb: it's trivial to get going in a high-performance way. I have never touched InfluxDB Cloud, always just InfluxDB itself, either as a bare process or a container. Examples of where I’ve found InfluxDB to be more pleasant:
* InfluxDB has way better documentation on functions. For example, look up moving average by time (not points) on TimescaleDB vs InfluxDB. We use these more complex queries and have no problem on Influx. Going further, the number of built-in functions is impressive, and you can still define new ones.
* InfluxDB containers are totally self contained which is great for simple architectures. As a process, InfluxDB is a single executable thanks to Go.
* This is extremely subjective, but I find Flux easier to comprehend as a separate query language vs. the use of SQL for higher-complexity functions; however, I am sure this is due to my lack of experience and know-how in writing said queries in SQL.
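For what it's worth, the time-based (not row-count) moving average from the first bullet is expressible with stock Postgres window frames, though it is admittedly harder to find in the docs than Flux's equivalent. A sketch, with hypothetical table and column names:

```sql
-- Moving average over the trailing 10 minutes of wall-clock time,
-- regardless of how many rows fall inside that window.
-- "conditions" and "temperature" are hypothetical names.
SELECT time,
       avg(temperature) OVER (
         ORDER BY time
         RANGE BETWEEN INTERVAL '10 minutes' PRECEDING AND CURRENT ROW
       ) AS moving_avg
FROM conditions;
```

(RANGE frames with interval offsets need Postgres 11 or later.)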
The benchmarks are interesting, showing TimescaleDB to be the clear winner in most scenarios.
For me that's nice, but it's a bigger deal personally that I already have Postgres and SQL experience that translates directly to TimescaleDB; I don't have to learn a new tool and query language. Development is complex enough and I have to learn too many things as it is. The older I get, the less enthusiastic I am about adding something new to the stack.
Agree totally on the "double down on what you know" point. That pays off in spades usually.
Tangentially related to that: their mongo benchmark numbers always looked odd to me. Given that I've used mongo for 10+ years for high throughput time series data without major issues, I decided to do my own benchmarks. In my testing, mongo outperformed timescale significantly both in write throughput and query performance.
This is likely in part due to the fact that I'm using well-understood internal data from real production systems, and as such my ability to build performant indexes / query strategies in the database I know best introduces a performance bias.
I always take benchmarks with a grain of salt, for this reason. And I try to lean into the tech I understand best.
Hi @spmurrayzzz, thanks for the feedback. (Timescale person here.)
We always strive to do the best and fairest benchmarks we can, and for that reason all our benchmarks are fully open-source, for both repeatability and improvements/contributions:
We also really did spend a lot of time investigating approaches with MongoDB, so you'll see our benchmarks actually evaluate two _different_ ways to use time-series data with MongoDB (culled & optimized from suggestions in MongoDB forums). But we always welcome feedback:
Thanks for engaging here, and congrats on the round!
I've reviewed all these resources multiple times in the past, which is what prompted me to do my own benchmarks (in which mongo outperformed both the multinode and single-node configurations).
Some issues I noticed:
- you're using gopkg.in/mgo.v2, a mongo driver that hasn't had a release in 6 years. Not sure of the general performance impact here, but my tests use mongo 4.2 with a modern node.js driver. So that's one difference.
- your indexing strategy for mongo can easily be changed to get much better performance than the naive compound approach you took in the code (measurement > tags.hostname > timestamp).
- you didn't test the horizontal scaling path at all, and this is arguably where mongo shines.
I'm glad you all open-source this stuff, because it helps engineering leaders make better decisions, so thank you for that. But your data does not align with my own, either from our production metrics or from structured load testing.
I also recall that when we [Timescale] first did our benchmarks vs Mongo for time-series, our use of MongoDB for time-series beat Mongo's own benchmarks :-)
That's probably not something most companies would do for benchmarking, but we take ours seriously :-)
I'm currently using InfluxDB (v1 not v2) and I've looked into switching over to Timescale DB.
Currently I'm stuck on figuring out how to get data into TimescaleDB. My company makes heavy use of Telegraf, which is a natural fit for InfluxDB, but not so much for TimescaleDB. The original pull request for the Telegraf plugin for Postgres/TimescaleDB was closed because the author was non-responsive: https://github.com/influxdata/telegraf/pull/3428
With InfluxDB I can even write data using simple TCP or UDP tools like netcat or curl, and for some cases I have simple scripts that do exactly that. TimescaleDB, on the other hand, requires some sort of Postgres client.
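To be concrete about the "Postgres client" part: ingestion into TimescaleDB ultimately boils down to an ordinary SQL insert issued through any Postgres client (psql, a language driver, etc.). A sketch with a hypothetical table:

```sql
-- "metrics" and its columns are hypothetical names; a hypertable
-- accepts plain INSERTs just like any Postgres table.
INSERT INTO metrics (time, host, temperature)
VALUES (now(), 'web-01', 21.5);
```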
What do you, or other people, use for writing data into TimescaleDB?
One of our active community members took over the effort to merge PostgreSQL/TimescaleDB support into telegraf here, so hopefully that can make progress:
Yeah, I saw that. I guess I'm just a little disappointed that nobody on the TimescaleDB team saw the process through. But I understand if you have higher priorities.
I still wonder what other people are using to feed information into TimescaleDB. I'm wondering if I should switch to a different approach, such as using Telegraf but routing the data to something else that will push data into TimescaleDB.
Don't want this to come across as overly defensive, but the PR was under review by Influx for 3+ years with little progress (first submitted in November 2017), and during that period I think we did something like 2 significant rewrites. It became a bit of a moving target against telegraf, which made it harder to prioritize.
Fully agreed on having that SQL experience guiding you on a totally reasonable solution.
However, our problem space is not high-cardinality data; it more closely aligns with the first performance comparison, with 10 devices and 10 metrics. The ease of getting high performance with pre-implemented functions is great for us. Reliability is obviously a concern, and I can agree that if your data is sacred, choosing something built on Postgres is the better call.
Again, this is just our problem space: small-scale deployments on many machines with no preexisting RDBMS, low-cardinality data, etc. I think it'd be a different story if we were huge, but for us InfluxDB provides some seriously handy features and is worth considering if your problem is similar.
We totally hear you that usability and the developer experience is super important, especially when starting out.
One project we launched earlier this year, "Timescale Analytics", actually seeks to address exactly this, e.g., bring more useful features and easier programmability to SQL [1], and you can see (or add to) the discussion on github [2].
Also informed by some of the super helpful functions we've seen in PromQL. And by the way, if you are interested in PromQL, we have 100% compatibility with PromQL through Promscale [3], which provides an observability platform for Prometheus data (built on TimescaleDB).
Postgres as a base is battle-tested, extremely reliable, and well understood.
Most developers are already familiar with Postgres or at least SQL.
The tooling around Postgres is basically universal.
There's huge value in an option that is literally just "install this Postgres extension and everything works and gets out of your way".
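That "just an extension" claim is quite literal: assuming the TimescaleDB package is installed alongside Postgres, setup is roughly two statements (the table and column names below are hypothetical):

```sql
-- Enable the extension in an existing Postgres database
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert a regular table into a hypertable partitioned on its
-- time column ("metrics" / "time" are hypothetical names)
SELECT create_hypertable('metrics', 'time');
```

Everything else (clients, backups, roles, tooling) stays plain Postgres.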
We use TimescaleDB for a handful of products. In several cases we literally just updated a DSN to point a product at TimescaleDB instead of an existing database and the project Just Worked(TM) except hundreds of times faster.
And for some products we developed on TimescaleDB natively, it was more or less the same thing: give a team TimescaleDB and they're basically productive immediately. There's no learning and integrating new libraries and query languages, no time lost to ops finding new and exciting problems to solve in hosting and scaling the DB, etc.
We get all this with all the functionality and strong guarantees that Postgres provides.
You will find a lot of people (myself included) who've bet on InfluxDB and sorely regretted it afterwards. It's not even remotely close to Postgres and Timescale in reliability, and, to be honest, hardly production-ready if you work with critical data.
Funny, I started using InfluxDB in several projects, but threw it out immediately when TimescaleDB appeared, because TSDB seemed a lot more solid and well-designed. Influx seemed much more of a quick hack in comparison; I did not like the (Python?) SDK or the docs, and operationally it felt a little flaky compared to TimescaleDB. A disclaimer is that I've been very familiar with PostgreSQL for many years, so TSDB operations felt very intuitive to me while Influx was all new stuff, and that probably made a difference.
Example: your temperature sensor is faulty and produces values like -100. You can't delete this data by using "delete from measurement where temperature < -50". You have to get all timestamps, then delete those timestamps one by one.
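That is, the straightforward statement works in TimescaleDB/Postgres but has no InfluxDB equivalent. A sketch (the measurement name is hypothetical):

```sql
-- TimescaleDB/Postgres: delete by value predicate in one statement
DELETE FROM measurement WHERE temperature < -50;

-- InfluxDB (v1) DELETE only accepts time and tag predicates, not
-- field values, so each faulty point must be located first and then
-- removed by its timestamp, e.g.:
-- DELETE FROM measurement WHERE time = '2021-05-01T00:00:00Z'
```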