I hear good (but expensive) things about Datadog, and Prometheus is _useful_ but I would never call it “the peak”.
Configuring it is awful, driving it is awful, the query language is part good and part broken glass, and relabelling is “not actively broken” but far from “sensible, well designed and thoughtful”. Grafana’s whole stack is massively overwrought if you’re self-hosting, and rapidly gets expensive for managed services. The devs often ignore or react aggressively to issues. Improvements to UX or correctness are ignored, denigrated or just outright denied. There are some really weird design choices around distributed stuff that make these tools annoying in my opinion, and there seems to be no intention of ever making that better. Prometheus and, worse, Mimir have been some of the most annoying and fragile things I’ve had the displeasure of operating. Prometheus might have been a lot better than what we had before, but I really think we can do a lot, lot better than Prometheus, and I see “improved in every way” solutions like VictoriaMetrics as direct evidence of that.
i just think that metrics are the right tool for the job when the job is summarizing vast quantities of data.
not when the job is understanding complex systems. in order to do that, you need a ton of context and cardinality, etc. i know so many observability engineering teams that spend an outright majority of their time trying to skate the line between "enough cardinality to understand what's happening" and "so much cardinality that it bankrupts them". it's the wrong tool for the job. we need something much more like BI for technical data.
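
to put rough numbers on that cardinality squeeze, here's a back-of-the-envelope sketch. the label names and counts are made up for illustration, and the product is only an upper bound (real systems rarely emit every combination), but it shows why one high-cardinality label can blow up the number of time series a metrics backend has to store:

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for a single metric:
    the product of the number of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets, purely illustrative.
base = {"service": 50, "endpoint": 20, "status_code": 5}
with_user = {**base, "user_id": 10_000}

print(series_count(base))       # 5000
print(series_count(with_user))  # 50000000
```

adding the one label you actually need to debug a specific user's problem multiplies the worst case by 10,000x, which is the trade-off those teams keep fighting.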