Histograms require you to configure the buckets into which your samples are allocated; to choose the buckets appropriately, you need to know what your expected values are. That is, to measure latency, you need to already know your latency. While this can work (I think most of us have, or can get, a clear idea of what our typical latencies are, and can configure buckets around that), it is inelegant. I feel like I would rather have X=percentile, Y=latency, but such a bucketing gives you X=latency, Y=request count. Still useful, but only as informative as you are good at choosing buckets. (There is the histogram_quantile function, but I am not convinced that its assumption of a linear (uniform) distribution within each bucket makes much sense: most things would be long-tail distributions, so I would expect that once you get past the main "hump" of typical latencies, most samples would cluster towards the lower end of any particular bucket.)
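To make the interpolation worry concrete, here is a rough Python sketch of the idea behind histogram_quantile: cumulative "le" buckets, find the bucket holding the target rank, then interpolate linearly inside it. This is a simplified re-implementation for illustration, not Prometheus's actual code, and the sample data, bucket bounds and helper names are all made up.

```python
import math

def to_cumulative_buckets(samples, bounds):
    """Count samples into cumulative (upper_bound, count) pairs, like Prometheus 'le' buckets."""
    buckets = [(b, sum(1 for s in samples if s <= b)) for b in sorted(bounds)]
    buckets.append((math.inf, len(samples)))  # the implicit +Inf bucket holds everything
    return buckets

def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1), assuming samples are uniform within each bucket."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into +Inf; report the last finite bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation between this bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound  # only reached when q > 1

def nearest_rank(samples, q):
    """Exact quantile from the raw samples (nearest-rank method), for comparison."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]

# Hypothetical long-tail data: 90 fast requests, 10 slow ones bunched at the low end of (400, 800].
samples = [50] * 90 + [410] * 10
bounds = [100, 200, 400, 800]

print(histogram_quantile(0.95, to_cumulative_buckets(samples, bounds)))  # interpolated p95
print(nearest_rank(samples, 0.95))                                        # actual p95
```

With these made-up numbers the interpolated p95 comes out at 600.0 ms while the actual p95 of the samples is 410 ms, because the slow requests sit at the bottom of their bucket rather than being spread evenly across it.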
I am not clear on how Summaries actually work; they appear to report the count and sum of the thing they're monitoring. That is, if one were to use them for latencies (and the docs do indeed suggest this), they would report values like "3" and "2000ms", indicating that 3 requests took a total of 2000ms together; how is one supposed to derive a latency histogram/profile from that?
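As far as I can tell, from the _count and _sum series alone the only thing you can get back is the mean over a window: delta(sum) / delta(count), which is roughly what a PromQL expression like rate(x_sum[5m]) / rate(x_count[5m]) computes (x being a placeholder metric name). Summaries can also be configured to export client-side quantiles, which this sketch ignores. The Scrape type and helper names below are hypothetical.

```python
import math
from typing import NamedTuple

class Scrape(NamedTuple):
    """One scrape of a summary's two cumulative series."""
    count: int     # value of the summary's _count series
    sum_ms: float  # value of the summary's _sum series (milliseconds here)

def to_scrape(latencies_ms):
    """Fold raw observations into what a summary would expose (starting from zero)."""
    return Scrape(count=len(latencies_ms), sum_ms=float(sum(latencies_ms)))

def mean_latency(earlier: Scrape, later: Scrape) -> float:
    """Mean latency of the requests observed between two scrapes (counter resets ignored)."""
    requests = later.count - earlier.count
    if requests == 0:
        return math.nan  # no requests in the window: 0/0, the mean is undefined
    return (later.sum_ms - earlier.sum_ms) / requests

start = Scrape(count=0, sum_ms=0.0)
spiky = to_scrape([100, 900, 1000])
steady = to_scrape([500, 500, 1000])

print(spiky == steady)             # True: both are count=3, sum=2000.0
print(mean_latency(start, spiky))  # about 666.7 ms, the only statistic recoverable here
```

Two quite different latency profiles collapse into the same pair of numbers, which is why no histogram or profile can be reconstructed from them.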
Prometheus's fatal flaw here, IMO, is that it requires sampling of metrics. That works for things like CPU usage, which is essentially a continuous function that you sample over time. But its collection method/format doesn't seem to work that well when you have an event-based metric, such as request latency, which only occurs at discrete points. (If no requests are being served, what is the latency? It makes no sense to ask, unlike with CPU or RAM usage.)
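A toy sketch of the sampling mismatch, assuming a hypothetical gauge that only holds the latest request's latency, a 15 s scrape interval, and made-up request events. It compares what a scraper would see from that gauge with what it would see from cumulative count/sum counters.

```python
import math

# Hypothetical request events as (timestamp_s, latency_ms); note the 900 ms outlier.
events = [(1, 40.0), (3, 45.0), (7, 900.0), (12, 50.0), (31, 42.0)]
scrape_times = [15, 30, 45]  # hypothetical 15 s scrape interval

def scrape_last_value_gauge(events, scrape_times):
    """What the scraper sees from a gauge that only holds the latest request's latency."""
    seen = []
    for t in scrape_times:
        so_far = [lat for ts, lat in events if ts <= t]  # events are sorted by time
        seen.append(so_far[-1] if so_far else math.nan)  # NaN before any request
    return seen

def scrape_counters(events, scrape_times):
    """What the scraper sees from cumulative (count, sum) counters."""
    return [
        (sum(1 for ts, _ in events if ts <= t), sum(lat for ts, lat in events if ts <= t))
        for t in scrape_times
    ]

print(scrape_last_value_gauge(events, scrape_times))  # [50.0, 50.0, 42.0]
print(scrape_counters(events, scrape_times))          # [(4, 1035.0), (4, 1035.0), (5, 1077.0)]
```

The gauge never reports the 900 ms request and repeats a stale 50 ms for the idle 15 to 30 s window. The counters keep every event, but for that idle window the delta is (0, 0.0), so the mean latency there is 0/0: the "what is the latency when nothing is served?" question in code form.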
To me, ideally, you want to collect all the samples in a central location and then compute percentiles. Anything else seems to run afoul of the very critique of "doing percentiles on the agents, then 'averaging' percentiles at the monitoring system" made in the video posted in this sibling comment: https://news.ycombinator.com/item?id=18194507
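A small sketch of why averaging percentiles goes wrong, using made-up latencies from two hypothetical agents with very different traffic and a nearest-rank p99. It compares the average of the per-agent p99s with the p99 computed over all samples pooled centrally.

```python
import math

def p99(samples):
    """Exact 99th percentile of raw samples (nearest-rank method)."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]

# Hypothetical raw latencies (ms) seen by two agents with very different traffic.
agent_a = [10] * 98 + [1000] * 2  # busy node with a slow tail
agent_b = [20] * 10               # quiet node, no tail

per_agent = [p99(agent_a), p99(agent_b)]
averaged = sum(per_agent) / len(per_agent)  # "average the agents' percentiles"
pooled = p99(agent_a + agent_b)             # percentile over all samples, computed centrally

print(per_agent, averaged, pooled)  # [1000, 20] 510.0 1000
```

The averaged figure (510 ms) roughly halves the real tail (1000 ms) because the quiet agent counts as much as the busy one; only the pooled computation reflects what users actually saw.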
Your points are largely valid, but Prometheus is a monitoring solution, not a scientific or financial tool.
Certain tradeoffs are made because the monitoring aspect comes first and scientific correctness comes second.
Hence poll vs. push, for instance.