Asymptotic properties of quantile estimators are widely studied [1]. The key is ...

mjburgess · on July 26, 2024

Yet, for any given distribution the sample size can be arbitrarily close to infinite. Unless I've missed something, I don't see the relevance.

If you want the n-9s rate of failure (eg., n=5, 99.999) for a system with a power-law performance distribution, you could be waiting much more than a billion samples to see a failure.

eg., 3E10 ms (30 bn samples) in a year, at 5-9s failure rate has 3E3 ms of failure (3 thousand samples) -- which will be lower given a sampling strategy.

rhymer · on July 27, 2024

Relevance: from the cited paper, the variance of the median estimator is proportional to 1/(n * f^2), where n is the sample size and f is the density at median.

Two observations: 1. With sufficiently large n, you can control your variance to an acceptable level. 2. The factor f is outside of your control. If your problem has a small density around the median, then you'll need to throw more samples to compensate for it.

I think your concern is about #2: you can always construct a pathological distribution to make the sampling-based approach unattainable. The paper provided guidance on what to expect when you apply the sampling-based approach.