Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Asymptotic properties of quantile estimators are widely studied [1]. The key is to have a sufficiently large sample size.

[1] Bahadur, R. R. (1966). A note on quantiles in large samples. Annals of Mathematical Statistics, 37, 577–580.



Yet, for any given distribution the sample size can be arbitrarily close to infinite. Unless I've missed something, I don't see the relevance.

If you want the n-9s rate of failure (eg., n=5, 99.999) for a system with a power-law performance distribution, you could be waiting much more than a billion samples to see a failure.

eg., 3E10 ms (30 bn samples) in a year, at 5-9s failure rate has 3E3 ms of failure (3 thousand samples) -- which will be lower given a sampling strategy.


Relevance: from the cited paper, the variance of the median estimator is proportional to 1/(n * f^2), where n is the sample size and f is the density at median.

Two observations: 1. With sufficiently large n, you can control your variance to an acceptable level. 2. The factor f is outside of your control. If your problem has a small density around the median, then you'll need to throw more samples to compensate for it.

I think your concern is about #2: you can always construct a pathological distribution to make the sampling-based approach unattainable. The paper provided guidance on what to expect when you apply the sampling-based approach.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: