YMMV, but I'd rather use an elementary and mathematically sound statistical technique than MacGyver something myself. (Though I do understand that, as some people describe in this thread, there can be different purposes for ratings and hence a need for different sorting mechanisms.)
You're right that confidence intervals depend on both quality and quantity, but the reason is to account for uncertainty. As n goes up, the standard error shrinks toward 0, so e.g. a 4.2/5 movie with 100 reviews is still likely to sort higher than a 4.1 movie with 200 reviews. Quantity only comes into play when there is very little information to go on; after that, quality becomes the driving factor.
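To make that concrete, here's a minimal sketch in Python, assuming a normal-approximation lower bound and a guessed per-rating standard deviation of 1 star (the function name and parameters are mine, not from any library):

    import math

    def lower_bound(mean, n, stdev=1.0, z=1.96):
        # Lower end of a normal-approximation 95% confidence
        # interval for a mean rating; the standard error shrinks
        # like 1/sqrt(n), so quantity matters less as n grows.
        se = stdev / math.sqrt(n)
        return mean - z * se

    print(lower_bound(4.2, 100))  # ~4.00
    print(lower_bound(4.1, 200))  # ~3.96 -- the 4.2 movie still sorts higher

Even with half as many reviews, the 4.2 movie keeps the higher lower bound, because by n=100 the uncertainty penalty is already small.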
Using internet points this way is not mathematically sound.
You're using a highly biased sample, to begin with.
The mathematics here start with the assumption that you have a random sample. You don't. The assumption is invalid, and there's no reason to believe this calculation is any better than any other.
A particular type of person votes on particular things: it's commonly observed that new movies rate higher than they should on IMDB because super-fans are the first to vote. (This is convenient for your example.)
Inappropriate for obtaining unbiased confidence intervals, yes, but this doesn't matter for ranking if the bias is uniform across everything being rated.
You make a good point that there might be differential bias depending on when the movie came out, but I don't think the solution is then to say "well, now all bets are off, might as well concoct our own techniques and assume they're just as good or better." Statistical techniques aren't simply valid or invalid. Simulate the bias and look at exactly how it influences the results of a particular technique, and then perhaps use that to make an adjustment based on data rather than intuition.
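For instance, a toy simulation along these lines (Python; the bias model, numbers, and function names are all invented for illustration) lets you measure how often an early-super-fan-boosted newcomer would outrank a genuinely better movie under lower-bound sorting:

    import math
    import random
    import statistics

    def lower_bound(ratings, z=1.96):
        # Lower end of a normal-approximation interval around the sample mean.
        mean = statistics.mean(ratings)
        se = statistics.stdev(ratings) / math.sqrt(len(ratings))
        return mean - z * se

    def simulate(true_mean, n_votes, fan_fraction=0.0, fan_boost=1.0):
        # Crude bias model: the first fan_fraction of voters are super-fans
        # who rate fan_boost stars above what the general audience would.
        votes = []
        for i in range(n_votes):
            r = random.gauss(true_mean, 1.0)
            if i < fan_fraction * n_votes:
                r += fan_boost
            votes.append(min(5.0, max(1.0, r)))  # clamp to the 1-5 scale
        return votes

    random.seed(0)
    flips = 0
    for trial in range(1000):
        older = simulate(4.0, 500)                   # no fan bias
        newer = simulate(3.8, 60, fan_fraction=0.5)  # early super-fan votes
        if lower_bound(newer) > lower_bound(older):
            flips += 1
    print(f"biased newcomer outranked the better movie in {flips}/1000 trials")

If the flip rate turns out to be high, that quantifies the bias problem and suggests how big a data-driven correction (say, down-weighting early votes) would need to be.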
It's not about MacGyvering something; it's about reflecting on how well different solutions stack up against what you actually want to do. There is no one-size-fits-all ranking formula, at least in my opinion.