
Thank you. Speaking as a statistician, the fact that mixed effects models (e.g., does this rater tend to rate high?) are overlooked is, IMHO, a death sentence. Too much nomenclature, too early (link to the table within the text, please, and omit needless words), and too little attention paid to the value of an external citation.

Also, MCMC for ratings? Surely you jest. If the author had touched on mixed models, then maybe it would make sense. But given the sample sizes involved here, and the noise in the variance estimates, I recommend that the author investigate mixed models tout de suite if they do in fact care about the sources of shared and unshared effects on variance. Because that is what mixed models do.
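
To make this concrete, here is a minimal sketch (Python with statsmodels; the data frame, its columns, and the simulated numbers are all invented for illustration) of a random intercept per rater, which splits rating variance into a shared rater component and unshared noise:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulate ratings where each rater carries a persistent offset
    # (the "does this rater tend to rate high?" effect).
    rng = np.random.default_rng(0)
    raters = np.repeat(np.arange(50), 20)          # 50 raters, 20 ratings each
    shared = rng.normal(0, 1.0, 50)[raters]        # per-rater (shared) effect
    noise = rng.normal(0, 0.5, raters.size)        # per-rating (unshared) noise
    df = pd.DataFrame({"rater": raters, "rating": 5 + shared + noise})

    # Random intercept per rater: "Group Var" in the summary is the
    # shared rater variance; the residual is the unshared part.
    result = smf.mixedlm("rating ~ 1", df, groups=df["rater"]).fit()
    print(result.summary())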



Author here. Please see the section on mixed models in my post. As I mentioned there, I would love it if an expert could expand on the relationship between mixed effects and empirical Bayes.

Regarding MCMC, one of the things I try to emphasize throughout the post is that the best solution depends on your needs (for example, if you want a full posterior). In fact, most of the post is devoted to quick and simple methods -- not MCMC -- because they are good enough for most purposes. I welcome your feedback, though, on how I could make this point clearer.


> Author here.

Alright, I'll put on my Reviewer Number 3 hat and say that I learned some neat things from your work, including that the National Swine Improvement Federation exists. I'll try to do a halfway decent job here.

> I would love if an expert could expand on the relationship between mixed effects and Empirical Bayes.

A real expert? Here you go:

http://statweb.stanford.edu/~ckirby/brad/LSI/monograph_CUP.p...

Read it, all of it, but particularly chapter 1, section 2.5, and chapters 8, 10, and 11. Why do testing, effect size estimation, and high-dimensional analysis have anything to do with anything? Because...

1) Independence is largely a myth.

2) You are likely to have multiple ratings per reviewer on your site, whether your generating distribution is nearly continuous (0-10, mean-centered) or discrete (0/1, A/B/C). If you discard this, you are throwing away an enormous amount of information, and failing utterly to understand why a person would estimate not just the variance but the covariance, even for a univariate response.

The second point is the one that matters.
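
To see why, here is a toy simulation (Python; all the numbers are invented) showing that a shared rater offset makes two ratings from the same rater covary, even though their noise terms are independent:

    import numpy as np

    # Two ratings from each of many raters; the shared per-rater offset
    # induces covariance between them despite independent noise.
    rng = np.random.default_rng(1)
    n = 100_000
    rater_effect = rng.normal(0, 1.0, n)            # sd 1 -> variance 1
    y1 = 5 + rater_effect + rng.normal(0, 0.5, n)   # first rating per rater
    y2 = 5 + rater_effect + rng.normal(0, 0.5, n)   # second rating, fresh noise
    print(np.cov(y1, y2)[0, 1])                     # ~1.0 (the rater variance), not ~0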

Also, "empirical Bayes" is in modern parlance equivalent to "Bayes". What's the alternative? "Conjectural Bayes"? (Maybe I should quit while I'm ahead, pure frequentists may be lurking somewhere)

> I welcome your feedback though on how I could make this point clearer.

For starters, edit. Your post is too damned long.

Think about where you are getting diminishing returns and why. Is there ever a realistic situation where your ratings site would not keep track of who submitted the rating? (If so, it's certainly not going to be an unbiased sample; the ballot box will get stuffed.) So if you have to keep track of who's voting, you automatically have the information to decompose the covariance matrix, and everything else logically follows.

A univariate response with a multivariate predictor (say, rating ~ movie*rater) can have multiple sources of variance, and estimating these from small samples is hard. When you use a James-Stein estimator, you trade variance for bias. You're shrinking towards movie-specific variance estimates, but you almost certainly have enough information to shrink towards movie-centric and rater-centric estimates of fixed and random effects, tempered by the number of ratings per movie and the number of ratings per rater. (Obviously you should not have more than one rating per movie per rater, else your sample cannot be unbiased).
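
As a sketch of that trade (Python; the helper name and the crude method-of-moments estimates are mine, not the post's, and this shrinks only toward the grand mean rather than the full movie-by-rater decomposition I'm describing), here is empirical-Bayes-style shrinkage of per-movie means, tempered by the number of ratings per movie:

    import numpy as np

    def shrink_movie_means(ratings_by_movie):
        """Shrink per-movie mean ratings toward the grand mean.

        ratings_by_movie: list of 1-D arrays, one array per movie.
        Movies with few ratings get pulled hardest toward the grand
        mean: variance is traded for bias, as with James-Stein.
        """
        means = np.array([r.mean() for r in ratings_by_movie])
        ns = np.array([r.size for r in ratings_by_movie])
        grand = np.concatenate(ratings_by_movie).mean()
        # Crude moment estimates: within-movie (sampling) variance,
        # then between-movie (true) variance by subtraction.
        within = np.concatenate([r - r.mean() for r in ratings_by_movie]).var()
        between = max(means.var() - (within / ns).mean(), 1e-9)
        weight = between / (between + within / ns)  # per-movie shrinkage weight
        return weight * means + (1 - weight) * grand

    # e.g. a 2-rating movie is shrunk harder than a 5-rating one:
    # shrink_movie_means([np.array([9., 10.]), np.array([6., 7., 5., 8., 6.])])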

I think you will return to this and write a much crisper, more concise, and more useful summary once this sinks in. I could be wrong. But you'll have learned something deeply useful even if I am. I do not think you can lose by it.


> Also, "empirical Bayes" is in modern parlance equivalent to "Bayes". What's the alternative? "Conjectural Bayes"?

My understanding of the difference, as a frequent user of empirical Bayes methods (mainly limma[1]), is that in "empirical Bayes" the prior is derived empirically from the data itself, so that it's not really a "prior" in the strictest sense of being specified a priori. I don't know whether this is enough of a difference in practice to warrant a different name, but my guess is that whoever coined the term did so to head off criticisms to the effect of "this isn't really Bayesian".

[1]: https://bioconductor.org/packages/release/bioc/html/limma.ht...
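
To illustrate the "prior derived from the data" part, here is a toy Python sketch for binary (0/1) ratings; the function name and the method-of-moments fit are my own simplification, and it ignores rater effects entirely:

    import numpy as np

    def eb_shrunken_rates(successes, trials):
        """Fit a Beta(alpha, beta) prior to the observed per-item rates
        by method of moments -- the 'empirical' step -- then return the
        posterior-mean rate for each item."""
        rates = successes / trials
        m, v = rates.mean(), rates.var()
        # Requires v < m * (1 - m); otherwise the moment fit breaks down.
        s = m * (1 - m) / v - 1           # alpha + beta
        alpha, beta = m * s, (1 - m) * s
        return (successes + alpha) / (trials + alpha + beta)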


Do you have a webpage? I just helped my wife (a physician) with the stats for a research presentation that sought to track infection spread in hospitals (location-specific, by room number) via the movement of tagged equipment and staff. They then PCR'd the strains to make sure it was the same one.

The experimental design was good; the stats person they had to help them decipher the results... left much to be desired.

Could you please be so kind as to email me at jpolak{at} the email service of a company where a guy named Kalashnikov worked.


Yup, I agree about throwing away rater information. The actual application at my company that motivated me to research this doesn't have rater information, which is why I didn't think to adjust for it. The movie case was just an example I used to motivate this post, and for that case, yes, I agree rater information would be quite useful.



