Hacker News

> It is very hard to validate a given choice of a prior in many applications. E.g., if I claim one prior, and another investigator claims a sharper one, it can be very difficult to decide who is right.

Both the prior and the likelihood are our model's assumptions, so the prior validation problem is similar to the likelihood validation problem. To check a Bayesian model, or any model, we need to bring it out of the formal world and into the real world for validation.

Prior predictive simulation, which generates random data points from the prior, is a good heuristic for checking whether a prior is implausible.
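A minimal sketch of such a check, assuming a made-up Normal(8, 2) prior over the mean wind speed in m/s and a known measurement noise of 1 m/s (both numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior over the mean wind speed (m/s): Normal(8, 2).
prior_means = rng.normal(loc=8.0, scale=2.0, size=10_000)

# Prior predictive: for each draw of the mean, simulate one observation
# with assumed measurement noise sd = 1 m/s.
simulated_obs = rng.normal(loc=prior_means, scale=1.0)

# Sanity checks: does the prior generate implausible data?
frac_negative = np.mean(simulated_obs < 0)    # wind speed can't be negative
frac_hurricane = np.mean(simulated_obs > 33)  # > 33 m/s is hurricane force

print(f"P(simulated speed < 0)  = {frac_negative:.4f}")
print(f"P(simulated speed > 33) = {frac_hurricane:.4f}")
```

If a non-trivial fraction of the simulated observations came out negative or at hurricane speeds, that would be a signal to rethink the prior before seeing any real data.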




> ...the prior validation problem is similar to the likelihood validation problem...

But priors can be much harder.

Say I’m trying to estimate a wind speed from the blade velocity of a windmill. I can bring a more accurate wind speed sensor to calibrate the windmill against the wind speed, perhaps aided by basic physics. This is the likelihood portion.

But what should the prior be? The typical speed at that time of day? The speed in January? The speed on cloudy days? I have to have a crisp number — a full distribution actually, accurate out to the tails. I really have very little grounding for choosing that distribution.

I started out just wanting to relate the wind speed to some data in a rather concrete way, and now I’ve been roped into choosing a crisp distribution for a rather amorphous state of nature.

This is a deep problem.

We can sharpen the problem. Say my number and yours are different. How do we tell who is right?

One can try a different tack and say I’m being stubborn: the prior will mostly wash out in any well-posed problem, or else why try to solve it? But now we’re back to frequentism, just looking at the likelihood.
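For what it's worth, the "washing out" claim can be checked numerically in the conjugate normal case. A sketch, assuming a known observation variance and two deliberately different priors (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated wind-speed data (m/s): true mean 10, noise sd 1 (assumed known).
data = rng.normal(10.0, 1.0, size=200)
sigma2 = 1.0  # known observation variance

def posterior_mean(prior_mean, prior_var, x):
    # Conjugate normal-normal update for an unknown mean.
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    return post_var * (prior_mean / prior_var + x.sum() / sigma2)

# Two very different priors...
m_sharp = posterior_mean(prior_mean=5.0,  prior_var=0.5,   x=data)
m_vague = posterior_mean(prior_mean=20.0, prior_var=100.0, x=data)

# ...give nearly the same posterior mean once n = 200.
print(m_sharp, m_vague)
```

With 200 observations the two posterior means land within a few hundredths of each other, but nothing in the math guarantees this when the data are sparse relative to the prior's precision.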

HN tends to invoke the Bayesian framework as a complete solution to inference — I’m just trying to demonstrate that there are problems with that approach.


> I can bring a more accurate wind speed sensor to calibrate the windmill against the wind [...] But what should the prior be? [...] I have to have a crisp number — a full distribution actually, accurate out to the tails.

What would you do when the sensor returns negative wind speeds due to noise or errors?

The wind speed cannot be negative, or greater than the speed of light. An expert in windmills can narrow down the prior distribution much more.
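One way to encode that hard constraint, sketched with a made-up Normal(3, 5) prior truncated by rejection to an assumed plausible range of [0, 40] m/s:

```python
import numpy as np

rng = np.random.default_rng(3)

# Naive prior: Normal(3, 5) puts real mass on impossible negative speeds.
naive_draws = rng.normal(3.0, 5.0, size=100_000)
frac_neg = np.mean(naive_draws < 0)  # roughly a quarter of the prior mass

# Expert prior: same shape, restricted to [0, 40] m/s by rejection sampling.
expert_draws = naive_draws[(naive_draws >= 0) & (naive_draws <= 40)]
frac_neg_expert = np.mean(expert_draws < 0)  # no mass on negative speeds

print(frac_neg, frac_neg_expert)
```

The truncation is crude, but it shows how even weak domain knowledge (speeds are non-negative and bounded) removes a large chunk of obviously wrong prior mass.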

> But priors can be much harder.

Choosing a prior is hard because it requires thinking explicitly about the problem and its assumptions. It merely exposes our lack of expertise on the problem.

When you're lazy, you can pick a uniform prior Uniform(0, c) and call it a day.

> We can sharpen the problem. Say my number and yours are different. How do we tell who is right?

Forget about the prior; say we have two sensors that output slightly different wind speeds. Which wind speed is right? The lower one, or the average?

This is a deep philosophical problem. However, it's a problem for any model.

> The prior will mostly wash out in any well-posed problem.

I don't think so. Any well-posed problem should include the prior; otherwise, how can we tell that two data points are not enough?
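With only two observations the prior does not wash out. In the conjugate normal model the gap between posteriors from two sharp priors is easy to exhibit (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Only two observations (m/s); noise sd = 1, assumed known.
data = rng.normal(10.0, 1.0, size=2)
sigma2 = 1.0

def posterior_mean(prior_mean, prior_var, x):
    # Conjugate normal-normal update for an unknown mean.
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    return post_var * (prior_mean / prior_var + x.sum() / sigma2)

m_a = posterior_mean(5.0, 0.5, data)   # sharp prior centered at 5
m_b = posterior_mean(15.0, 0.5, data)  # sharp prior centered at 15

# With n = 2 and these priors, the gap between the posterior means is
# exactly post_var * (15 - 5) / 0.5 = 5 m/s, regardless of the data.
print(m_a, m_b)
```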

> HN tends to invoke the Bayesian framework as a complete solution to inference [...]

The Bayesian framework is indeed a complete solution to inference in a formal/logical sense. However, I agree that there are many problems in applying the Bayesian framework to real-world problems, and they require serious thinking about our assumptions.


"Bayesian framework is indeed a complete solution to inference in a formal/logical sense."

Bradley Efron, in TFA, begs to differ:

"I wish I could report that this resolves the 250-year controversy and that it is now safe to always employ Bayes’ theorem. Sorry. My own practice is to use Bayesian analysis in the presence of genuine prior information; to use empirical Bayes methods in the parallel cases situation; and otherwise to be cautious when invoking uninformative priors. In the last case, Bayesian calculations cannot be uncritically accepted and should be checked by other methods, which usually means frequentistically."


xcodevn said “in a formal/logical sense”, not “in a practical sense”.


My perspective is that the problem of deciding the "correct" prior is a human problem because the human brain is a messy machine. An artificial intelligence which has full access to its own code and its memory in perfect detail will know precisely what it knows about a certain situation, and therefore can estimate a prior that accurately reflects this knowledge.

In the windmill example, the AI can quickly collect everything it has in its memory about blade speeds, and maybe spend a self-imposed X minutes of computation time to make a best guess at the prior speed distribution.

Humans can't do this, so we have gone down a philosophical rabbit hole of figuring out this "prior problem", when the real problem is that we are just messy informal thinkers.

> How do we tell who is right?

You are fundamentally conceptually mistaken here. There is nothing right or wrong about two agents disagreeing on the prior. The different priors reflect the two agents' knowledge before the experiment. I am a windmill engineer, so my priors will be much narrower than those of someone who has never seen a windmill outside of a Hollywood movie.
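This view can be illustrated numerically: two agents with very different priors disagree on a small sample, and the disagreement shrinks as data accumulates. A sketch in the conjugate normal model, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 1.0  # known observation variance

def post_mean(prior_mean, prior_var, x):
    # Conjugate normal-normal update for an unknown mean.
    n = len(x)
    v = 1.0 / (1.0 / prior_var + n / sigma2)
    return v * (prior_mean / prior_var + x.sum() / sigma2)

data = rng.normal(12.0, 1.0, size=500)  # true mean 12 m/s

# Engineer: narrow prior near the truth; layperson: wide, vague prior.
diffs = {}
for n in (5, 500):
    eng = post_mean(11.0, 1.0,  data[:n])
    lay = post_mean(10.0, 50.0, data[:n])
    diffs[n] = abs(eng - lay)

print(diffs)  # disagreement shrinks as n grows
```

Neither agent is "wrong" at n = 5; they simply started from different states of knowledge, and the data reconciles them.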


I'm not quite the expert that perhaps you are, but to me it seems like Bayesian inference is still in a better spot here, because the priors are part of an explicit quantification of bias and assumption in a model.

Much havoc has befallen the scientific world because of the hidden assumptions of frequentist techniques with poorly understood preconditions, even for rather basic models. And there isn't much anyone can do about that, save moving to ever more complicated models.



