That looks like a typical collider bias to me...
There should be no correlation between location and quality... But as you are only looking at restaurants that are still "in business", you are introducing a bias.
If you simplify, a restaurant can have:
- good/bad location
- good/bad food
If your restaurant has a bad location and bad food, it is not going to stay in business very long.
After that you can have any mix of the two, but once you remove the "bad/bad" restaurants a correlation appears, and it is due to the collider bias.
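To make the simplified good/bad picture concrete, here is a tiny sketch. It assumes (my assumption, not data) that all four location/food combinations are equally common before any restaurant closes, and the `pearson` helper is just a hand-rolled correlation for illustration:

```python
from itertools import product
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# 1 = good, 0 = bad. Assumption: the four (location, food) combos are equally likely.
all_restaurants = list(product([0, 1], repeat=2))

# The "bad/bad" restaurants go out of business, so we never observe them.
survivors = [(loc, food) for loc, food in all_restaurants if (loc, food) != (0, 0)]

corr_all = pearson(*zip(*all_restaurants))    # no correlation before selection
corr_survivors = pearson(*zip(*survivors))    # negative correlation after selection

print(f"all restaurants: {corr_all:+.2f}")
print(f"still open:      {corr_survivors:+.2f}")
```

With these assumed equal proportions the correlation goes from 0 to -0.5 just by dropping the bad/bad cell; real proportions would give a different number, but the mechanism is the same.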
> once you remove the "bad/bad" restaurants a correlation appears, and it is due to the collider bias.
This sounds like the correlation appears because of you throwing away some data, but the way I see it, that correlation is real - you're not removing the bad/bad restaurants, the market is.
I've been reading up on collider bias on Wikipedia and pondering the examples[0] - restaurants, dating, celebrities - and the way I see it, the biased statistic is still true for whoever is doing the selecting (a person visiting fast-food restaurants, or looking for a date), and if their selection criteria (taste) generalize, it might also carry over to the general population.
I feel the restaurant example from Wiki, with its associated image below, is worth discussing:
"An illustration of Berkson's Paradox. The top graph represents the actual distribution, in which a positive correlation between quality of burgers and fries is observed. However, an individual who does not eat at any location where both are bad observes only the distribution on the bottom graph, which appears to show a negative correlation."
This feels wrong to me. Why is the regression line nearly horizontal when, eyeballing the graph, a nearly vertical one would fit better and capture an even stronger positive correlation between qualities of hamburgers and fries? In fact, I'm tempted to even throw away the leftmost and rightmost points on the lower panel as outliers.
Anyway, this example assumes the bad/bad restaurants are simply not visited by the subject. However, if we take your scenario where bad/bad restaurants quickly go out of business, then it's the market that creates the correlation between those two hypothetically independent qualities. So as long as we're talking about the real world and not some imaginary spherical restaurants in a frictionless vacuum, it would be fair to say the correlation exists (and that the causal mechanism behind it is market selection).
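Here is a rough simulation of that market-selection scenario, as a sketch under made-up assumptions: location and food scores are drawn independently from a standard normal, and a restaurant survives only if the two scores sum to more than zero. The survival rule, the sample size and the seed are all hypothetical choices of mine, not anything measured.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Assumption: location and food quality are independent at opening time.
location = [rng.gauss(0, 1) for _ in range(n)]
food = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical market rule: a restaurant stays open only if its combined score is positive.
still_open = [(l, f) for l, f in zip(location, food) if l + f > 0]

corr_all = pearson(location, food)
corr_open = pearson(*zip(*still_open))

print(f"every restaurant ever opened: {corr_all:+.3f}")
print(f"restaurants still open:       {corr_open:+.3f}")
```

Under this rule the restaurants that ever opened show roughly zero correlation, while the ones still open show a clearly negative one. Whether you call that "bias" or "a real correlation created by the market" depends on which population you care about: every restaurant that ever opened, or the ones you can actually walk into today.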