
> When we randomise people, these influences will still be operating on the outcome, which will vary across the people randomised to our conditions. Does randomisation mean that all these different effects are balanced somehow? No – not least because confounders do not exist in experimental studies! This is for the simple reason that a confounder is something that affects both the exposure and the outcome, and in an experimental (i.e., randomised) study we test for a difference in our outcome between the two randomised groups.

I do not think this sort of word-play is useful. If your random samples are small (even if statistically adequate), the chance of confounding imbalances in one or both groups of an A/B test can be relatively high, even though the selection procedure for treatment is random. So, "No – not least because confounders do not exist in experimental studies!" is misleading if it is read as saying that randomisation of treatment allocation somehow makes confounding impossible.

That confounding imbalance is equally likely in both branches remains true for all sample sizes of an A/B test where #A = #B and allocation is random. So, in my opinion, not a myth.



The author is using the technical definitions of confounders and covariates without sufficient explanation, and the technical definitions do not match the normal English definitions.

In English, a confounder is any factor that distorts an observation. (My dictionary defines it as throwing into confusion or disarray.)

In causal inference, a confounder is a factor that is correlated with both treatment and outcome. If the treatment is randomly assigned, by construction it is independent of all other factors. Thus, there can be no confounders.

Your example is about observed occurrences of imbalance, but the technical definition is about probabilities. Observed imbalances can still skew inference, but that causes high variance (or low precision). It doesn't cause bias (or affect accuracy).

Adjusting for observed imbalances can reduce variance, but in some circumstances can actually cause bias.
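To make the variance-versus-bias distinction concrete, here is a minimal simulation sketch (plain Python, hypothetical parameters): a strong covariate drives the outcome, allocation is random, and any single small trial can be badly imbalanced, yet the estimator is unbiased across replications.

```python
import random
import statistics

def one_experiment(n=20, effect=1.0, seed=None):
    """Simulate one small A/B test with a strong covariate and random allocation."""
    rng = random.Random(seed)
    # Covariate that strongly drives the outcome (a would-be "confounder").
    covariate = [rng.gauss(0, 1) for _ in range(n)]
    # Random allocation: independent of the covariate by construction.
    treated = [rng.random() < 0.5 for _ in range(n)]
    outcome = [c * 2.0 + (effect if t else 0.0) + rng.gauss(0, 0.5)
               for c, t in zip(covariate, treated)]
    a = [y for y, t in zip(outcome, treated) if t]
    b = [y for y, t in zip(outcome, treated) if not t]
    if not a or not b:  # skip degenerate allocations
        return None
    return statistics.mean(a) - statistics.mean(b)

estimates = [e for e in (one_experiment(seed=s) for s in range(5000)) if e is not None]
# Any single estimate can be far off (high variance from chance imbalance),
# but the average over many replications recovers the true effect (no bias).
print(round(statistics.mean(estimates), 2))
```

The spread of `estimates` is wide at n=20, which is the "low precision" cost of chance imbalance; the mean across replications sits at the true effect, which is the "no bias" claim.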


Picking a specific definition of a word, expounding its consequences, and then referring to the colloquial usage of the word as a “myth” is just word-play as I said.

Many do think of confounders in an experimental context as just those effects which correlate with both outcome and treatment. The non-sequitur — barring a specific definition — is concluding that since nothing can correlate with random allocation, confounders are impossible by construction.

Why impossible? Because we are talking about the probability of allocation, not the actual allocation, and confounding does not refer to the result. We’d instead say there are imbalanced covariates, but that’s ok because randomisation converts “imbalance into error”. Yet, the covariates may be unknown, and without taking measurements prior to the treatment, how are we supposed to know whether the treatment itself or just membership of the treatment group explains the group differences?

Had we not tested the samples prior to treatment, the result would be what many would call “confounded” by the differences in the samples prior to treatment.

From https://en.wikipedia.org/wiki/Confounding#Decreasing_the_pot..., please note the use of the word:

The best available defense against the possibility of spurious results due to confounding is often to dispense with efforts at stratification and instead conduct a randomized study of a sufficiently large sample taken as a whole, such that all potential confounding variables (known and unknown) will be distributed by chance across all study groups and hence will be uncorrelated with the binary variable for inclusion/exclusion in any group.
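The "sufficiently large sample" part of that quote can be illustrated with a small sketch (plain Python, made-up parameters): the average absolute covariate gap between two randomly allocated arms shrinks roughly as 1/sqrt(n).

```python
import random
import statistics

def covariate_gap(n, seed):
    """Randomly allocate n units to two arms; return the absolute gap in mean covariate."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    t = [rng.random() < 0.5 for _ in range(n)]
    a = [v for v, flag in zip(x, t) if flag]
    b = [v for v, flag in zip(x, t) if not flag]
    if not a or not b:  # skip degenerate allocations
        return None
    return abs(statistics.mean(a) - statistics.mean(b))

avg_gap = {}
for n in (20, 200, 2000, 20000):
    gaps = [g for g in (covariate_gap(n, s) for s in range(200)) if g is not None]
    avg_gap[n] = statistics.mean(gaps)
    print(n, round(avg_gap[n], 3))
```

Each tenfold increase in sample size cuts the typical chance imbalance by about a factor of three, which is the sense in which a large enough randomised sample distributes unknown covariates "by chance" across the groups.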


Sure, the chance of (unbalanced) “confounders” can be high in a small sample. But the statistical machinery you’re using is designed to handle that. If you try to avoid it then you’re violating the underlying assumptions of that statistical machinery, no?


It doesn't work that way, and it is common to test for known imbalances, or to implement stratified sampling. As another poster said, confounders do not have to be conveniently distributed, even though that is typically assumed. It could be that a confounder carries a big effect but appears only once every N samples because it is sparse. In that case you could have large, randomly allocated, but still confounded samples.

All these things can and do happen in randomised experiments, but it is still orders of magnitude more interpretable than what can happen in observational studies.
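A sketch of the sparse-confounder scenario (plain Python, hypothetical parameters): a trait carried by roughly 1 in 200 people has a huge outcome effect, the true treatment effect is zero, and single large randomised trials can still show sizeable spurious differences whenever the few carriers land unevenly across the arms, even though the estimator stays unbiased on average.

```python
import random
import statistics

def sparse_confounder_trial(n=2000, rarity=0.005, big=50.0, seed=None):
    """One randomised trial with a rare trait that has a huge outcome effect."""
    rng = random.Random(seed)
    rare = [rng.random() < rarity for _ in range(n)]   # roughly 10 carriers in 2000
    treated = [rng.random() < 0.5 for _ in range(n)]   # random allocation
    outcome = [(big if r else 0.0) + rng.gauss(0, 1) for r in rare]
    a = [y for y, t in zip(outcome, treated) if t]
    b = [y for y, t in zip(outcome, treated) if not t]
    return statistics.mean(a) - statistics.mean(b)

diffs = [sparse_confounder_trial(seed=s) for s in range(2000)]
# True treatment effect is zero, yet single trials can show large spurious
# differences relative to the background noise.
print(round(max(abs(d) for d in diffs), 1))
```

The worst single-trial difference dwarfs the noise level, yet the mean across trials is near zero: sparse confounders inflate variance badly at realistic sample sizes without introducing bias.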


Sure, if you correctly implement stratified sampling that’s fine. But then you’ve replaced the statistical machinery with one that takes this balancing into account.

If on the other hand you just redo the sampling until it seems balanced then you’ve violated the assumptions behind the standard statistical tooling.


If the confounders are fat-tail distributed etc. then arbitrarily large samples can still be inadequate.

The idea that even thousands of data points in subgroups are going to be 'well mixed' relies on extremely strong assumptions about the distribution of those traits.
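A sketch of the fat-tail point (plain Python, illustrative distributions): with a thin-tailed covariate, two randomly allocated arms of 5,000 each balance tightly; with a heavy-tailed covariate such as a Pareto with infinite variance, a single extreme draw can leave the arms badly imbalanced even at that size.

```python
import random
import statistics

def group_imbalance(draw, n=10000, seed=0):
    """Randomly split n covariate draws into two arms; return |mean A - mean B|."""
    rng = random.Random(seed)
    values = [draw(rng) for _ in range(n)]
    treated = [rng.random() < 0.5 for _ in range(n)]
    a = [v for v, t in zip(values, treated) if t]
    b = [v for v, t in zip(values, treated) if not t]
    return abs(statistics.mean(a) - statistics.mean(b))

thin = [group_imbalance(lambda r: r.gauss(0, 1), seed=s) for s in range(300)]
# Pareto with alpha = 1.1: the mean exists but the variance is infinite.
fat = [group_imbalance(lambda r: r.paretovariate(1.1), seed=s) for s in range(300)]

# Thin-tailed covariates mix well at n=10000; fat-tailed ones can still be far apart.
print(round(max(thin), 3), round(max(fat), 3))
```

The thin-tailed gaps stay small across all replications, while the heavy-tailed ones occasionally blow up: "arbitrarily large samples can still be inadequate" when a trait's distribution has no finite variance.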


Just assumptions, not extremely strong assumptions. Ordinary least squares performs well in a large variety of cases.



