I lack in-depth knowledge here, but correcting p-value thresholds for multiple hypotheses is very basic science. I doubt your criticism is valid, since a basic error like this would never pass peer review.
Correcting p-values in brain imaging research is kinda elaborate, since you are essentially running a separate t-test for each brain voxel. A brain image like the authors' will have about a million voxels (0.7 x 0.7 x 0.7 mm each). Correcting for 1 million independent tests would be overly strict, because neighboring voxels are highly correlated with one another, and real brain effects are unlikely to be confined to such a tiny volume. So researchers usually define a primary voxel-level threshold (here p < .001 uncorrected), and then look for clusters of many contiguous p < .001 voxels. Here, the authors stated that they required clusters of at least 50 contiguous voxels.

That 50-voxel cutoff is just a loose, old-timey heuristic, used without justification or citation, and these heuristics have mostly been phased out over the past decade. The problem is that they don't actually establish that the chosen threshold won't yield tons of false positives. One of the best ways to check is to randomly shuffle (permute) your data and see how large the clusters generated purely by chance actually are.
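To make that concrete, here's a rough sketch of the shuffle-and-count idea for a one-sample design (randomly sign-flipping each subject's map and recording the largest cluster that appears by chance). The array names and parameters are made up for illustration; in a real analysis you'd use established tools like FSL's randomise or AFNI's 3dClustSim rather than hand-rolled code:

```python
# A minimal sketch, assuming `data` is a 4D numpy array of per-subject
# contrast maps (x, y, z, subject) and `mask` is a boolean brain mask.
# All names/parameters here are illustrative, not the authors' pipeline.
import numpy as np
from scipy import ndimage, stats

def max_cluster_size(t_map, mask, p_thresh, df):
    """Size of the largest cluster of voxels passing the primary threshold."""
    t_cut = stats.t.ppf(1 - p_thresh, df)               # one-sided voxel-level threshold
    labels, n = ndimage.label((t_map > t_cut) & mask)   # contiguous suprathreshold patches
    return 0 if n == 0 else np.bincount(labels.ravel())[1:].max()

def cluster_size_threshold(data, mask, n_perm=1000, alpha=0.05, p_thresh=0.001):
    """Cluster-size cutoff derived from a sign-flipping null distribution."""
    n_sub = data.shape[-1]
    rng = np.random.default_rng(0)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=n_sub)      # randomly flip each subject's map
        flipped = data * signs
        with np.errstate(divide="ignore", invalid="ignore"):
            t_map = flipped.mean(-1) / (flipped.std(-1, ddof=1) / np.sqrt(n_sub))
        null_max[i] = max_cluster_size(np.nan_to_num(t_map), mask, p_thresh, df=n_sub - 1)
    # Clusters larger than this occur by chance in fewer than alpha of the permutations.
    return np.quantile(null_max, 1 - alpha)
```

The output is the (1 - alpha) quantile of the max-cluster-size null distribution: only observed clusters bigger than that survive correction.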
> some basic error like this would never pass peer review
It indeed shouldn't pass peer review! Yet, here we are. I think standards have gotten better since the paper's publication (2018), but no doubt there are still many reviewers who don't have a good intuition for what a significant cluster size should be. Off the top of my head, I can't give an exact number for the cluster size needed, but I'd be willing to bet a ton that what the authors used is not enough.