> The authors' claims about BDNF are supported by a p-value of p = .046, and having main conclusions hinge on p-values of p > .01 usually means the conclusions are rubbish.
Your other points I agree with, but I actually think the BDNF result has some standing. I'm looking at Fig. 4, and just by an eyeball test there's a clear difference in the distributions of the intra-individual BDNF increase. It's not like there's some minuscule variation in the means that they make significant with really small error bars; the actual effect size appears to be notable. Moreover, there's clearly some effect on the width of the distributions, which could support their ultimate conclusion (e.g., even if the mean effect is the same, it's possible the population-wide ceiling on gains is higher for dance).
Now, with all that said, this could definitely still be a multiple comparisons thing. I'm just a statistician with no background in neuro-stuff, so possibly BDNF is just a bad indicator here. Certainly the behavioral outcomes not showing an interaction difference isn't a great sign, as you point out, but in my personal and unimportant opinion this study would at least be good justification for a follow-up with a better design and a bigger population.
I know nearly nothing about BDNF specifically. Whether it should motivate a follow-up is mostly something only the authors can know, since a p = .046 leaves open the chance that they tested numerous outcome variables and reported only one (e.g., this could very well be 1/10). The fact that the p-value is almost comically close to p = .05 makes me suspect that this happened. Perhaps, if this is in line with other BDNF research, that could motivate some further work.
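To put a rough number on the "1/10" scenario: a minimal sketch, assuming the ten outcome tests are independent and every null hypothesis is true (neither is known for this paper, and the function name is just illustrative). Under those assumptions the chance that at least one test lands below .05 is 1 - 0.95^10, about 40%.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha among n_tests independent tests
    when every null hypothesis is true (an assumption, not a fact about the paper)."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:>2} tests: {familywise_error_rate(n):.3f}")
```

Correlated outcomes would pull this number down somewhat, so treat it as an upper-end illustration rather than an estimate for this study.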
Notably, even if we take this p = .046 as a given and assume there was no p-hacking, then this type of result implies that statistical power is tiny, and a proper "bigger population" study would likely need several hundred people. Even a study with 50% power should have a majority of its significant results land at p < .01.
Agreed, this is definitely an assumption one needs to make. It could easily be that BDNF was one variable among many unreported ones, and that would be consistent with the other outcome variables in the paper, so it seems plausible.
> this type of result implies that statistical power is tiny,
Yes, definitely, BUT the effect in question is an interaction effect, so power is just going to be small from the nature of the design. I was definitely thinking that you'd be looking at a follow-up study of multiple hundreds of participants to confirm something like this. I'm realizing that treating this as a trivial follow-up is the difference between someone who might actually work on real experiments and someone who just works with the numbers.
Just want to re-emphasize, though, that the thing which makes me give this result (some) credence (assuming it's not a desk drawer p-hack) is just the distributions of the observation variable for the two treatment groups. Even if the means of the BDNF increase are equal between the two arms of the trial, and this p-value is a false positive (which, as you say, seems very possible), there are still clearly some other differences between the groups. I strongly suspect a quantile regression on the p50 or p75, rather than an ANOVA on the means, would show a 'more significant' effect; heck, even just a log-linear model or something seems like it would be an improvement, since there's clearly some skew in the 'Dance' population.