
I was trawling studies for some issues of my own and sort of independently discovered this many years ago. It's very easy for an intervention to be life-saving for 5%, pretty good for 10%, neutral for 84%, and to have some horrible effect for 1%, and that tends to average out to some combination of "not much effect", "not statistically significant", and, depending on that 1%, possibly "dangerous to everyone". (Although with the way studies are run, there's a certain baseline of "it's super dangerous" you should expect, because studies tend to run on the assumption that everything bad that happened during them was the study's fault, even though that's obviously not true. With small sample sizes this cannot be effectively "controlled away".) We need some measure that can capture this outcome and not just neuter it away, because I also found there were multiple interventions that would have this pattern of outcome. Yet they would all be individually averaged away, and the "official science consensus" was basically "yup, none of these treatments 'work'", resulting in what could be a quite effective treatment plan for some percentage of the population being essentially defeated in detail [1].
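To make the averaging problem concrete, here's a minimal simulation sketch. The effect sizes, noise level, and sample size are all hypothetical illustrations, not taken from any real study, but they show how a strong benefit for a small subgroup and a rare harm can cancel into a near-zero, "not significant" average:

    import random
    import statistics

    random.seed(0)

    def simulate_trial(n=200):
        """Simulate one treatment arm with heterogeneous responses.

        Hypothetical effect sizes on some outcome scale:
        5% of patients improve a lot, 10% improve somewhat,
        84% see no effect, 1% are harmed badly.
        """
        effects = []
        for _ in range(n):
            r = random.random()
            if r < 0.05:
                effect = 10.0    # life-saving for a few
            elif r < 0.15:
                effect = 3.0     # pretty good for some
            elif r < 0.99:
                effect = 0.0     # neutral for most
            else:
                effect = -40.0   # horrible for 1%
            # per-patient noise so the signal isn't artificially clean
            effects.append(effect + random.gauss(0, 5))
        return effects

    effects = simulate_trial()
    print(f"mean effect: {statistics.mean(effects):.2f}, "
          f"sd: {statistics.stdev(effects):.2f}")
    # The subgroup benefits and the rare harm largely cancel:
    # 0.05*10 + 0.10*3 + 0.84*0 + 0.01*(-40) = 0.4 on average,
    # which is tiny relative to the noise, so the trial reads as
    # "no effect" even though 15% of patients genuinely benefited.

The point isn't these particular numbers; it's that any analysis reporting only the population mean will look the same whether nobody responds or a minority responds dramatically.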

What do you mean? They all "work". None of them works for everyone, but that doesn't mean they don't work at all. As the case I was looking at revolved around nutritional deficiencies (brought on by celiac in my case) and their effects on the heart, it is also the case that the downside of each of the 4 separate interventions, if it turned out to be wrong, was basically nil, as were the costs. What about trying a simple nutritional supplement before we slam someone on beta blockers or some other heavy-duty pharmaceutical? I'm not against the latter on principle or anything, but if there's something simpler that has effectively no downsides (or very, very well-known ones, in the cases of things like vitamin K or iron), let's try those first.

I think we've lost a great deal more to this weakness in the "official" scientific study methodology than anyone realizes. On the one hand, p-hacking allows us to "see" effects where none exist, and on the other, this massive, massive overuse of "averaging" allows us to blur away real, useful effects that are massively helpful for some people but not everybody.

[1]: https://en.wikipedia.org/wiki/Defeat_in_detail


