If that one is not, are there any scientific studies then?
Strongly typed languages require me to do more work upfront to satisfy their type checker. They must necessarily reject some programs that would work correctly. In the process a lot of mistakes are eliminated, and that gives me more confidence that the result will work. I like that way of working. But does it produce more robust code? Is it more productive? It feels like it, but that doesn't mean it's true.
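To make the "rejects correct programs" point concrete, here is a minimal sketch (assuming Python with a static checker such as mypy; the function and its logic are made up for illustration):

```python
def describe(flag: bool) -> str:
    if flag:
        x = 1
    else:
        x = "one"   # a checker such as mypy rejects this reassignment
    if flag:
        # at run time this branch only executes when x is the int 1
        return str(x + 1)
    # and this line only executes when x is the string "one"
    return x
```

Run it and both `describe(True)` and `describe(False)` return the right string; only the static analysis objects, because it cannot prove that the two `if flag:` branches are taken together.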
"Strongly typed" is not the same thing as "statically typed". Most dynamically typed languages are strongly typed, too. The distinction between static and dynamic type systems comes from whether type errors are caught at compilation time or run time.
Which basically settles the question for me as a programmer, anyway. Eliminating the possibility of a class of run time failures -- how can that not be a good thing?
The question for me is not whether type checkers are useful tools, but at what point they become a hindrance. If I may rephrase your question: the programs rejected by the type checker -- how can they not be bad programs?
TL;DR: It looks like FP reduces bugs, and statically typed FP reduces them a bit more, but there isn't enough data for the more interesting fine-grained conclusions.
Anecdotes are case studies. In the social sciences, for example, they are often the only evidence you have. Do not dismiss this kind of evidence when you have no other options.