But showing that it fails to reason about _any number_ of problems doesn't prove that it can't reason. It doesn't matter how many negative cases there are if there's a _single_ positive case.
You can observe any number of white swans and that will never be proof that black swans do not exist, but a single observation of a black swan does prove that they exist.
Showing that it repeatedly fails at multiple classes of reasoning problems is much stronger evidence than positive examples that merely seem right.