I think these scenarios are compatible if we view LLMs as "fragile reasoners": they can occasionally reason, but it is an intermittent state that is easily disturbed. In such a world, we would expect that people who want LLMs to work can, with some effort, make them work, and that people who want or expect LLMs to fail can make them fail easily - or, phrased less adversarially, one can generate examples of either outcome.
>> And yet the practitioners of CoT swear that any and every problem can be solved with LLM by giving it a bit of a CoT help.
For example, see this arXiv paper:
Generalized Planning in PDDL Domains with Pretrained Large Language Models
https://arxiv.org/abs/2305.11014
where the authors conclude:
In this work, we showed that GPT-4 with CoT summarization and automated debugging is a surprisingly strong generalized planner in PDDL domains.
The author of the tweet is an expert on planning, and he's responding to that kind of claim.
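For anyone who hasn't read the paper, here is a rough sketch of what "CoT summarization and automated debugging" amounts to as a pipeline: prompt the model to summarize the domain and propose a strategy, have it emit a planning program, run that program on training problems, and feed any failures back for another attempt. Everything in the sketch (the `query_llm` and `validate` callables, the prompts, the function names) is an illustrative placeholder of mine, not the authors' actual code.

```python
# Minimal sketch of a CoT-summarization + automated-debugging loop for
# generalized planning, in the spirit of the cited paper. All names and
# prompts are hypothetical placeholders.

from typing import Callable, List, Tuple


def synthesize_planner(
    query_llm: Callable[[str], str],          # wraps whatever LLM you use
    domain_pddl: str,                         # PDDL domain description
    train_problems: List[str],                # a few example PDDL problems
    validate: Callable[[str, str], Tuple[bool, str]],  # (plan, problem) -> (ok, error)
    max_debug_rounds: int = 4,
) -> str:
    """Ask the LLM for a domain summary, a strategy, and a planning program,
    then iteratively repair the program using validation feedback."""
    # Step 1: chain-of-thought style summarization of the domain.
    summary = query_llm(
        f"Summarize the objects, predicates and actions in this PDDL domain:\n{domain_pddl}"
    )
    # Step 2: ask for a general strategy before any code is written.
    strategy = query_llm(
        f"Given this summary:\n{summary}\nDescribe, step by step, a strategy "
        "that solves every problem in this domain."
    )
    # Step 3: ask for a program implementing the strategy.
    program = query_llm(
        "Implement the strategy below as a Python function plan(problem_pddl) "
        f"that returns a list of ground action strings.\nStrategy:\n{strategy}"
    )
    # Step 4: automated debugging - run the program on training problems and
    # feed any failure (exception or invalid plan) back to the model.
    for _ in range(max_debug_rounds):
        feedback = []
        for problem in train_problems:
            try:
                namespace: dict = {}
                exec(program, namespace)          # in practice, sandbox this
                plan = namespace["plan"](problem)
                ok, error = validate("\n".join(plan), problem)
                if not ok:
                    feedback.append(f"Invalid plan on problem:\n{problem}\n{error}")
            except Exception as exc:              # generated program crashed
                feedback.append(f"Exception on problem:\n{problem}\n{exc!r}")
        if not feedback:
            return program                        # passes all training problems
        program = query_llm(
            f"The program below failed. Fix it.\nProgram:\n{program}\n"
            f"Feedback:\n{feedback[0]}"
        )
    return program  # best effort once the debugging budget is spent
```

The point of the sketch is just to show how much scaffolding (validation, retries, error feedback) sits around the "LLM reasons about the domain" step in results like this.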