There's likely no reason for GPT-4 to have actually gotten worse, other than perception and some bad luck. I think the real problem is the unevenness of its performance. Over the span of several months, I've had ChatGPT (3.5 and now 4) tell me it could help with problem X, then claim it had no knowledge of the topic and couldn't help, then answer like an expert.
It's likely that adequate prompt engineering would help to mitigate this problem.
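For what it's worth, here's a rough sketch (Python, OpenAI SDK v1.x) of the kind of prompt engineering I have in mind: pin the model's role in a system prompt and lower the temperature, so it doesn't get to re-decide each session whether it "knows" problem X. The model name, prompt wording, and question are just placeholders, not a recipe.

    # Sketch: anchor the model's role up front instead of asking cold.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You are an experienced practitioner in problem X. "
        "Answer from that expertise; if a detail is genuinely outside "
        "your knowledge, say which part, rather than declining the topic."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "How would you approach problem X?"},
        ],
        temperature=0.2,  # lower temperature to cut run-to-run variance
    )

    print(response.choices[0].message.content)

No guarantee it fixes the inconsistency entirely, but in my experience a pinned role plus a concrete question gets far fewer "I can't help with that" responses than an open-ended ask.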