If you look at the LeetCode scores, GPT-4 can generally solve most "easy" LeetCode problems but fails on most "medium" and "hard" ones. That seems to match what most people report from using GPT-3/3.5/4 to generate code: it works well for simple cases (ones you could probably find examples of online) but stumbles on the nuances of incrementally harder problems.