I don't know what system and user prompts you are testing with, but as one anecdote, Claude 3 Opus (and only Opus) consistently gives me better coding answers than GPT-4. Maybe it's the type of stuff I am doing or how I phrase things, who knows. I had been using GPT-4 since the day it came out, but I haven't felt like going back so far.