I'm wondering if this was all in a single ChatGPT session where GPT-4 happened to answer the first question wrong. That could bias subsequent outputs towards being wrong as well, and might explain the huge difference between OP's results and the commenters', maybe?