Even ChatGPT 3.5 answers correctly if you ask just "She died at 11 PM. Was she alive at noon?". My theory is that this is an adversarial example: the added irrelevant information (bpm, blood pressure, heart rate) may have drawn more of the model's attention than the relevant part of the question.