
Yup, the models are smart, but they are trained to follow standard human patterns for this type of question. Even on Hacker News, the vast majority of readers would not think to correct for buoyancy when actually attempting the experiment under standard conditions.
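
For concreteness, here is a minimal sketch of that correction in Python, assuming a true mass of 1 kg for each object and illustrative densities (loose feathers near 25 kg/m^3, steel near 7850 kg/m^3, air at 1.2 kg/m^3; these values are assumptions for illustration, not measurements):

    AIR_DENSITY = 1.2  # kg/m^3, assumed sea-level value

    def apparent_mass(true_mass_kg, density_kg_m3):
        # A scale in air reads the true mass minus the mass of displaced air.
        volume = true_mass_kg / density_kg_m3
        return true_mass_kg - AIR_DENSITY * volume

    print(apparent_mass(1.0, 25.0))    # feathers: ~0.952 kg
    print(apparent_mass(1.0, 7850.0))  # steel:    ~0.9998 kg

Under these assumptions the scale reads roughly 48 g less for the feathers, so "which weighs more on a scale" has a different answer than "which has more mass."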

They very often get popular "trick" questions wrong because they have seen them so many times that they switch from internal reasoning to memorization/retrieval.


