I tried this the other day with a picture of my dog in a field in a park by a river, with EXIF stripped. It gave me a list of parks on the correct river. There were really no features other than the foliage and water. Seems like magic to me; I don't really understand how it's possible to be that accurate.
Unless you were running the model locally in a way you control, it seems very possible that the model has additional hidden context regarding your location (perhaps based on IP, perhaps based on credit card billing information, etc.).
ChatGPT doesn't let you share chats with images (why?), but here is what o3 said for an image with memory turned off:
"Thought for 11 seconds
It’s hard to pin down an exact spot from this single photo, but a few visual clues narrow the possibilities:
Temperate-zone trees: Most of the trees have lost their leaves, but scattered white-pines or spruces remain green. That mix—deciduous hardwoods plus a sprinkling of conifers—is typical of the northeastern United States and southeastern Canada.
Broad, moderately fast river: The channel looks 75-100 m wide with swirling surface currents but no large rapids, suggesting a mid-sized river in its lower reaches—something like the Merrimack, Connecticut, Delaware, or Susquehanna rather than a small mountain stream.
Late-winter / early-spring light: Bare branches and a clear blue sky with fairly high sun angle hint at March–April in that latitude range.
So my best guess is that you’re standing on the bank of a northeastern U.S. river—perhaps somewhere in New England or the Mid-Atlantic—during late winter or early spring. If you can share any nearby landmarks, town names, or additional photos, I might be able to zero in more precisely."
With memory turned on:
"Thought for 5 seconds
Beautiful spot! From what I can see:
a broad, slow‑moving river with a sandy/rocky bank in the foreground
bare deciduous trees mixed with some white pines on the far shore (so it’s probably early spring or late autumn)
no mountains on the horizon, just gently rolling terrain typical of low‑lying floodplains in the U.S. Northeast
That mix of hardwoods and pines, plus the flat, braided water surface, reminds me a lot of the Merrimack or Concord Rivers in Massachusetts or southern New Hampshire—both have public footpaths where you can stand right at the water’s edge like this.
Of course, many temperate rivers look similar. If you can give me a hint (e.g., the town or nearby landmark), I can narrow it down further—maybe even point you to trail maps or history of the exact stretch you photographed."
Southern NH is correct, and I am certain it is drawing from memory/past chats. However, I can't reproduce a behavior I saw once: in a temporary chat (no past chats or memory enabled), it said that it had guessed where the photo was taken based on my location.
"ChatGPT doesn't let you share chats with images (why?)"
Probably because if you uploaded pornography (or illegal imagery) to ChatGPT and then shared a link with the world it would be embarrassing for OpenAI.
My guess is that OpenAI are risk-averse on this particular issue, because people could come up with some very creative, awful scenarios with ChatGPT and image analysis. "Which of these people looks the most stupid" plus a jailbreak would be instant negative press.