Devs making baby’s first mobile app add “request location information” permissions, the devices start giving them the phone’s GPS information in the form of lat/lon pairs, and those devs naturally look for a service to make that data useful. What they want is “reverse geocoding”, i.e. take a lat/lon pair and return information that makes sense to a human (country, state, nearby street address, etc.).
This is a service that OpenCage provides, and for whatever reason OpenCage happens to be one of the popular services for this use case. (Maybe it’s because you get the text description of location back right away without having to do a round trip through a heavyweight on-screen map, maybe their free tier allows more requests than most, maybe their api is easier to use, maybe they are lucky or skilled with SEO and their tutorial happens to be the first result for some common phrases, who knows.)
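To make that concrete, here’s a minimal sketch of a reverse geocoding call against OpenCage’s documented JSON endpoint (the key is a placeholder and I’m going from memory of the docs, so treat the response-parsing details as approximate):

    import requests

    # Reverse geocoding: lat/lon in, human-readable place description out.
    lat, lon = 51.5074, -0.1278  # example coordinates (central London)
    resp = requests.get(
        "https://api.opencagedata.com/geocode/v1/json",
        params={"q": f"{lat},{lon}", "key": "YOUR_API_KEY"},  # placeholder key
        timeout=10,
    )
    results = resp.json()["results"]
    print(results[0]["formatted"])  # e.g. a street address / locality string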
So there’s this process that starts with a search for “convert phone location to address”, often involves the OpenCage api, and ends with a happy developer getting the information they wanted. Various algorithms pick up on the existence and repeated traversal of this happy path.
In another part of the internet, code tutorial content farms notice a demand for determining an incoming call’s location from the number that’s calling. They search for things like “convert phone number to location” and “convert phone number to address”. Some of these searches end up falling into the nearby well-trodden path of “convert phone location to address”, and the content farmer is presented with the OpenCage api. They mess around with the api for a bit and find they can start from a phone number and get a successful api call that returns a lat/lon pair. A successful api call that returns legitimate-looking lat/lon data is all they need to make a video, so they make one and post it. Higher-quality, more scrupulous code tutorials attempt to answer this same demand but find it’s not possible, so those tutorials don’t get made, leaving the less scrupulous ones that stop with a successful-looking api call to flourish in this space. The tutorial is doing well, so the content farms endlessly recycle it into blogspam.
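For the curious, the pattern those tutorials converge on looks roughly like this (a sketch from memory, not any particular tutorial’s code): a coarse number-prefix-to-region lookup, e.g. via Python’s phonenumbers library, whose output string then gets forward geocoded. The lat/lon that comes back is just the centroid of the region the number prefix was allocated to, not the phone’s location.

    import phonenumbers
    import requests
    from phonenumbers import geocoder

    # Step 1: prefix lookup - returns the region a number range was allocated
    # to (e.g. "California"), which says nothing about where the phone is now.
    num = phonenumbers.parse("+14155550123", None)
    region = geocoder.description_for_number(num, "en")

    # Step 2: forward geocode that region string, yielding a plausible-looking
    # lat/lon pair - the "successful api call" the tutorials stop at.
    resp = requests.get(
        "https://api.opencagedata.com/geocode/v1/json",
        params={"q": region, "key": "YOUR_API_KEY"},  # placeholder key
        timeout=10,
    )
    geometry = resp.json()["results"][0]["geometry"]
    print(region, geometry["lat"], geometry["lng"])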
As a result, OpenCage starts getting weird usage patterns, tracks them down, finds the source is these tutorials, and makes a post about it.
Some time later, ChatGPT is released. People are astounded by its ability to write code and start using it for this purpose. Naturally, some of those people have the same demand as the previous generation of devs who stumbled onto the unscrupulous code tutorials. Because of the blogspam, ChatGPT’s training data includes many variations on the tutorial, and just as naturally it ends up reproducing that tutorial when asked - except ChatGPT’s magic kicks in and, instead of including (what its embeddings see as) some weird unrelated area-code-to-string nonsense from the tutorial, it just bullshits some plausible-sounding data plumbing code. Unfortunately, since the tutorial never actually did what it claimed in the first place, that weird hacky “irrelevant” bit that ChatGPT dropped happened to be the secret sauce that made the whole thing superficially appear to work.
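My guess at the shape of ChatGPT’s version (an assumption on my part, not a quote of actual output): the phone number gets plumbed straight into the geocoding query, with the prefix-to-region step gone, so there’s nothing for the geocoder to meaningfully match.

    import requests

    def locate_phone(number: str, api_key: str):
        # Plausible-looking plumbing, but a raw phone number is not a place
        # name, so this returns nothing useful (or whatever the geocoder
        # happens to loosely match).
        resp = requests.get(
            "https://api.opencagedata.com/geocode/v1/json",
            params={"q": number, "key": api_key},
            timeout=10,
        )
        results = resp.json().get("results", [])
        return results[0]["geometry"] if results else None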
As a result, OpenCage starts getting weird usage patterns, tracks them down, finds the source is ChatGPT, and makes a post about it.
In deference to Hacker News’ policy of keeping comments pleasant, I will elide the analysis of the process that leads to comments accusing OpenCage of nefariously engineering the whole thing for attention.
Thanks for the above. (Nice self-restraint in the last paragraph.) Things almost make sense now. Except one problem ... this implies that there are software developers who think to themselves “given a cell phone number, how can I get the phone’s location?”
And it further implies that these people don't immediately follow that thought with: "That's surely impossible, since it would be a privacy nightmare if literally everyone in the world could track everyone else in the world's every move".
Or perhaps with this alternative thought, which would lead to the same conclusion: "let's not worry about privacy, how would this even work? Does every phone company in the world pro-actively send every customer's location data to OpenCage, just in case someone queries it? Or does OpenCage wait until it gets a query, and only then query the cell phone company 'just-in-time'? Both of these sound like a lot of work for each phone company to support ... what's the incentive?"
Honestly, I'm a bit surprised that the OpenCage blog post is so calm about this, instead of just yelling incoherently "why WHY why would anyone think like this?!?"