Devs making baby’s first mobile app add “request location information” permissions, the devices start giving them the phone’s GPS information in the form of lat/lon pairs, and those devs naturally look for a service to make that data useful. What they want is “reverse geocoding”, i.e. take a lat/lon pair and return information that makes sense to a human (country, state, nearby street address, etc.).
This is a service that OpenCage provides, and for whatever reason OpenCage happens to be one of the popular services for this use case. (Maybe it’s because you get the text description of location back right away without having to do a round trip through a heavyweight on-screen map, maybe their free tier allows more requests than most, maybe their api is easier to use, maybe they are lucky or skilled with SEO and their tutorial happens to be the first result for some common phrases, who knows.)
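To make that concrete, here’s a minimal sketch of a reverse geocoding call against OpenCage’s documented JSON endpoint (the key is a placeholder and I’m going from memory of the docs, so treat the response-parsing details as approximate):

    import requests

    # Reverse geocoding: lat/lon in, human-readable place description out.
    lat, lon = 51.5074, -0.1278  # example coordinates (central London)
    resp = requests.get(
        "https://api.opencagedata.com/geocode/v1/json",
        params={"q": f"{lat},{lon}", "key": "YOUR_API_KEY"},  # placeholder key
        timeout=10,
    )
    results = resp.json()["results"]
    print(results[0]["formatted"])  # e.g. a street address / locality string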
So there’s this process that starts with a search for “convert phone location to address”, often involves the OpenCage api, and ends with a happy developer getting the information they wanted. Various algorithms pick up on the existence and repeated traversal of this happy path.
In another part of the internet, code tutorial content farms notice a demand for determining an incoming call’s location from the number that’s calling. They search for things like “convert phone number to location” and “convert phone number to address”. Some of these searches end up falling into the nearby well-trodden path of “convert phone location to address”, and the content farmer is presented with the OpenCage api. They mess around with the api for a bit and find they can start from a phone number and get a successful api call that returns a lat/lon pair. A successful api call that returns legitimate-looking lat/lon data is all they need to make a video, so they make one and post it. Higher-quality, more scrupulous code tutorials attempt to answer this same demand but find it’s not possible, so those tutorials don’t get made, leaving the less scrupulous ones that stop with a successful-looking api call to flourish in this space. The tutorial is doing well, so the content farms endlessly recycle it into blogspam.
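For the curious, the pattern those tutorials converge on looks roughly like this (a sketch from memory, not any particular tutorial’s code): a coarse number-prefix-to-region lookup, e.g. via Python’s phonenumbers library, whose output string then gets forward geocoded. The lat/lon that comes back is just the centroid of the region the number prefix was allocated to, not the phone’s location.

    import phonenumbers
    import requests
    from phonenumbers import geocoder

    # Step 1: prefix lookup - returns the region a number range was allocated
    # to (e.g. "California"), which says nothing about where the phone is now.
    num = phonenumbers.parse("+14155550123", None)
    region = geocoder.description_for_number(num, "en")

    # Step 2: forward geocode that region string, yielding a plausible-looking
    # lat/lon pair - the "successful api call" the tutorials stop at.
    resp = requests.get(
        "https://api.opencagedata.com/geocode/v1/json",
        params={"q": region, "key": "YOUR_API_KEY"},  # placeholder key
        timeout=10,
    )
    geometry = resp.json()["results"][0]["geometry"]
    print(region, geometry["lat"], geometry["lng"])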
As a result, OpenCage starts getting weird usage patterns, tracks them down, finds the source is these tutorials, and makes a post about it.
Some time later, ChatGPT is released. People are astounded by its ability to write code and start using it for this purpose. Naturally, some of those people have the same demand as the previous generation of devs who stumbled onto the unscrupulous code tutorials. Because of the blogspam, ChatGPT’s training data includes many variations on the tutorial, and just as naturally it ends up reproducing that tutorial when asked - except ChatGPT’s magic kicks in and, instead of including (what its embeddings see as) some weird unrelated area-code-to-string nonsense from the tutorial, it just bullshits some plausible-sounding data plumbing code. Unfortunately, since the tutorial never actually did what it claimed in the first place, that weird hacky “irrelevant” bit that ChatGPT dropped happened to be the secret sauce that made the whole thing superficially appear to work.
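My guess at the shape of ChatGPT’s version (an assumption on my part, not a quote of actual output): the phone number gets plumbed straight into the geocoding query, with the prefix-to-region step gone, so there’s nothing for the geocoder to meaningfully match.

    import requests

    def locate_phone(number: str, api_key: str):
        # Plausible-looking plumbing, but a raw phone number is not a place
        # name, so this returns nothing useful (or whatever the geocoder
        # happens to loosely match).
        resp = requests.get(
            "https://api.opencagedata.com/geocode/v1/json",
            params={"q": number, "key": api_key},
            timeout=10,
        )
        results = resp.json().get("results", [])
        return results[0]["geometry"] if results else None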
As a result, OpenCage starts getting weird usage patterns, tracks them down, finds the source is ChatGPT, and makes a post about it.
In deference to Hacker News’ policy of keeping comments pleasant, I will elide the analysis of the process that leads to comments accusing OpenCage of nefariously engineering the whole thing for attention.
Thanks for the above. (Nice self-restraint in the last paragraph.) Things almost make sense now. Except one problem ... this implies that there are software developers who think to themselves “given a cell phone number, how can I get the phone’s location?”
And it further implies that these people don't immediately follow that thought with: "That's surely impossible, since it would be a privacy nightmare if literally everyone in the world could track everyone else in the world's every move".
Or perhaps with this alternative thought, which would lead to the same conclusion: "let's not worry about privacy, how would this even work? Does every phone company in the world pro-actively send every customer's location data to OpenCage, just in case someone queries it? Or does OpenCage wait until it gets a query, and only then query the cell phone company 'just-in-time'? Both of these sound like a lot of work for each phone company to support ... what's the incentive?"
Honestly, I'm a bit surprised that the OpenCage blog post is so calm about this, instead of just yelling incoherently "why WHY why would anyone think like this?!?"