
I don't know about archive.is, but afaik 12ft.io identifies itself as Google's crawler to bypass paywalls.


12ft.io also doesn't work or is disabled for many sites that archive.is still works on


Maybe because the creator of 12ft.io isn't anonymous


Wouldn't sites be able to see that requests from 12ft.io aren't coming from Google's IPs?


Yes.

Google recommends using reverse DNS to verify whether a visitor claiming to be Googlebot is legitimate or not: https://developers.google.com/search/docs/crawling-indexing/...

You can also verify IP ownership using WHOIS, or by examining BGP routing tables to see which ASN is announcing the IP range. Google also publishes their IP address ranges here: https://www.gstatic.com/ipranges/goog.json
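
For the published-ranges route, here is a rough Python sketch of the check. It assumes the JSON has a top-level "prefixes" list whose entries carry an "ipv4Prefix" or "ipv6Prefix" CIDR string, which is how the file looked when I last checked; the function names are my own. Note that this list covers Google-owned ranges in general, so a match suggests a Google address rather than Googlebot specifically.

```python
import ipaddress
import json
from urllib.request import urlopen

GOOG_RANGES_URL = "https://www.gstatic.com/ipranges/goog.json"


def load_networks(data):
    """Turn the parsed JSON into a list of ip_network objects.

    Assumes entries look like {"ipv4Prefix": "..."} or {"ipv6Prefix": "..."}.
    """
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_networks(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the IP versions differ.
    return any(addr in net for net in networks)


def fetch_google_networks(url=GOOG_RANGES_URL):
    """Download the published ranges (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return load_networks(json.load(resp))
```

In practice you would call fetch_google_networks() once, cache the result, and run ip_in_networks() on each request's source address.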


https://search.google.com/test/rich-results?url= operates from legit Googlebot IPs, so it lets anyone get paywalled content that even archive.is fails to fetch (from theinformation.com, for example)


"Google recommends using reverse DNS to verify..."

This is almost right. They recommend two steps:

1. Use reverse DNS to find the hostname the IP address claims to have. (The IP address block owner can put any hostname in here, even if they don't own/control the domain.)

2. Assuming the claimed hostname is on one of Google's domains, do a forward DNS lookup to verify that the original IP address is returned.

The second step is the important one.
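
The two steps can be sketched in Python roughly like this. The allowed suffixes are my assumption about which domains Google's crawlers resolve to, so check them against Google's documentation; the function name and the injectable lookup parameters are made up for illustration (they also make the logic testable without touching DNS).

```python
import ipaddress
import socket

# Assumed list of Google crawler domains; verify against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse(ip):
    # PTR lookup: the hostname the IP claims to have.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # A/AAAA lookup: every address the hostname actually resolves to.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_googlebot(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        # Step 1: reverse DNS. The IP block owner controls this value.
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Step 2: forward DNS must lead back to the original IP.
        resolved = {ipaddress.ip_address(a) for a in forward_lookup(hostname)}
    except (OSError, ValueError):
        return False
    return ipaddress.ip_address(ip) in resolved
```

A spoofed PTR record passes the suffix check but fails step 2, because only whoever controls the Google domain can make that hostname resolve back to the IP.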



