Not specifically related to archive.is, but news sites have a tightrope to walk.
They need to both allow the full content of their articles to be accessed by crawlers so they can show up in search results, but they also want to restrict access via paywalls. They use 2 main methods to achieve this: javascript DOM manipulation and IP address rate limiting.
Conceivably one could build a system which directly accesses a given document one time from a unique IP address and then cache the HTML version of the page for further serving.
They need to both allow the full content of their articles to be accessed by crawlers so they can show up in search results, but they also want to restrict access via paywalls. They use 2 main methods to achieve this: javascript DOM manipulation and IP address rate limiting.
Conceivably one could build a system which directly accesses a given document one time from a unique IP address and then cache the HTML version of the page for further serving.