
They are probably just checking headers such as the user agent and cookies. I would copy whatever your normal browser sends and pass the same headers to urllib.request. If that doesn't work, the check is likely more sophisticated.
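
A minimal sketch of that idea, assuming the site only looks at request headers. The User-Agent is the Firefox string from the curl test in this thread; the Accept and Accept-Language values are placeholders, and `build_request` / `fetch` are made-up helper names. Copy the real values (and any cookies) from your own browser's network inspector.

```python
import urllib.request

# Assumed example values, not taken from a real browser session.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) "
        "Gecko/20100101 Firefox/113.0"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request that carries browser-like headers instead of urllib's default."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch(url):
    """Fetch the page; urlopen raises urllib.error.HTTPError on a 403."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.status, resp.read()
```

Calling `fetch('https://www.sfgate.com/')` sends the request with those headers. If it still fails with a 403, the check is probably looking at more than headers.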


I will try that, but a quick look at the error page makes me think it tries to run a javascript blob.


They're just checking the user agent

    $ curl -s -I 'https://www.sfgate.com/' -H 'User-Agent: curl/7.54.1' | head -1
    HTTP/2 403
    
    $ curl -s -I 'https://www.sfgate.com/' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0' | head -1
    HTTP/2 200

One "trick" is that Firefox (and, I assume, Chrome) lets you copy a request as a curl command. You can then check whether that works in the terminal, and if it does, binary search for the required headers.
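
The header search can be scripted too. Here is a rough sketch, not a definitive implementation: `minimize_headers` and the `works` callback are hypothetical names, and `works` is assumed to send the request with the given headers and return True when the site answers 200. It drops headers in halving chunks rather than doing a strict binary search, because more than one header may turn out to be required.

```python
def minimize_headers(headers, works):
    """Shrink `headers` to a smaller dict for which works(headers) is still True.

    `works` is a caller-supplied predicate (assumed deterministic) that
    takes a dict of headers and reports whether the request succeeded.
    """
    names = list(headers)
    chunk = max(len(names) // 2, 1)
    while names:
        i = 0
        removed_any = False
        while i < len(names):
            # Try the request without the chunk starting at position i.
            trial = names[:i] + names[i + chunk:]
            if works({n: headers[n] for n in trial}):
                names = trial          # chunk was not needed; drop it
                removed_any = True
            else:
                i += chunk             # chunk holds something required; keep it
        if chunk == 1 and not removed_any:
            break                      # every remaining header is required
        chunk = max(chunk // 2, 1)
    return {n: headers[n] for n in names}
```

Start from the full header set copied out of the browser's "copy as curl" output. What comes back is a set where removing any single header makes the request fail.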


It probably does. In that case there are better tools for the job, such as headless Chrome driven by Puppeteer, which can fully render a page, scripts included.



