
It looks like they try to compensate for that, per their FAQ page:

> For the surveys, we count the top 10 million websites according to Alexa and Tranco, see our technology overview for more explanations. We do crawl more sites, but we use the top 10 million to select a representative sample of established sites. We found that including more sites in the sample (e.g. all the sites we know) may easily lead to a bias towards technologies typically used for "throw-away" sites or parked sites or other types of spam domains.
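For context, Tranco publishes its ranking as a downloadable CSV with one "rank,domain" pair per line, so the sampling step their FAQ describes is easy to approximate. Here is a minimal Python sketch of that idea, assuming a local copy of the list saved as tranco.csv; the file name, sample size, and helper function are illustrative, not W3Techs' actual pipeline:

    # Restrict the crawl sample to the top N ranked sites, as the FAQ
    # describes. Assumes a Tranco-style CSV ("rank,domain" per line),
    # downloadable from https://tranco-list.eu.
    import csv
    import random

    TOP_N = 10_000_000    # survey counts only the top 10M sites
    SAMPLE_SIZE = 50_000  # hypothetical sample size for the crawl

    def load_top_domains(path: str, top_n: int) -> list[str]:
        """Read the ranked list and keep only the top_n domains."""
        domains = []
        with open(path, newline="") as f:
            for rank, domain in csv.reader(f):
                if int(rank) > top_n:
                    break  # list is sorted by rank, so stop early
                domains.append(domain)
        return domains

    top_sites = load_top_domains("tranco.csv", TOP_N)
    # Sample from established sites only, rather than from every known
    # domain, which would skew toward throw-away or parked domains.
    sample = random.sample(top_sites, min(SAMPLE_SIZE, len(top_sites)))

Cutting the list off at a rank threshold before sampling is what keeps throw-away and parked domains out, since those rarely attract enough traffic to rank highly.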



There are content-writing AIs whose output you can barely tell apart from human writing; I would be surprised if an automated crawler could tell the difference when humans barely can.

I've even seen agencies use these tools regularly, churning out several sites per day.


And how does that get them into the top sites per Alexa and Tranco, per the part I quoted?


Agencies wouldn't use these tools if the resulting sites didn't rank on search engines.

If the survey were limited to specific categories of sites where user interaction is required (login required), it would paint a very different picture.




