You can sample, too. Assign each user a random number, like 0-1023, and store it in a cookie. If you want a ~1% sample, wrap the analytics call in a JavaScript if() statement so it's executed iff n < 10 (10/1024 ≈ 0.98%).
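A minimal sketch of that cookie-based sampling, assuming a hypothetical cookie name `ab_bucket`, 1024 buckets, and a threshold of 10. The helpers work on a plain cookie string so they can be tested outside a browser; the browser wiring is shown in comments, and `trackPageview()` there is a stand-in for whatever analytics call you actually make.

```typescript
// Hypothetical cookie name and bucket count; adjust to taste.
const COOKIE_NAME = "ab_bucket";
const NUM_BUCKETS = 1024;

// Read an existing bucket from a cookie string such as "a=1; ab_bucket=37".
// Returns null if the cookie is missing or holds an out-of-range value.
function readBucket(cookie: string): number | null {
  for (const part of cookie.split(";")) {
    const [name, value] = part.trim().split("=");
    if (name === COOKIE_NAME && value) {
      const n = Number(value);
      if (Number.isInteger(n) && n >= 0 && n < NUM_BUCKETS) return n;
    }
  }
  return null;
}

// Draw a fresh bucket uniformly from 0..NUM_BUCKETS-1.
function newBucket(): number {
  return Math.floor(Math.random() * NUM_BUCKETS);
}

// Build the string to assign to document.cookie (one year by default).
function bucketCookie(bucket: number, maxAgeSeconds = 60 * 60 * 24 * 365): string {
  return `${COOKIE_NAME}=${bucket}; max-age=${maxAgeSeconds}; path=/`;
}

// A user is in the sample iff bucket < threshold; 10/1024 is about 0.98%.
function inSample(bucket: number, threshold = 10): boolean {
  return bucket < threshold;
}

// Browser wiring (sketch; trackPageview is hypothetical):
//   let b = readBucket(document.cookie);
//   if (b === null) { b = newBucket(); document.cookie = bucketCookie(b); }
//   if (inSample(b)) { trackPageview(); }
```

Because the bucket is persisted, the same user stays in (or out of) the sample across page views, so per-user funnels in the sampled data remain intact.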
You can do this server-side, too. Record all of the data, but skim just a small percentage for a high-level view of what's going on. You can always dig into the full data later for a more in-depth question.
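A server-side version might look like the sketch below. It keeps every event and derives the sample on read. Instead of a random cookie value, it hashes a user ID into a bucket, which is a swapped-in technique (deterministic hash bucketing) rather than anything the comment prescribes; FNV-1a is used only because it is short. The `AnalyticsEvent` shape is hypothetical.

```typescript
// Hypothetical event shape; a real log would carry more fields.
interface AnalyticsEvent {
  userId: string;
  name: string;
}

// 32-bit FNV-1a over UTF-16 code units (not UTF-8 bytes); fine for bucketing,
// not for anything security-sensitive.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Map a user to one of numBuckets buckets; same ID always gives the same bucket.
function userBucket(userId: string, numBuckets = 1024): number {
  return fnv1a32(userId) % numBuckets;
}

// The full event list is kept untouched for in-depth questions later;
// this returns only events from users whose bucket < threshold (~1% by default).
function sampleEvents(
  events: AnalyticsEvent[],
  threshold = 10,
  numBuckets = 1024,
): AnalyticsEvent[] {
  return events.filter((e) => userBucket(e.userId, numBuckets) < threshold);
}
```

Sampling by user rather than by event means a sampled user's whole session is kept, which matters if the high-level view includes funnels or retention.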
I assume the issue is building the backend to store all the data. I don't see why instrumenting a popular site is any different to instrumenting an unpopular site.
That is a vastly different problem, and one that is mostly about distributed systems and algorithms.
Are there any resources on this topic?