
What is the correct way to instrument SCALABLE website analytics if you have a massively-trafficked (almost facebook-size) website?

Are there any resources on this topic?




You can sample, too. Cookie the user with a random number, say 0-1023. If you want a ~1% sample, wrap the analytics call in a JavaScript if() statement so it only executes when n < 10.
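A rough sketch of the client-side version (trackPageview is a stand-in for whatever your real analytics call is):

    // Give each visitor a sticky random bucket 0-1023 via a cookie,
    // then only fire analytics for buckets under the threshold.
    function getSampleBucket() {
      var m = document.cookie.match(/(?:^|;\s*)sample_bucket=(\d+)/);
      if (m) return parseInt(m[1], 10);
      var bucket = Math.floor(Math.random() * 1024);
      document.cookie = 'sample_bucket=' + bucket +
        '; path=/; max-age=' + 60 * 60 * 24 * 365;
      return bucket;
    }

    if (getSampleBucket() < 10) {  // 10/1024 ≈ 1% of visitors
      trackPageview();             // your analytics call here
    }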

You can do this server-side, too: record all of the data, but skim just a small percentage for a high-level view of what's going on. You can always dig into the full data later when a more in-depth question comes up.
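For the server-side skim, one option is to hash a stable user id into the same 0-1023 buckets, so the sample always covers the same users (writeToFullLog and writeToSampleStore are hypothetical sinks here):

    var crypto = require('crypto');

    function bucketOf(userId) {
      // Stable hash -> bucket 0-1023, so the same ~1% of users
      // land in the sample on every request.
      var h = crypto.createHash('md5').update(String(userId)).digest();
      return h.readUInt16BE(0) % 1024;
    }

    function recordEvent(event) {
      writeToFullLog(event);              // keep everything for deep dives
      if (bucketOf(event.userId) < 10) {  // ~1% deterministic sample
        writeToSampleStore(event);        // small, fast store for dashboards
      }
    }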


I assume the issue is building the backend to store all the data. I don't see why instrumenting a popular site is any different to instrumenting an unpopular site.

This is a vastly different problem, one that is mostly about distributed systems and algorithms.

Alex Smola's Hokusai paper is a good place to start for analytics on truly massive traffic: http://www.auai.org/uai2012/papers/231.pdf
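Hokusai builds on Count-Min sketches for its counts; here's a toy version of that primitive just to show the flavor (depth/width picked arbitrarily for illustration):

    // Fixed-memory approximate counter: estimates only ever over-count.
    function CountMinSketch(depth, width) {
      this.depth = depth;  // independent hash rows
      this.width = width;  // counters per row
      this.table = [];
      for (var r = 0; r < depth; r++) {
        this.table.push(new Array(width).fill(0));
      }
    }

    CountMinSketch.prototype.hash = function (item, row) {
      // FNV-style string hash seeded per row; real implementations
      // use a proper pairwise-independent hash family.
      var h = 2166136261 ^ row;
      for (var i = 0; i < item.length; i++) {
        h = Math.imul(h ^ item.charCodeAt(i), 16777619);
      }
      return (h >>> 0) % this.width;
    };

    CountMinSketch.prototype.add = function (item) {
      for (var r = 0; r < this.depth; r++) {
        this.table[r][this.hash(item, r)]++;
      }
    };

    CountMinSketch.prototype.count = function (item) {
      // Take the min across rows; the true count is <= this estimate.
      var est = Infinity;
      for (var r = 0; r < this.depth; r++) {
        est = Math.min(est, this.table[r][this.hash(item, r)]);
      }
      return est;
    };

    // var cms = new CountMinSketch(4, 1 << 16);
    // cms.add('/home'); cms.count('/home');  // approx pageviews for /home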


We built SnowPlow for exactly this:

https://github.com/snowplow/snowplow

Worth checking out if you're too big for e.g. the Google Analytics free tier (10m pageviews + events a month).



