
What is the correct way to instrument SCALABLE website analytics if you have a massively-trafficked (almost facebook-size) website?

Are there any resources on this topic?




You can sample, too. Cookie the user with a random number, say 0-1023. If you want a ~1% sample, wrap the analytics call in a JavaScript if() statement so it only executes when n < 10.
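A rough sketch of the client-side version (trackPageview is a stand-in for whatever your real analytics call is):

    // Give each visitor a sticky random bucket 0-1023 via a cookie,
    // then only fire analytics for buckets under the threshold.
    function getSampleBucket() {
      var m = document.cookie.match(/(?:^|;\s*)sample_bucket=(\d+)/);
      if (m) return parseInt(m[1], 10);
      var bucket = Math.floor(Math.random() * 1024);
      document.cookie = 'sample_bucket=' + bucket +
        '; path=/; max-age=' + 60 * 60 * 24 * 365;
      return bucket;
    }

    if (getSampleBucket() < 10) {  // 10/1024 ≈ 1% of visitors
      trackPageview();             // your analytics call here
    }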

You can do this server-side, too: record all of the data, but skim just a small percentage for a high-level view of what's going on. You can always dig into the full data later when a more in-depth question comes up.
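For the server-side skim, one option is to hash a stable user id into the same 0-1023 buckets, so the sample always covers the same users (writeToFullLog and writeToSampleStore are hypothetical sinks here):

    var crypto = require('crypto');

    function bucketOf(userId) {
      // Stable hash -> bucket 0-1023, so the same ~1% of users
      // land in the sample on every request.
      var h = crypto.createHash('md5').update(String(userId)).digest();
      return h.readUInt16BE(0) % 1024;
    }

    function recordEvent(event) {
      writeToFullLog(event);              // keep everything for deep dives
      if (bucketOf(event.userId) < 10) {  // ~1% deterministic sample
        writeToSampleStore(event);        // small, fast store for dashboards
      }
    }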


I assume the issue is building the backend to store all the data. I don't see why instrumenting a popular site is any different to instrumenting an unpopular site.

This is a vastly different problem, one that is mostly about distributed systems and algorithms.

Alex Smola's Hokusai paper is a good place to start for analytics on truly massive traffic: http://www.auai.org/uai2012/papers/231.pdf
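Hokusai builds on Count-Min sketches for its counts; here's a toy version of that primitive just to show the flavor (depth/width picked arbitrarily for illustration):

    // Fixed-memory approximate counter: estimates only ever over-count.
    function CountMinSketch(depth, width) {
      this.depth = depth;  // independent hash rows
      this.width = width;  // counters per row
      this.table = [];
      for (var r = 0; r < depth; r++) {
        this.table.push(new Array(width).fill(0));
      }
    }

    CountMinSketch.prototype.hash = function (item, row) {
      // FNV-style string hash seeded per row; real implementations
      // use a proper pairwise-independent hash family.
      var h = 2166136261 ^ row;
      for (var i = 0; i < item.length; i++) {
        h = Math.imul(h ^ item.charCodeAt(i), 16777619);
      }
      return (h >>> 0) % this.width;
    };

    CountMinSketch.prototype.add = function (item) {
      for (var r = 0; r < this.depth; r++) {
        this.table[r][this.hash(item, r)]++;
      }
    };

    CountMinSketch.prototype.count = function (item) {
      // Take the min across rows; the true count is <= this estimate.
      var est = Infinity;
      for (var r = 0; r < this.depth; r++) {
        est = Math.min(est, this.table[r][this.hash(item, r)]);
      }
      return est;
    };

    // var cms = new CountMinSketch(4, 1 << 16);
    // cms.add('/home'); cms.count('/home');  // approx pageviews for /home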


We built SnowPlow for exactly this:

https://github.com/snowplow/snowplow

Worth checking out if you're too big for e.g. the Google Analytics free tier (10m pageviews + events a month).



