It may console you: in some sense, the top 4900 are more valuable than the top 100.
Why? Everybody here knows Paul Graham. I know Krebs and Schneier, and most of you will, too. In a long-tail distribution like this, the top entries (left) are the obvious ones, and the least frequented ones (right) might be noise (artifacts of the method, e.g. bugs in the data cleaning). The middle part is where the real value is: blogs we don't know yet but would like to.
Search engine ranking went through something similar: it took a long time until the late Karen Spärck Jones introduced IDF (inverse document [collection] frequency) in 1972. IDF was the "Yang" to raw term frequency (TF), the "Yin" that had lacked a counterforce; only when the two are balanced in the TF-IDF formula do you retrieve truly relevant documents.
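For anyone who hasn't seen it, here is a minimal sketch of the TF-IDF idea, assuming raw counts for TF and the plain log(N / df) variant for IDF (real libraries usually add smoothing or normalization). The function name `tf_idf` and the toy documents are made up for illustration. The point it shows: a word like "the" has the highest raw TF but appears in every document, so IDF drives its score to zero, while rare terms get boosted.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each tokenized document as raw TF times IDF.

    IDF here is log(N / df): N is the number of documents, df is the number
    of documents containing the term (one common, unsmoothed variant).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        scores.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return scores


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "the blog about the security".split(),
    "the blog about startups".split(),
    "the essay".split(),
]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

In the first document "the" occurs twice yet scores 0, because it is in all three documents; "security" occurs once but scores highest, because it is in only one.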
So, plea to the OP: please release the rest of your list (101-100000).
+1 to this. I'd also argue that some on the list are unapologetic self-promoters, like Simon Willison. Nothing wrong with that, but it shows, and I think it's much more impressive to rank below that cohort while still being within a reasonable distance of it.