Strings are going to be bad anyway because there's a high chance you're hitting a cache miss just to get access to the heap allocated string data.
Yes, only the histogram pass has random writes and if you pick your bucket size appropriately(like I mentioned above) then the surface area for those random writes can land within a cache line or two.
Summing the histograms are a very nice linear read/write pass.
Okay well that changes things a bit. I was thinking on general data, like strings.
> The sort then becomes a linear walk of the data doing the standard Radix sort.
Does it? What about all the buckets which you write randomly to?