Interesting fact: you can choose a uniformly random element from a sequence of unknown size in a single pass. The cost is that you must always iterate over the entire sequence, and generate O(n) random numbers instead of O(1).
The algorithm is quite simple (here rand(n) returns a uniform integer in [0, n)):

    var choice = elements.First()
    var n = 1
    for e in elements.Skip(1):
        n += 1
        if rand(n) == 0:
            choice = e
The algorithm is correct because, after n elements have been seen, each of them has been chosen with probability 1/n. The invariant is maintained because the (n+1)-th element is chosen with probability 1/(n+1), which scales each preceding probability by n/(n+1), giving 1/n · n/(n+1) = 1/(n+1).
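The same single-pass selection can be sketched in Java (the class and method names here are mine, not from any library); it folds the "first element" case into the loop, since rand(1) is always 0:

```java
import java.util.List;
import java.util.Random;

public class ReservoirChoice {
    // Returns a uniformly random element of the iterable in one pass,
    // replacing the current choice with probability 1/n at the n-th element.
    static <T> T chooseRandom(Iterable<T> elements, Random random) {
        T choice = null;
        int n = 0;
        for (T e : elements) {
            n++;
            if (random.nextInt(n) == 0) { // true with probability 1/n
                choice = e;
            }
        }
        return choice; // null only if the iterable was empty
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c", "d");
        System.out.println(chooseRandom(words, new Random()));
    }
}
```

Note that when n == 1, random.nextInt(1) is always 0, so the first element is always taken as the initial choice, matching the pseudocode above.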
HashMap is implemented on top of a dynamic array of buckets, and you could probably get "good enough" randomness by picking a random bucket from the underlying array, then iterating over that bucket and picking a random element from it. In the average case this will be pretty much constant time.
I'd be a little worried about the randomness of the distribution, but if you need a proper random distribution, you shouldn't be using a map.
In his hypothetical "using imaginary iterator() on AbstractMap" O(n) solution, he's overlooking that the Set returned by keySet() is a view with its own iterator and doesn't involve any reallocation or copying. So randomKey() can be:
    String randomKey(Map<String, ?> map) {
        int index = random.nextInt(map.size());
        for (String key : map.keySet()) {
            if (index == 0) return key;
            index--;
        }
        throw new AssertionError("unreachable for a non-empty map");
    }
Abstractly, maps aren't the right structure for quick, constant-time random choice. He could possibly choose an internal implementation that was better at offering random choice, but that's unlikely to be the optimal implementation for other map uses. He could maintain a parallel structure during inserts that allows constant-time draws later. Or do one array shuffle after all inserts, before all draws (if that fits the usage pattern).
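The parallel-structure idea can be sketched like this (the class name `RandomChoiceMap` is hypothetical, not a real library type): keep an ArrayList of keys alongside the HashMap, so a draw is just a random index into the list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;

public class RandomChoiceMap<K, V> {
    private final HashMap<K, V> map = new HashMap<>();
    private final List<K> keys = new ArrayList<>(); // parallel structure for O(1) draws
    private final Random random = new Random();

    public void put(K key, V value) {
        if (!map.containsKey(key)) {
            keys.add(key); // track each distinct key exactly once
        }
        map.put(key, value);
    }

    public V get(K key) {
        return map.get(key);
    }

    // Constant-time random draw: pick a random index into the key list.
    public K randomKey() {
        return keys.get(random.nextInt(keys.size()));
    }

    public static void main(String[] args) {
        RandomChoiceMap<String, Integer> m = new RandomChoiceMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        System.out.println(m.randomKey());
    }
}
```

This sketch only supports inserts, which fits the insert-then-draw pattern above; supporting removal in O(1) would need the usual swap-with-last trick on the key list.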
If you want random access to a map you should write your own implementation. This is a bit like complaining that random access to a linked list is too slow.
Choose appropriate data structures for the task at hand; no single data structure will fit every use.