Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Maps are Broken, for Some Definition of Broken (sigusr2.net)
9 points by andreyf on June 22, 2010 | hide | past | favorite | 13 comments



Interesting fact: you can choose a random element from a sequence of unknown size in a single pass. The cost is having to always iterate over the entire sequence, and generating O(n) random numbers instead of O(1).

The algorithm is quite simple:

    var choice = elements.First()
    var n = 1
    for e in elements.Skip(1):
       n += 1
       if rand(n) == 0:
          choice = e
The algorithm is correct because, at each stage, each of the preceding elements is chosen with probability 1/n. The invariant is maintained because the next element is chosen with probability 1/(n+1) and therefore scales the preceding probabilities by n/(n+1) giving 1/(n+1).


HashMap is implemented as a dynamic array, and you could probably get "good enough" random performance from accessing the underlying array, picking a random element, then iterating over that bucket and picking a random element. In the average case this will be pretty much constant time.

I'd be a little worried about the randomness of the distribution, but if you need a proper random distribution, you shouldn't be using a map.


HashMap (unlike treeset) itself doesn't provide an order guarantee. So can't you just grab the first element from the entrySet and use that?

But I think the bigger problem is trying to use a Map in the first place when maybe a different data structure will solve the problem better.


I of course didn't talk about why I'm using a map, but as an example: consider redis. It's a giant hash table, and it allows you to pick a random key.


Just because it doesn't make any guarantees about order, doesn't mean you'll get a randomly-selected element if you pick the first one.


In his hypothetical "using imaginary iterator() on AbstractMap" O(n) solution, he's overlooking that the Set returned by keySet() has an iterator, and doesn't involve any reallocation/copying. So randomKey() can be:

  int index = random.nextInt(map.size());
  for(String key : map.keySet()) {
    if(index==0) return key;
    index--;
  }
Abstractly, maps aren't the right structure for quick/constant-time random-choice. He could possibly choose an internal implementation that was better at offering random-choice, but that's unlikely to be the optimal implementation for other map uses. He could maintain a parallel structure during inserts that allows constant-draws later. Or do one array shuffle after all inserts, before all draws (if that fits the usage pattern).

(edit: code corrected per replies below)


I did overlook that, so yes, this should work properly. Thanks!

One thing to note is that Random.next() is protected, so it's gotta be random.nextInt(map.size()).


Sorry to have referred to you in third-person; wasn't sure you were reading this thread. Thanks for the correction; I've changed my code fragment.


Maps have an entrySet() method that returns a set of key-value Entry objects you can iterate over. Is this what you were after?


keySet() is more appropriate. I just want the key.


If you want random elements, why don't just store it in a simple array?


Thanks everyone for the responses. I'm a Java n00b.


If you want random access to a map you should write your own implementation. This is a bit like complaining that random access to a linked list is too slow.

Choose appropriate data structures for the task at hand, no one data structure will fit every use.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: