Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think it's worth including a bit on disjoint sets. I've managed to pull that solution on interviewers a couple times and they end up fairly impressed when you beat their best solution which is O(n) storage O(log n) time, with one that's O(n) storage O(1) time after amortization.


can you elaborate?


Some material here: https://www.geeksforgeeks.org/union-find/

    A disjoint-set data structure is a data structure
    that keeps track of a set of elements partitioned
    into a number of disjoint (non-overlapping) subsets.
    A union-find algorithm is an algorithm that
    performs two useful operations on such a
    data structure:
    
    Find: Determine which subset a particular
    element is in. This can be used for determining
    if two elements are in the same subset.
    
    Union: Join two subsets into a single subset.


Sorry, I should have com back, but yeah, this is a good summary. When done properly both Find and Union can be done in amortized inverse-Ackerman time, which is for all intents and purposes constant. It can pop up as a solution for a surprising number of questions.

An old, common, (banned) interview question goes something like this:

> Imagine you have a grid of squares, represented however you want, describing an ocean. I'm going to give you coordinates one by one that indicate land being created. You need to tell me in total how many islands exist after each coordinate, where an island is defined as a contiguous block of land connected horizontally or vertically, but not diagonally (or include diagonals if you want; it makes some parts of the answer messier).

The disjoint-set answer to this question is that you look at the coordinate's neighbors. If there are none, this is a new island. Create the node as part of a new set where the node is the "representative" of that set. If there is one neighbor, create node but make its representative the same as the neighbor's representative. If there are more than one neighbors then you have to do a union; create your node and make its representative one of the neighbors' representatives. Then go to all of the other neighbors and find their representatives (make take a couple hops). Then make their final representatives use the first neighbor as their representative.

At this point you have kind of a directed acyclic graph where each node has exactly one edge (maybe pointing to itself). The last improvement is to update each node's representative as you travel to be the final one. This amortizes out to a single hop for each node, after enough find or union operations.

Then you can start adding more data to the representative and dealing with merging that to solve some pretty complex problems in a shockingly low runtime.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: