Kruskal and the Disjoint Sets

Given a connected and undirected graph, a spanning tree of that graph is a tree that connects all the vertices together. Given that a weight is assigned to every edge, the spanning tree’s weight is computed by summing the weights of the edges within that spanning tree. A minimum spanning tree (MST) is a spanning tree with weight less than or equal to the weight of every other spanning tree of that graph.

Let’s first consider a simple graph definition. For our purposes, we can simply define a graph as a collection of edges between two vertices u and v. Let’s represent each vertex as a simple integer, and let’s assign a weight to each edge.

struct Edge {
    int weight;
    int u;
    int v;

    Edge(int a, int b, int w) : u(a), v(b), weight(w) {}

    bool operator<(const Edge& other) const {
        if (weight == other.weight) {
            return (u+v+weight) < (other.u + other.v + other.weight);
        } else {
            return weight < other.weight;
        }
    }
};

To find the MST of a graph, we can look at the minimum spanning forest of a graph. The minimum spanning forest is a union of the MSTs of its connected components. Therefore, to find MST, we first want to create a forest.

UnionSet<int> *vertices[N+1] = { };
for (int i = 1; i <= N; i++) {
    vertices[i] = new UnionSet<int>(i);
}

Then, we want to go through the edges in sorted order, and merge the sets if the vertices are not in the same sets.

std::sort(edges.begin(), edges.end());
UnionSet<int> *start = vertices[0];
std::set<Edge> minimum_spanning_tree;
for (auto it = edges.begin(); it != edges.end(); it++) {
    int u = it->u;
    int v = it->v;
    if (!vertices_are_in_same_sets(vertices[u], vertices[v])) {
        merge(vertices[u], vertices[v]);
        minimum_spanning_tree.insert(*it);
    }
}

It is then simple to compute the MST’s weight:

int min_span_tree_length = 0;

std::for_each(minimum_spanning_tree.begin(), minimum_spanning_tree.end(), [&] (Edge edge) {
        min_span_tree_length += edge.weight;
        });

Of course, the complexity lies in the merge and vertices_are_in_same_sets functions. A simple way to implement them using linked-lists. This way, union, or merge, is really easy to do. However, the check is expensive since every nodes in the linked-list must be checked.

This process can be improved by using disjoint sets. Disjoint sets implement two methods. find and union, or merge. The find method returns the root of the tree.

struct UnionSet {
    T data;
    UnionSet<T> *parent;

    UnionSet(T value) : data(value), parent(this) {
    }

    UnionSet<T> *find() {
        UnionSet<T> *x = this;
        if (this->parent == this) {
            return this;
        }
        return x->parent->find();
    }

    void merge(UnionSet<T> &other) {
        UnionSet<T> *xRoot = this->find();
        UnionSet<T> *yRoot = other.find();
        xRoot->parent = yRoot;
    }
};

Of course, this implementation is no better than the linked list one since the disjoint sets can become quite unbalanced. This issue can be mitigated by using two techniques; union by rank, and path compression.

Union by rank is a way to merge the two sets by always attaching the smaller tree to the larger one. This way, the depth of the resulting tree only increases if the two trees had the same depth. The rank represents the tree’s depth and a one-element disjoint set has a rank of 0. Using this optimization, the find and merge method yield a worst-case running-time of O(log n).

void merge(UnionSet<T> &other) {
    UnionSet<T> *xRoot = this->find();
    UnionSet<T> *yRoot = other.find();

    // Merge them
    if (xRoot->rank < yRoot->rank) {
        xRoot->parent = yRoot->parent;
    } else {
        yRoot->parent = xRoot;
        yRoot->rank = xRoot->rank + 1;
    }
}

Path compression can then be used to further improve the running-time of the algorithm. Since each node visited on the way to the root node could be directly attached to the root node without the tree losing any value, the find method changes each node’s parent reference to the root as it recursively traverses up the tree. This way, the resulting path is much flatter.

UnionSet<T> *find() {
    UnionSet<T> *x = this;
    if (x->parent->data != x->data) {
        x->parent = x->parent->find();
    }
    return x->parent;
}

The use of these two optimizations brings the amortized running-time of the find and merge functions to O(α(n)), where α(n) is the inverse Ackermann function, which grows really slowly. In fact, α(n) is less than 5 for almost any practical values of n.

Related Posts

What is the Maker-Maker fee structure? 15 Mar 2020

Postgresql index usage 28 Feb 2019

Mock within a mock 19 Sep 2015