Home » Blog » Disjoint Sets Data Structure

# Disjoint Sets Data Structure

A Disjoint set which can be viewed as a collection of sets where in no element appears in more than one set.

A disjoint set data structure is used to maintain a collection S = {S1, S2, S3, …, Sn} of disjoint dynamic set. We can identify each set by a representative, which is some member of the set. Disjoint set data structure is used to find the MST(Minimum Spanning Tree). Disjoint set is a partitioned set.

Partitioned means union of any two sets is gives the original set while intersection of any two set gives a null set.

Disjoint set is useful for detecting a cycle in non directed graph. In some application, it doesn’t matter that which member is used as the representative, but when we request for the representative twice in dynamic set without modifying the set in between these two request, the answer must be the same for both the times. While some other application required the prespecified rule for choosing the representative such as choosing the smallest member in a set.

Disjoint set is somewhat similar to the sets in mathematics, but it is different in context of algorithm. Kruskal’s algorithm uses the disjoint set data structure which detects cycle in a graph.

In the following figure there is two component of the undirected graph which can be represented in form of set as below.

S1 = {1, 2, 3, 4}

S2 = {5, 6, 7, 8}

So we can see that in this two sets there is no any common item. As we discussed earlier disjoint set has two propery.

1. Union gives original set
2. Intersection gives null set

So perform both operation on a given sets and see the property is satisfied or not.

S1 ∪ S2 = {1, 2, 3, 4} ∪ {5, 6, 7, 8}

= {1, 2, 3, 4, 5, 6, 7, 8} (Original Set)

S1 S2 = {1, 2, 3, 4} {5, 6, 7, 8}

= {} (Null Set)

## Operations on Disjoint Set

The most common operation we can perform on disjoint set is find and union.

Make_set(x)

This operation creates the new set whose only member is x. As we are using disjoint set, we have to verify that the x is not already in any of the given set.

Find_set(x)

This operation return the set in which the element x belongs.

Find operation is used to know that the particular element is belongs to which set. Let’s take above example if we want to find that the element 4 is belongs to which set then by traversing the set we can know that the 4 is belongs to S1 set. It is also known as set membership operation means that element is member of which set.

Union(x,y)

This operation unites the dynamic sets in which x and y belongs. It will generate the new set that contains the sets in which x and y belongs. As we are using disjoint set we have to destroy the individual set of element x and y after performing union operation.

Now consider above example that we want to connect two component then we have to perform the union operation on them. In above example if we want to add one edge between 4 and 8 then we have to perform find 4 and find 8 and if they belongs to different sets then it will perform union(4,8).

## How to detect cycle in a Graph

After adding the edge between 4 and 8 our set is looks like S3 = {1, 2, 3, 4, 5, 6, 7, 8}.

Now we want to add another edge call (1,5), Now let’s perform find 1 and find 5 and we are going to know that these both belongs to the same set, it means there is a cycle in a graph. By using this way we can detect the cycle in a graph.

Before going to see the application of disjoint set, we first have to understand each operation of disjoint set. So let’s take one example that will help to understand the procedure.

Let’s consider the below undirected graph which have 8 vertices and 9 edges in which each edges have given weight. This example gives the clear insight of how to find the cycle from a graph. Universal set of the given graph is U = {1, 2, 3, 4, 5, 6, 7, 8}. First we have to know that each vertex is consider as a individual set. So now we have to include the edges one by one and have to form a set, we will take edges that numbered in increasing order.

So first edge is (1,2) and now we have to find(1) and find(2). So both are in universal set so we have to make a set of 1 and 2 and these both will be removed from the universal set. So we have following information.

U = {3, 4, 5, 6, 7, 8}

S1 = {1, 2}

Next edge is (3,4) again both are in universal set so make set of both and removed from the universal set.

U = {5, 6, 7, 8}

S1 = {1, 2}

S2 = {3, 4}

Third edge is (5,6)

U = {7, 8}

S1 = {1, 2}

S2 = {3, 4}

S3 = {5, 6}

Fourth edge is (7,8)

U = {}

S1 = {1, 2}

S2 = {3, 4}

S3 = {5, 6}

S4 = {7,8}

Now the next edge is (2,4) and we know that the universal set is empty. So now we have to again doing the same thing. find(2) and find(4) and these both belongs to different sets that is S1 and S2, so we have to perform union for them. So now we have,

S3 = {5, 6}

S4 = {7,8}

S5 = {1, 2, 3, 4}

Next edge is (2,5) and now find(2) and find(5) gives S5 and S3 so again we have to perform union.

S4 = {7,8}

S6 = {1, 2, 3, 4, 5, 6}

Now seventh edge is (1,3). Now find(1) and find(3) gives S6. It means that both belongs to the same set and there is cycle. It means if we include the seventh edge then the cycle is formed so we are not going to include it. Now let’s continue with next edge that is (6,8). find(6) and find(8) gives S6 and S4 so we have to perform the union operation.

S7 = {1, 2, 3, 4, 5, 6, 7, 8}

Last edge is (5,7) again both are belongs to same set and forming the cycle. Same approach is used by kruskal’s algorithm to find the minimum spanning tree.

## Application of Disjoint Set data structure

One of the main application of disjoint set data structure is to find the connected components of an undirected graph. The connected_component follows disjoint set operation to compute the connected component of the graph. The algorithm for connected_component is given below.

`Connected_component(G)for each vertex v ∈ G.V    Make_set(v)for each edge (u,v) ∈ G.E    if Find_set(u) ≠ Find_set(v)       Union(u,v)Same_component(u,v)if Find_set(u) == Find_set(v)    return TRUEelse    return FALSE`

The procedure Connected_component initially place each vertex v in its own set, then for each edge (u,v) it unites the sets containing u and v. Same_component determined whether two vertices are in the same connected component.

The second application is to find the MST using kruskal’s algorithm. The algorithm starts from the forest and ends with a tree.

`MST(G,W)A = ∅for each vertex v ∈ G.V    Make_set(v)sort the edges of G.E into increasing order by weight wfor each edge(u,v) ∈ G.E taken in non deacresing order by weight w    if Find_set(u) ≠ Find_set(v)        A = A ∪{(u,v)}        union(u,v)return A`

Time complexity of kruskal’s algorithm is O(E log V).

We can analyse the running time of the disjoint set data structure in terms of two parameters that is n, the number of Make_set operation and m, the total number of Make_set, Union and Find_set operation. As we know the sets are disjoint set, each Union operation reduces the number of set by 1. After n-1 union operation, only one set remains. Thus the maximum number of union operation possible is n-1.