Union-Find LeetCode Pattern Guide — Disjoint Sets, Connectivity, and Kruskal's
A practical Union-Find LeetCode pattern guide for disjoint sets, connectivity checks, cycle detection, component counting, and Kruskal's minimum spanning tree problems.
This Union-Find LeetCode pattern guide is for problems where the question is not “what is the shortest path?” but “which things belong to the same group?” Disjoint sets, connectivity, and Kruskal's algorithm all use the same core data structure: maintain components while merging pairs and answering whether two nodes share a representative. Once you recognize that a graph problem only needs component membership, Union-Find is usually simpler and faster than repeatedly running DFS or BFS.
Interviewers like Union-Find because the implementation is compact, but the modeling step is subtle. You must decide what the nodes are, what an edge means, and what each successful union changes.
What Union-Find stores
Union-Find, also called Disjoint Set Union or DSU, stores a partition of items into non-overlapping groups. It supports two operations:
- `find(x)`: return the representative, or root, of the set containing `x`.
- `union(a, b)`: merge the sets containing `a` and `b`; return whether a merge actually happened.
With path compression and union by size or rank, both operations are effectively constant time for interview constraints.
A reliable Python template:
```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True
```
The union return value is extremely useful. False often means “this edge creates a cycle,” “these accounts were already connected,” or “this equation is already implied.”
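A small self-contained demonstration of that signal, using a compact DSU (path halving instead of recursive compression, and no size tracking, to keep it short); the edge list is made up for illustration:

```python
# Compact DSU used to show how the union return value flags
# redundant edges. The edge list below is a hypothetical example.
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected: cycle or duplicate edge
        self.parent[rb] = ra
        return True

dsu = DSU(4)
results = [dsu.union(a, b) for a, b in [(0, 1), (1, 2), (0, 2), (2, 3)]]
print(results)  # [True, True, False, True] -- (0, 2) was redundant
```

The single `False` pinpoints the exact edge that closed a cycle, which is the whole answer to problems like "Redundant Connection."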
When to reach for Union-Find
Use DSU when the problem has one of these shapes:
| Problem signal | DSU interpretation | |---|---| | “Are these nodes connected?” | Compare roots after unions | | “How many connected components?” | Start with n, decrement on successful union | | “Does adding this edge create a cycle?” | Cycle if both endpoints already share a root | | “Merge accounts / emails / synonyms” | Map each identifier to a node and union related identifiers | | “Regions cut by slashes” | Model subcells as nodes and union open edges | | “Minimum spanning tree” | Sort edges and union only cross-component edges | | “Satisfiability of equality equations” | Union equals, then reject unequal pairs with same root |
Do not use DSU when you need an actual path, shortest distance, topological order, or dynamic deletions. Union-Find is excellent at merging; it is weak at un-merging.
Connectivity and component counting
For a basic “number of connected components” problem, initialize each node as its own component. Each successful union reduces the component count by one.
```python
def count_components(n, edges):
    dsu = DSU(n)
    for a, b in edges:
        dsu.union(a, b)
    return dsu.components
```
The invariant is simple: components equals the number of distinct roots, so no final DFS is needed. Do not count unique parent entries directly, though, unless every path has been compressed: some entries may still point at intermediate nodes rather than roots.
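A minimal sketch of that stale-pointer pitfall, using a bare parent array without compression so the intermediate pointer stays visible (the three-node example is made up):

```python
# Minimal DSU (no compression, no size tracking) to show why raw
# parent entries can be stale after unions.
parent = list(range(3))

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x  # deliberately no compression, so the stale pointer survives

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False
    parent[rb] = ra
    return True

union(1, 2)  # parent = [0, 1, 1]
union(0, 1)  # parent = [0, 0, 1] -- parent[2] points at 1, not at root 0
naive = len(set(parent))                    # 2: wrong, counts stale entry
correct = len({find(x) for x in range(3)})  # 1: right, counts real roots
print(naive, correct)  # prints "2 1"
```

Calling `find` on every node before counting, or simply tracking `components` as in the template, avoids the problem entirely.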
For “valid tree,” use two facts: a tree with n nodes must have exactly n - 1 edges, and no edge can connect nodes already in the same component.
```python
def valid_tree(n, edges):
    if len(edges) != n - 1:
        return False
    dsu = DSU(n)
    for a, b in edges:
        if not dsu.union(a, b):
            return False
    return True
```
This avoids overcomplicating the problem. The edge count plus no cycle implies connectedness for an undirected graph.
Kruskal's algorithm: greedy plus DSU
Kruskal's minimum spanning tree algorithm is the most famous Union-Find use case. Sort edges by weight. Add the cheapest edge that connects two different components. Skip edges whose endpoints are already connected.
```python
def kruskal(n, edges):
    # edges are (weight, a, b)
    dsu = DSU(n)
    total = 0
    used = 0
    for w, a, b in sorted(edges):
        if dsu.union(a, b):
            total += w
            used += 1
            if used == n - 1:
                return total
    return None  # graph was disconnected
```
The proof uses the cut property: for any cut of the graph, the cheapest edge crossing that cut is safe to include in some minimum spanning tree. DSU tells you whether an edge crosses between components or stays inside a component already connected by cheaper edges.
In LeetCode-style problems, Kruskal often appears as “connect all points with minimum cost,” “optimize water distribution,” or “minimum cost to connect cities.” The expensive part may be building edges, not the DSU itself.
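A sketch of that edge-building step for the "connect all points" shape, assuming Manhattan distance between points; the DSU loop is the same Kruskal structure as above, inlined so the example runs standalone:

```python
from itertools import combinations

# Sketch for "connect all points with minimum cost" assuming Manhattan
# distance. Building the O(n^2) edge list dominates the cost; the DSU
# part is the standard Kruskal loop.
def min_cost_connect(points):
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (abs(px - qx) + abs(py - qy), i, j)
        for (i, (px, py)), (j, (qx, qy)) in combinations(enumerate(points), 2)
    )
    total = used = 0
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            total += w
            used += 1
            if used == n - 1:
                break
    return total

print(min_cost_connect([(0, 0), (2, 2), (3, 10), (5, 2), (7, 0)]))  # 20
```

For very large point sets, generating all pairs is the bottleneck; the DSU work stays near-constant per edge.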
Modeling non-obvious nodes
Many DSU problems are hard because the nodes are not given as 0..n-1.
Accounts merge
Each email can be a node. For every account, union the first email with each other email in that account. Then group emails by root and attach the account name.
The trap is that names are not unique identifiers. Two people can share a name, while one person can have multiple emails. Model the stable identifier, not the display field.
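That modeling can be sketched with a dict-based DSU so string keys work directly; the input shape follows the LeetCode problem (`[name, email1, email2, ...]`), and the `merge_accounts` name and helper structure are illustrative:

```python
from collections import defaultdict

# Emails are the nodes; names are display data attached afterwards.
def merge_accounts(accounts):
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # lazily register string nodes
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {}
    for name, *emails in accounts:
        for email in emails:
            owner[email] = name
            parent[find(email)] = find(emails[0])  # union with first email
    groups = defaultdict(list)
    for email in owner:
        groups[find(email)].append(email)
    return [[owner[root]] + sorted(members) for root, members in groups.items()]
```

Each account unions its emails through the first one, and the final pass groups every email by its root, so accounts sharing any email collapse into one group.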
Equations possible
For equations like a == b and a != b, first union all equalities. Then scan inequalities and reject any pair with the same root. Order matters because an inequality cannot be evaluated safely until all implied equalities are known.
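That two-pass order can be sketched as follows, assuming the LeetCode input encoding where each equation is a four-character string like "a==b" or "a!=b":

```python
# Union every equality first, then check every inequality against the
# merged components. Equation strings follow the "a==b" / "a!=b" shape.
def equations_possible(equations):
    parent = {c: c for eq in equations for c in (eq[0], eq[3])}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for eq in equations:
        if eq[1] == '=':  # "a==b": merge both variables
            parent[find(eq[0])] = find(eq[3])
    # inequalities last: all implied equalities are known by now
    return all(find(eq[0]) != find(eq[3])
               for eq in equations if eq[1] == '!')
```

Note that `a != a` fails automatically, because both sides always share a root.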
Grid islands
You can map cell (r, c) to r * cols + c. Union adjacent land cells. This is often simpler for dynamic “number of islands as land is added” because each new land cell starts as one component and unions with existing land neighbors.
```python
idx = r * cols + c
```
Keep a seen set for active land cells so water nodes do not accidentally count as components.
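A sketch of the dynamic variant ("Number of Islands II" shape), combining the index mapping with an active set so only added land cells count; the function name and shapes are illustrative:

```python
# Each added land cell starts as one new island, then unions with any
# already-active land neighbors; `active` is the seen set for land cells.
def islands_after_additions(rows, cols, positions):
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    active = set()
    count = 0
    counts = []
    for r, c in positions:
        idx = r * cols + c
        if idx not in active:  # ignore duplicate additions
            active.add(idx)
            parent[idx] = idx
            count += 1  # new one-cell island
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                nidx = nr * cols + nc
                if 0 <= nr < rows and 0 <= nc < cols and nidx in active:
                    ra, rb = find(idx), find(nidx)
                    if ra != rb:
                        parent[ra] = rb
                        count -= 1  # merged two islands into one
        counts.append(count)
    return counts

print(islands_after_additions(3, 3, [(0, 0), (0, 1), (1, 2), (2, 1)]))
# [1, 1, 2, 3]
```

Because water cells never enter `parent` or `active`, they can never inflate the island count.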
Path compression and union by size explained
Without optimization, find can degrade into chasing a long chain. Path compression flattens that chain by making each visited node point directly to the root. Union by size attaches the smaller tree under the larger tree. Together, they make DSU fast enough for very large inputs.
In an interview, you can say:
“Path compression rewrites parents during find, and union by size prevents tall trees. Amortized time is inverse Ackermann, which is effectively constant for practical input sizes.”
You usually do not need to derive inverse Ackermann. Knowing why the optimizations work is enough.
Common traps
- Counting parents without calling find. Intermediate parent pointers may be stale.
- Decrementing the component count on every edge instead of only on successful unions. Duplicate edges should not reduce the count.
- Using DSU for directed reachability. DSU models undirected connectivity or equivalence, not direction.
- Processing inequalities too early. Union all equalities before checking contradictions.
- Assuming display names are identifiers. For account merge, emails are the reliable nodes.
- Ignoring 1-indexed input. Many graph problems label cities from 1 to n; convert or size arrays accordingly.
- Trying to delete edges. Standard DSU does not support efficient split operations.
Decision rules in interviews
Ask yourself:
- Are relationships symmetric? If a connects to b, does b connect to a?
- Do I only need group membership, component count, or cycle detection?
- Are operations mostly additions, not deletions?
- Can each item be mapped to an integer or dictionary key?
- Does a failed union mean a cycle or redundancy?
If the answers are mostly yes, DSU is probably a strong candidate.
How to explain Union-Find clearly
A concise interview explanation:
“I’ll represent each connected component by a root. find returns the root with path compression. union merges two roots and returns false if they were already connected. That gives us component count and cycle detection in near-constant time per edge.”
For Kruskal:
“I’ll sort edges by cost and take the cheapest edge that connects two different components. Union-Find enforces the no-cycle rule. The greedy choice is safe by the cut property: the cheapest edge crossing a component boundary can be part of an MST.”
Resume language for DSU work
Make the pattern concrete:
- “Modeled entity resolution as disjoint-set union over stable identifiers, reducing repeated graph traversals to near-constant connectivity checks.”
- “Implemented Kruskal-style cost optimization using Union-Find to prevent cycles and track connected components.”
- “Built dynamic component counting for grid updates with path compression and union by size.”
Union-Find is not a general graph hammer. It is a specialized tool for equivalence and connectivity. When you can reduce a problem to merging sets and asking whether two things share a root, it turns complex graph state into a small, reliable abstraction.
Weighted and labeled Union-Find variations
Some advanced interview problems attach information to the relationship between a node and its root. “Evaluate Division” is the classic example: if a / b = 2.0, you need not only connectivity but also the ratio between connected variables. A weighted DSU stores a parent plus a multiplier from each node to its parent. During path compression, it updates both the parent and the accumulated weight. The same idea can represent parity, distance modulo a number, or “same group versus opposite group” constraints.
You do not need to lead with weighted DSU unless the prompt requires relationship values. The decision rule is: if the query asks only whether two things are connected, plain DSU is enough. If the query asks what value relates them, the root must carry extra metadata.
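A sketch of the weighted variant for "Evaluate Division": here `weight[x]` stores the ratio of `x`'s value to its parent's value, and `find` folds those ratios together while compressing. The function mirrors the LeetCode signature, but the internal names are illustrative:

```python
# Weighted DSU: weight[x] = value(x) / value(parent[x]); after find
# compresses a path, weight[x] = value(x) / value(root).
def calc_equation(equations, values, queries):
    parent, weight = {}, {}

    def find(x):
        if parent[x] != x:
            root, w = find(parent[x])
            parent[x] = root
            weight[x] *= w  # fold the ratio along the compressed path
        return parent[x], weight[x]

    for (a, b), v in zip(equations, values):
        for node in (a, b):
            if node not in parent:
                parent[node], weight[node] = node, 1.0
        ra, wa = find(a)
        rb, wb = find(b)
        if ra != rb:  # a / b = v, so value(ra) / value(rb) = v * wb / wa
            parent[ra] = rb
            weight[ra] = v * wb / wa

    out = []
    for a, b in queries:
        if a not in parent or b not in parent:
            out.append(-1.0)  # unknown variable
            continue
        ra, wa = find(a)
        rb, wb = find(b)
        out.append(wa / wb if ra == rb else -1.0)
    return out
```

The only changes from plain DSU are the extra weight updates in `find` and `union`; connectivity logic is untouched, which is a good line to say out loud in an interview.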
DSU versus DFS tradeoff
For a static graph with one component-count query, DFS is often just as clear. Union-Find shines when edges arrive incrementally, when you need to detect the exact redundant edge, or when many connectivity queries follow a batch of unions. In an interview, it is fine to say, “DFS would work, but DSU gives simpler incremental merging and a direct cycle signal through failed union.” That kind of tradeoff answer sounds more senior than forcing DSU everywhere.
Testing edge cases
Test duplicate edges, self-loops, disconnected nodes, and one-node graphs. A self-loop should usually make union(a, a) return false, which may indicate a cycle. Duplicate edges should not decrement the component count twice. If labels are strings, test two identifiers that appear only in queries and never in union input; you may need to initialize them lazily or return false because they are unknown.
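Those checks can be written as quick assertions against the DSU template; it is re-declared here verbatim so the snippet runs standalone:

```python
# Edge-case checks for the DSU template: self-loops, duplicate edges,
# isolated nodes, and a one-node graph.
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True

dsu = DSU(3)
assert not dsu.union(1, 1)      # self-loop: both endpoints share a root
assert dsu.union(0, 1)
assert not dsu.union(0, 1)      # duplicate edge: no second decrement
assert dsu.components == 2      # node 2 is still isolated
assert DSU(1).components == 1   # one-node graph works out of the box
print("edge cases ok")
```
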
Related guides
- Graph Algorithms LeetCode Pattern Guide — DFS, BFS, Dijkstra, and Union-Find — A practical graph algorithms LeetCode pattern guide covering DFS, BFS, Dijkstra, union-find, topological sort, grid graphs, and interview explanation patterns. Use it to identify the graph shape before choosing the algorithm.
- Backtracking LeetCode Pattern Guide — Permutations, Combinations, and Subsets — A practical backtracking LeetCode pattern guide with templates, decision rules, duplicate-handling tricks, pruning advice, and interview talk tracks for permutations, combinations, subsets, and constraint problems.
- BFS LeetCode Pattern Guide — Shortest-Path, Level-Order, and Multi-Source — A BFS LeetCode pattern guide for recognizing queue-based problems, implementing level-order traversal, solving unweighted shortest paths, using multi-source BFS, and avoiding common visited-state bugs.
- Binary Search LeetCode Pattern Guide — Bounds, Rotated Arrays, and Search-on-Answer — A binary search LeetCode pattern guide for exact search, lower/upper bounds, rotated arrays, and search-on-answer problems. Includes invariants, templates, traps, and interview-ready explanations.
- Bit Manipulation LeetCode Pattern Guide — XOR Tricks, Masks, and Counting Bits — A practical bit manipulation LeetCode pattern guide covering XOR tricks, masks, counting bits, subset enumeration, flags, and the interview pitfalls around signed integers.
