Small-To-Large Merging
Authors: Michael Cao, Benjamin Qi
Contributor: Neo Wang
A way to merge two sets efficiently.
Merging Data Structures
Obviously linked lists can be merged in time. But what about sets or vectors?
Focus Problem – try your best to solve this problem before continuing!
View Internal SolutionLet's consider a tree rooted at node , where each node has a color.
For each node, let's store a set containing only that node, and we want to merge the sets in the nodes subtree together such that each node has a set consisting of all colors in the nodes subtree. Doing this allows us to solve a variety of problems, such as query the number of distinct colors in each subtree.
Naive Solution
Suppose that we want merge two sets and of sizes and , respectively. One possiblility is the following:
for (int x : b) a.insert(x);
which runs in time, yielding a runtime of in the worst case. If we instead maintain and as sorted vectors, we can merge them in time, but is also too slow.
Better Solution
With just one additional line of code, we can significantly speed this up.
if (a.size() < b.size()) swap(a, b);for (int x : b) a.insert(x);
Note that swap exchanges two sets in time. Thus, merging a smaller set of size into the larger one of size takes time.
Claim: The solution runs in time.
Proof: When merging two sets, you move from the smaller set to the larger set. If the size of the smaller set is , then the size of the resulting set is at least . Thus, an element that has been moved times will be in a set of size at least , and since the maximum size of a set is (the root), each element will be moved at most ) times.
Full Code
#include <bits/stdc++.h>using namespace std;const int MAX_N = 2e5;// nodes will be 1-indexed like in the problemvector<int> adj[MAX_N + 1];set<int> colors[MAX_N + 1];
Generalizing
We can also merge other standard library data structures such as std::map
or
std:unordered_map
in the same way. However,
std::swap
does not always
run in time. For example, swapping
std::array
s takes time
linear in the sum of the sizes of the arrays, and the same goes for
GCC policy-based data structures such
as __gnu_pbds::tree
or __gnu_pbds::gp_hash_table
.
To swap two policy-based data structures a
and b
in time,
use a.swap(b)
instead. Note that for standard library data structures,
swap(a,b)
is equivalent to a.swap(b)
.
Problems
Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|
CF | Normal | Show TagsMerging | |||
Plat | Normal | Show TagsIndexed Set, Merging | |||
Plat | Normal | Show TagsMerging | |||
POI | Normal | Show TagsIndexed Set, Merging | |||
IOI | Normal | Show TagsCentroid, Merging | |||
JOI | Hard | Show TagsMerging | |||
COI | Hard | Show TagsMerging | |||
JOI | Very Hard | Show TagsMerging, SCC |
Optional: Faster Merging
It's easy to merge two sets of sizes in or time, but sometimes can be significantly better than both of these. Check "Advanced - Treaps" for more details. Also see this link regarding merging segment trees.
Module Progress:
Join the USACO Forum!
Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!