This post presents a C++ program to identify duplicate subtrees in a binary tree. The program employs hashing to store the serialized representation of each subtree and uses a hash map to track occurrences.
Program Explanation
The program defines a binary tree structure and provides a mechanism to find duplicate subtrees. Here’s a breakdown of the components:
1. Structure of the Node
struct TreeNode {
int val;
TreeNode *left;
TreeNode *right;
TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};
The TreeNode
structure represents each node in the binary tree, containing an integer value and pointers to its left and right children.
2. Hash Map for Subtree Serialization
#include <unordered_map>
#include <vector>
#include <string>
unordered_map<string, int> subtreeCount;
The subtreeCount
hash map stores the serialized representation of subtrees as keys and their occurrence counts as values. This allows us to track duplicates efficiently.
3. Serializing the Subtree
string serialize(TreeNode* root) {
if (!root) return "#"; // Use '#' to represent NULL
string left = serialize(root->left);
string right = serialize(root->right);
string serialized = to_string(root->val) + "," + left + "," + right;
subtreeCount[serialized]++;
return serialized;
}
The serialize
function converts a subtree rooted at root
into a string format. It recursively serializes the left and right children, concatenating their representations with the current node’s value. The function also updates the subtreeCount
hash map.
4. Finding Duplicate Subtrees
vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
vector<TreeNode*> duplicates;
serialize(root); // Populate the subtreeCount
for (const auto& entry : subtreeCount) {
if (entry.second > 1) {
// Add the subtree to duplicates (not implemented here)
}
}
return duplicates;
}
The findDuplicateSubtrees
function first serializes the tree to populate the subtreeCount
map. Then, it iterates through the map to identify subtrees that appear more than once. Note that the logic for actually adding the duplicate subtrees to the duplicates
vector needs to be implemented for a complete solution.
Complete C++ Program
#include <iostream>
#include <unordered_map>
#include <vector>
#include <string>
using namespace std;
// Structure for a binary tree node
struct TreeNode {
int val;
TreeNode *left;
TreeNode *right;
TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};
// Hash map to count subtree occurrences
unordered_map<string, int> subtreeCount;
// Function to serialize the subtree and populate the hash map
string serialize(TreeNode* root) {
if (!root) return "#"; // Use '#' to represent NULL
string left = serialize(root->left);
string right = serialize(root->right);
string serialized = to_string(root->val) + "," + left + "," + right;
subtreeCount[serialized]++;
return serialized;
}
// Function to find duplicate subtrees
vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
vector<TreeNode*> duplicates;
serialize(root); // Populate the subtreeCount
// Find all duplicate subtrees
for (const auto& entry : subtreeCount) {
if (entry.second > 1) {
// Add the subtree root node to duplicates (not implemented)
}
}
return duplicates;
}
// Example usage
int main() {
// Create a sample binary tree
TreeNode* root = new TreeNode(1);
root->left = new TreeNode(2);
root->right = new TreeNode(3);
root->left->left = new TreeNode(4);
root->left->right = new TreeNode(2);
root->left->right->left = new TreeNode(4);
root->right->right = new TreeNode(4);
// Find duplicate subtrees
vector<TreeNode*> duplicates = findDuplicateSubtrees(root);
// Print duplicate subtree roots (dummy print, to be implemented)
cout << "Duplicate subtrees found: " << duplicates.size() << endl;
return 0;
}
Conclusion
This program demonstrates how to find duplicate subtrees in a binary tree using hashing for efficient storage and retrieval of subtree representations. By serializing each subtree and counting occurrences with a hash map, we can quickly identify duplicates.