Find Duplicate Subtrees in a Binary Tree Using Hashing in C++

 

 

This post presents a C++ program to identify duplicate subtrees in a binary tree. The program employs hashing to store the serialized representation of each subtree and uses a hash map to track occurrences.

Program Explanation

The program defines a binary tree structure and provides a mechanism to find duplicate subtrees. Here’s a breakdown of the components:

1. Structure of the Node

struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

The TreeNode structure represents each node in the binary tree, containing an integer value and pointers to its left and right children.

2. Hash Map for Subtree Serialization

#include <unordered_map>
#include <vector>
#include <string>

unordered_map<string, int> subtreeCount;

The subtreeCount hash map stores the serialized representation of subtrees as keys and their occurrence counts as values. This allows us to track duplicates efficiently.

3. Serializing the Subtree

string serialize(TreeNode* root) {
    if (!root) return "#"; // Use '#' to represent NULL
    string left = serialize(root->left);
    string right = serialize(root->right);
    string serialized = to_string(root->val) + "," + left + "," + right;
    subtreeCount[serialized]++;
    return serialized;
}

The serialize function converts a subtree rooted at root into a string format. It recursively serializes the left and right children, concatenating their representations with the current node’s value. The function also updates the subtreeCount hash map.

4. Finding Duplicate Subtrees

vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
    vector<TreeNode*> duplicates;
    serialize(root); // Populate the subtreeCount
    for (const auto& entry : subtreeCount) {
        if (entry.second > 1) {
            // Add the subtree to duplicates (not implemented here)
        }
    }
    return duplicates;
}

The findDuplicateSubtrees function first serializes the tree to populate the subtreeCount map. Then, it iterates through the map to identify subtrees that appear more than once. Note that the logic for actually adding the duplicate subtrees to the duplicates vector needs to be implemented for a complete solution.

Complete C++ Program

#include <iostream>
#include <unordered_map>
#include <vector>
#include <string>

using namespace std;

// Structure for a binary tree node
struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

// Hash map to count subtree occurrences
unordered_map<string, int> subtreeCount;

// Function to serialize the subtree and populate the hash map
string serialize(TreeNode* root) {
    if (!root) return "#"; // Use '#' to represent NULL
    string left = serialize(root->left);
    string right = serialize(root->right);
    string serialized = to_string(root->val) + "," + left + "," + right;
    subtreeCount[serialized]++;
    return serialized;
}

// Function to find duplicate subtrees
vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
    vector<TreeNode*> duplicates;
    serialize(root); // Populate the subtreeCount

    // Find all duplicate subtrees
    for (const auto& entry : subtreeCount) {
        if (entry.second > 1) {
            // Add the subtree root node to duplicates (not implemented)
        }
    }
    return duplicates;
}

// Example usage
int main() {
    // Create a sample binary tree
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(2);
    root->left->right->left = new TreeNode(4);
    root->right->right = new TreeNode(4);

    // Find duplicate subtrees
    vector<TreeNode*> duplicates = findDuplicateSubtrees(root);

    // Print duplicate subtree roots (dummy print, to be implemented)
    cout << "Duplicate subtrees found: " << duplicates.size() << endl;

    return 0;
}

Conclusion

This program demonstrates how to find duplicate subtrees in a binary tree using hashing for efficient storage and retrieval of subtree representations. By serializing each subtree and counting occurrences with a hash map, we can quickly identify duplicates.

 

Leave a Reply

Your email address will not be published. Required fields are marked *