Find Duplicate Subtrees in a Binary Tree in Python

 

 

This program identifies duplicate subtrees in a binary tree using hashing. It utilizes a dictionary to keep track of serialized subtrees and their occurrences.

Program Structure

  • TreeNode Class: Defines the structure of a node in the binary tree.
  • serialize Function: Serializes the subtree rooted at a given node into a string format.
  • findDuplicateSubtrees Function: Traverses the binary tree and identifies duplicate subtrees.
  • Example Usage: Demonstrates how to build a binary tree and call the function to find duplicates.

Python Code


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def serialize(node, lookup, duplicates):
    if not node:
        return "#"
    
    # Serialize the subtree
    serialized = "{},{},{}".format(node.val, serialize(node.left, lookup, duplicates), serialize(node.right, lookup, duplicates))
    
    # Use the serialized subtree to track occurrences
    if serialized in lookup:
        lookup[serialized] += 1
    else:
        lookup[serialized] = 1
    
    # If this subtree is seen twice, add it to duplicates
    if lookup[serialized] == 2:
        duplicates.append(node)
    
    return serialized

def findDuplicateSubtrees(root):
    lookup = {}
    duplicates = []
    serialize(root, lookup, duplicates)
    return duplicates

# Example Usage
if __name__ == "__main__":
    # Construct the binary tree
    # Example Tree:
    #         1
    #        / \
    #       2   3
    #      / \   \
    #     4   5   2
    #            / \
    #           4   5

    root = TreeNode(1)
    root.left = TreeNode(2)
    root.right = TreeNode(3)
    root.left.left = TreeNode(4)
    root.left.right = TreeNode(5)
    root.right.right = TreeNode(2)
    root.right.right.left = TreeNode(4)
    root.right.right.right = TreeNode(5)

    duplicates = findDuplicateSubtrees(root)
    for dup in duplicates:
        print("Duplicate subtree rooted at node with value:", dup.val)

Explanation

The program uses a depth-first traversal approach to serialize each subtree and track its occurrences using a dictionary:

  1. The TreeNode class represents each node in the binary tree.
  2. The serialize function recursively constructs a string representation of the subtree.
  3. The lookup dictionary records how many times each serialized subtree appears.
  4. If a subtree is found twice, it is added to the duplicates list.
  5. The findDuplicateSubtrees function initializes the process and returns the list of duplicate subtrees.

In the example usage, a binary tree is constructed, and the program identifies and prints the root values of any duplicate subtrees.

 

Explanation of the Code:

  1. TreeNode Class: Defines a simple structure for a binary tree node, with properties for the node value and pointers to left and right children.
  2. serialize Function:
    • Recursively serializes each subtree rooted at the current node.
    • Uses a hash table (lookup) to count occurrences of each serialized subtree.
    • Adds a subtree to the duplicates list if it is encountered for the second time.
  3. findDuplicateSubtrees Function:
    • Initializes data structures and starts the serialization process from the root of the binary tree.
  4. Example Usage:
    • Constructs a sample binary tree.
    • Calls the findDuplicateSubtrees function and prints out the values of the roots of duplicate subtrees.

Leave a Reply

Your email address will not be published. Required fields are marked *