Problem Statement
The minimum edit distance between two strings, also known as the Levenshtein distance, measures how dissimilar they are: it is the minimum number of operations required to transform one string into the other. The allowed operations are insertion, deletion, and substitution of a single character.
Approach Explanation
This problem can be solved efficiently with dynamic programming. The idea is to build a table whose entries hold the edit distance between prefixes of the two input strings, so that each entry can be computed from a few smaller entries that have already been filled in.
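The same prefix-based idea can also be written top-down: a recursive function over prefix lengths (i, j), memoized so that each pair is solved only once. This is a minimal sketch using Python's standard functools.lru_cache; the names min_edit_distance_memo and solve are hypothetical. The recursion can go up to m + n calls deep, so for very long strings the iterative table used later in this article is the safer choice.

```python
from functools import lru_cache


def min_edit_distance_memo(A: str, B: str) -> int:
    """Top-down (memoized) edit distance; solve(i, j) covers A[:i] and B[:j]."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        # An empty prefix can only be matched by inserting or deleting everything.
        if i == 0:
            return j
        if j == 0:
            return i
        # Matching last characters cost nothing: reuse the shorter prefixes.
        if A[i - 1] == B[j - 1]:
            return solve(i - 1, j - 1)
        return 1 + min(
            solve(i - 1, j),      # delete A[i - 1]
            solve(i, j - 1),      # insert B[j - 1]
            solve(i - 1, j - 1),  # substitute A[i - 1] with B[j - 1]
        )

    return solve(len(A), len(B))


print(min_edit_distance_memo("kitten", "sitting"))  # 3
```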
Dynamic Programming Structure:
- Define a 2D DP table where dp[i][j] represents the minimum edit distance between the first i characters of string A and the first j characters of string B.
- Initialize the first row and first column of the table: turning the first i characters of A into an empty string takes i deletions, and turning an empty string into the first j characters of B takes j insertions.
- Fill the rest of the table row by row. If A[i - 1] equals B[j - 1], dp[i][j] copies dp[i - 1][j - 1]; otherwise it is 1 plus the smallest of the deletion, insertion, and substitution costs taken from the neighboring cells.
- The final answer is found in dp[m][n], where m and n are the lengths of the two strings.
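Written as a recurrence, the steps above amount to the following, for 1 ≤ i ≤ m and 1 ≤ j ≤ n, where A_i denotes the i-th character of A (A[i - 1] in Python) and likewise for B_j:

```latex
\[
dp[i][0] = i, \qquad dp[0][j] = j,
\]
\[
dp[i][j] =
\begin{cases}
dp[i-1][j-1] & \text{if } A_i = B_j,\\
1 + \min\bigl(dp[i-1][j],\; dp[i][j-1],\; dp[i-1][j-1]\bigr) & \text{otherwise.}
\end{cases}
\]
```

The three terms inside the minimum correspond to deletion, insertion, and substitution, in that order.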
Time Complexity:
O(m * n), where m and n are the lengths of the two strings.
Space Complexity:
O(m * n) for the DP table.
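Because each row of the table depends only on the row directly above it, the space can be reduced by keeping just two rows. This is a common optimization that the program in this article does not use; the sketch below assumes unit costs (which make the distance symmetric, so the strings can be swapped to keep rows short), and min_edit_distance_two_rows is a hypothetical name. It uses O(min(m, n)) extra space at the same O(m * n) time.

```python
def min_edit_distance_two_rows(A: str, B: str) -> int:
    """Edit distance keeping only two DP rows: O(min(m, n)) extra space."""
    # With unit costs the distance is symmetric, so make B the shorter string
    # and size the rows by it.
    if len(B) > len(A):
        A, B = B, A
    n = len(B)
    prev = list(range(n + 1))  # row 0 of the full table: dp[0][j] = j
    for i in range(1, len(A) + 1):
        curr = [i] + [0] * n  # dp[i][0] = i
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                curr[j] = prev[j - 1]  # no operation needed
            else:
                curr[j] = 1 + min(prev[j],      # delete A[i - 1]
                                  curr[j - 1],  # insert B[j - 1]
                                  prev[j - 1])  # substitute
        prev = curr  # the finished row becomes "row i - 1" for the next pass
    return prev[n]


print(min_edit_distance_two_rows("kitten", "sitting"))  # 3
```

The trade-off is that the full table is gone, so the individual operations can no longer be recovered by walking back through it.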
Python Code
def min_edit_distance(A, B):
    """
    Calculate the minimum edit distance (Levenshtein distance) between two strings.

    Args:
        A (str): First input string.
        B (str): Second input string.

    Returns:
        int: Minimum edit distance between the two strings.
    """
    m, n = len(A), len(B)
    # Create an (m + 1) x (n + 1) DP table initialized to 0
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Initialize the first column and first row
    for i in range(m + 1):
        dp[i][0] = i  # Cost of deleting the first i characters of A
    for j in range(n + 1):
        dp[0][j] = j  # Cost of inserting the first j characters of B
    # Fill the DP table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # No operation needed
            else:
                dp[i][j] = min(dp[i - 1][j] + 1,      # Deletion
                               dp[i][j - 1] + 1,      # Insertion
                               dp[i - 1][j - 1] + 1)  # Substitution
    return dp[m][n]


# Example usage
string1 = "kitten"
string2 = "sitting"
distance = min_edit_distance(string1, string2)
print("Minimum edit distance is:", distance)
Explanation of the Program
Let’s break down the structure of the program:
1. Input:
The input consists of two strings for which we want to calculate the minimum edit distance. For example:
string1 = "kitten"
string2 = "sitting"
2. DP Table Initialization:
A DP table is created with dimensions (m + 1) x (n + 1), initialized to zeros. This table stores the minimum edit distance for every pair of prefixes of the two strings; the extra row and column represent the empty prefixes.
3. Base Case Setup:
The first row and the first column of the table are initialized. The first column, dp[i][0] = i, is the cost of converting the first i characters of the first string into an empty string (deleting all of them), while the first row, dp[0][j] = j, is the cost of converting an empty string into the first j characters of the second string (inserting all of them).
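The base cases for the example inputs can be checked on their own. This is a minimal sketch; init_table is a hypothetical helper that performs only the table creation and initialization steps of min_edit_distance and leaves the interior cells at zero.

```python
def init_table(A, B):
    """Hypothetical helper: build the (m + 1) x (n + 1) table with only base cases filled."""
    m, n = len(A), len(B)
    # A list comprehension gives each row its own list object.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of A[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of B[:j]
    return dp


dp = init_table("kitten", "sitting")
print(dp[0])                   # [0, 1, 2, 3, 4, 5, 6, 7]
print([row[0] for row in dp])  # [0, 1, 2, 3, 4, 5, 6]
```

The list comprehension matters here: writing [[0] * (n + 1)] * (m + 1) instead would repeat a reference to a single row, so setting dp[i][0] would change that cell in every row.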
4. Filling the DP Table:
The program iterates through every pair of positions (i, j). If A[i - 1] equals B[j - 1], the value is carried over from the diagonal cell dp[i - 1][j - 1] (no operation needed). Otherwise, dp[i][j] is 1 plus the smallest of dp[i - 1][j] (deletion), dp[i][j - 1] (insertion), and dp[i - 1][j - 1] (substitution).
5. Final Result:
The minimum edit distance is found in the bottom-right cell of the DP table: dp[m][n].
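To see how the bottom-right cell is reached, it can help to print the whole filled table. This is a sketch that repeats the filling logic of min_edit_distance but returns the full table; edit_distance_table and print_table are hypothetical names, with B across the top, A down the side, and "-" marking the empty prefix.

```python
def edit_distance_table(A: str, B: str) -> list:
    """Return the fully filled (m + 1) x (n + 1) edit-distance table."""
    m, n = len(A), len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # deletions
    for j in range(n + 1):
        dp[0][j] = j  # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp


def print_table(A: str, B: str, dp: list) -> None:
    """Print the table with right-aligned two-character columns."""
    print("   " + " ".join(f"{c:>2}" for c in "-" + B))
    for i, row in enumerate(dp):
        label = "-" if i == 0 else A[i - 1]
        print(f"{label:>2} " + " ".join(f"{v:>2}" for v in row))


A, B = "kitten", "sitting"
table = edit_distance_table(A, B)
print_table(A, B, table)
print("bottom-right cell:", table[len(A)][len(B)])  # 3
```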
Example Execution:
For the provided input strings, the output will display the minimum edit distance:
Minimum edit distance is: 3
This indicates that the minimum number of operations required to transform “kitten” into “sitting” is 3: substitute k with s, substitute e with i, and insert g at the end.
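The distance alone does not say which operations to perform. One way to recover a minimal sequence is to fill the same table and then walk back from dp[m][n], at each step moving to a neighboring cell whose value is consistent with the current one. This is a sketch, not part of the original program; edit_operations is a hypothetical name, and when several minimal sequences exist it returns just one of them (preferring a match, then substitution, then deletion, then insertion).

```python
def edit_operations(A: str, B: str) -> list:
    """Return one minimal list of edit operations that turns A into B."""
    m, n = len(A), len(B)
    # Fill the same table as min_edit_distance.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

    # Walk back from the bottom-right cell to dp[0][0].
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and A[i - 1] == B[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match: no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"substitute {A[i - 1]!r} -> {B[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {A[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {B[j - 1]!r}")
            j -= 1
    ops.reverse()  # operations were collected from the end of the strings
    return ops


for op in edit_operations("kitten", "sitting"):
    print(op)
# substitute 'k' -> 's'
# substitute 'e' -> 'i'
# insert 'g'
```

The number of operations returned always equals dp[m][n], since every step taken off the diagonal-match path adds exactly one to the cost.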