Our April 2021 cohorts are filling up quickly. Join our free webinar to Uplevel your career

Find the longest common subsequence of two strings.

A subsequence is a sequence that can be derived from another sequence by deleting zero or more elements without changing the order of the remaining elements.

Input: “ABCDE”, “AECBD”

Output: “ABD”

Subsequence “ABD” can be derived from the first string by deleting characters “C” and “E”. From the second string it can be derived by deleting “E” and “C”. No common subsequence longer than three characters exists in the two given strings. “ACD” is another common subsequence of length three; it is also a correct answer.

Input Parameters: The function has two string parameters.

Output: Return a string, the longest common subsequence. If a nonempty common subsequence cannot be found, return “-1”.

● 1

● Input strings consist of the alphanumeric characters.

We provided three solutions.

Start with index 0 of both strings, let index of String A at any point be ‘i’ and that of B be ‘j’ (initially i=j=0)

The index of the String implies the character that we are considering for including in the subsequence.

lcs_util(A, B, i, j):

- if the character at i in String A = character in String B at j, then this character can be part of the subsequence. Increment both i, j.

- else,

1. either the LCS includes the character at i in String A.

2. or, the LCS includes the character in String B at j.

3. or, the LCS excluding both the characters at i and j of String A and String B respectively.

As we do not know beforehand whether to exclude i or j, so proceed by doing both but at each step keep track of the longest substring returned.

1. the LCS includes the character at i in String A: we increment j and check for character match in the two Strings, at i and j (i.e. Move back to lcs_util(a, b, i, j+1))

2. the LCS includes the character in String B at j: we increment i and check for character match in the two Strings, at i and j (i.e. Move back to lcs_util(a, b, i+1, j))

3. the LCS excluding both the characters at i and j of String A and String B respectively: this is implicitly covered in 1 & 2 steps.

Let’s say String A: “ab” and String B: “aeb”

lcs_util(A, B, i, j) -> this method checks for the LCS of A, starting from ‘i’ to the end & B, starting from ‘j’ to the end.

lcs_util("ab", "aeb", 0, 0):

i=0, j=0: as character matches so LCS includes "a"

i=1, j=1: then check lcs_util("ab", "aeb", 1, 2) and lcs_util("ab", "aeb", 2, 1)

1. lcs_util("ab", "aeb", 1, 2) -> "ab"

2. lcs_util("ab", "aeb", 2, 1) -> "a"

Max of the two is the answer which is "ab"

O(2^(len_a+len_b)) where len_a and len_b are lengths of string A and string B respectively.

As for each character of the string, we are taking two possibilities whether it will be part of LCS or not. Hence total possibilities will be 2^(len_a+len_b) and hence time complexity will be O(2^(len_a+len_b)).

O(2^(len_a+len_b) * min(len_a, len_b)) where len_a and len_b are lengths of string A and string B respectively.

As we are calling our function to consider all possible LCS strings, it will take O(2^(len_a+len_b)) call stack space and each call will store O(min(len_a, len_b)) of extra space to store intermediate generated strings.

O(2^(len_a+len_b) * min(len_a, len_b)) where len_a and len_b are lengths of string A and string B respectively.

As to store two input strings of length len_a and len_b, it will take O(len_a) + O(len_b). Auxiliary space used is O(2^(len_a+len_b) * min(len_a, len_b)) and for the output, string can take O(min(len_a, len_b)) of space. Hence total complexity will be

O(2^(len_a+len_b) * min(len_a, len_b)) + O(len_a) + O(len_b) + O(min(len_a, len_b)) = O(2^(len_a+len_b) * min(len_a, len_b)).

In the brute_force_solution, we can observe that it is repeatedly calculating already calculated subproblems. We can avoid it by using memoization.

For example in LCS(A, B, 0, 0), LCS(A, B, 1, 1) is called twice.

The whole thing can be optimized by storing the calculation the first time it is calculated and reusing that value later.

We are maintaining dp states, where dp[i][j] means the LCS for String A starting from 'i' to the end and String B, starting from 'j' to the end.

O(len_a * len_b) where len_a and len_b are lengths of string A and string B respectively.

As we are maintaining dp states, where for dp[i][j]; 0

O(len_a * len_b * min(len_a, len_b)) where len_a and len_b are lengths of string A and string B respectively.

As we are calling our lcs_util function to fill our maintained two-dimensional array dp, it will take O(len_a * len_b) call stack space and each call will store O(min(len_a, len_b)) add extra space to store intermediate generated strings.

O(len_a * len_b * min(len_a, len_b)) where len_a and len_b are lengths of string A and string B respectively.

To store two input strings of length len_a and len_b, it will take O(len_a) + O(len_b). Auxiliary space used is O(len_a * len_b * min(len_a, len_b)) and for the output, string can take O(min(len_a, len_b)) of space. Hence total complexity will be

O(len_a * len_b * min(len_a, len_b)) + O(len_a) + O(len_b) + O(min(len_a, len_b)) = O(len_a * len_b * min(len_a, len_b)).

In this approach, we divide our problem of finding the longest common subsequence into two subproblems.

1. To calculate the length of the longest common subsequence: For this, we will maintain states in two-dimensional array L which will also help in solving the second subpart.

2. Create subsequence from maintained states while calculating the length of LCS.

Dp state in two-dimensional array L where L[i][j] contains length of LCS of a[0..i-1] and b[0..j-1].

We will fill our maintained two-dimensional array L (where 0

1. If i == 0 or j == 0 then L[i][j] = 0.

2. If a[i-1] == b[j-1] then L[i][j] = L[i-1][j-1] + 1.

3. Else L[i][j] = max(L[i-1][j], L[i][j-1]).

Now we know the length of LCS as L[length of a - 1][length of b - 1] and we have maintained dp state to create required subsequence out of it.

We create a character array lcs of length L[length of a - 1][length of b - 1].

We will start from right-most-bottom-most corner i.e. i = length of a - 1 and j = length of b - 1.

If the character at index i of string a and character at index j of string b are same then that character will be part of our LCS and we fill our character array lcs reduce values of both i and j.

Else we will compare L[i-1][j] and L[i][j-1].

1. If L[i-1][j] > L[i][j-1], reduce i.

2. Else reduce j.

We will do this untill i>0 and j>0.

And by now we got our final longest common subsequence characters stored in character array lcs.

O(len_a * len_b) where len_a and len_b are lengths of string A and string B respectively.

We are filling maintained two-dimensional array L of size len_a * len_b, it will take O(len_a * len_b) also we are filling character array lcs of size min(len_a, len_b) in worst case, so it will take O(min(len_a, len_b)).

Hence total complexity will be O(len_a * len_b) + O(min(len_a, len_b)) = O(len_a * len_b).

O(len_a * len_b) where len_a and len_b are lengths of string A and string B respectively.

As we are maintaining two-dimensional array L of size len_a * len_b, it will consume space of O(len_a * len_b) and also we are maintaining character array of size min(len_a, len_b) in worst case, it will take O(min(len_a, len_b)). Hence, total auxiliary space used will be O(len_a * len_b) + O(min(len_a, len_b)) = O(len_a * len_b).

O(len_a * len_b) where len_a and len_b are lengths of string A and string B respectively.

As to store two input strings of length len_a and len_b, it will take O(len_a) + O(len_b). Auxiliary space used is O(len_a * len_b) and for the output, string can take O(min(len_a, len_b)) of space. Hence total complexity will be

O(len_a * len_b) + O(len_a) + O(len_b) + O(min(len_a, len_b)) = O(len_a * len_b).