About usWhy usInstructorsReviewsCostFAQContactBlogRegister for Webinar

Regex Matcher Problem

Problem Statement

Given a text string containing characters only from lowercase alphabetic characters and a pattern string containing characters only from lowercase alphabetic characters and two other special characters '.' and '*'. 

Your task is to implement a pattern matching algorithm that returns true if pattern is matched with text otherwise returns false. The matching must be exact, not partial.

Explanation of the special characters:

1) '.' - Matches a single character from lowercase alphabetic characters.

2) '*' - Matches zero or more preceding character. It is guaranteed that '*' will have one preceding character which can be any lowercase alphabetic character or special character '.'. But '*' will never be the preceding character of '*'. (That means "**" will never occur in the pattern string.) 

'.' = "a", "b", "c", ... , "z".

a* = "a","aa","aaa","aaaa",... or empty string("").

ab* = "a", "ab", "abb", "abbb", "abbbb",...

Example One

Input: text = "abbbc" and pattern = "ab*c"

Output: true

Given pattern string can match:

"ac", "abc", "abbc", "abbbc", "abbbbc", ...

Example Two

Input: text = "abcdefg" and pattern = "a.c.*.*gg*"

Output: true

".*" in pattern can match  "", ".", "..", "...", "....", ...

"g*" in pattern can match "", "g", "gg", "ggg", "gggg", ...

Now, consider:

    '.' at position 2 as 'b',

    '.*'  at position {4,5} as "...",

    '.*' at position {6,7} as "" and

    [g*] at position {8,9} as "".

So, "a.c.*gg*" = "abc...g" where we can write "..." as "def". Hence, both matches.

Example Three

Input:

text = "abc" and pattern = ".ab*.."

Output: false

If we take 'b*' as "" then also, length of the pattern will be 4 (".a.."). But, given text's length is only 3. Hence, they can not match.

Notes

Input Parameters: There are two arguments, first one is string denoting text and second one is string denoting pattern.

Output: Return boolean, true if text and pattern matches exactly, otherwise return false.

Constraints:

● 0

● Text string containing characters only from lowercase alphabetic characters.

● Pattern string containing characters only from lowercase alphabetic characters and two other special characters '.' and '*'

● In pattern string, It is guaranteed that '*' will have one preceding character which can be any lowercase alphabetic character or special character '.'. But '*' will never be the preceding character of '*'.

Solutions

We provided two solutions.

In both solutions we are making simplified pattern string from pattern string (explained below with examples). Basically, We are removing duplicate '*' character from pattern string and '*' itself also. We are also maintaining isStarChar boolean array to store whether simplified pattern character was followed by '*' in pattern string or not.

e.g. If pattern = [abc*c*c*dd*] then simplified pattern string = "abcdd" and isStarChar = [false, false, true, false, true].

Here, in this example we have removed [c*c*c*] and keep only one [c]. [c]'s position in the simplified pattern string is 3. So, we have stored true at position 3 of isStarChar and also removed '*' from pattern string. Same for character 'd*'.


1) brute_solution.java

Time Complexity:

O(2^(len_simpl_pat + len_text))

Auxiliary Space Used:

O(len_pat + len_text).

We are using one extra character array and one boolean array to make the simplified pattern string. Also, O(len_pat + len_text) space will be used by recursion stack.

Space Complexity:

O(len_pat + len_text).

As input is O(len_pat + len_text) and auxiliary space used is also O(len_pat + len_text). So, O(len_pat + len_text) + O(len_pat + len_text) -> O(len_pat + len_text).


2) optimal_solution.java

This solution uses Dynamic Programming. In this method, We build a dp[][] 2D boolean array from top to bottom such that dp[i][j] indicates first i character of text string matches to first j character of pattern string or not.

Here, 3 cases will arise,

Case 1:

dp[i][j-1] is true. 

It means first i character of text string matches with first j-1 character of simplified pattern string. So, dp[i][j] will be true if jth character of simplified pattern string is '*', it means isStarChar[j] should be true.

Case 2:

dp[i-1][j-1] is true.

It means first i-1 character of text string matches with first j-1 character of simplified pattern string. So, dp[i][j] will be true if jth character of simplified pattern string matches with ith character of text string. It can only be match if jth character of pattern string is '.' or same as ith character of text string.

Case 3:

dp[i-1][j] is true.

It means first i-1 character of text string matches with first j character of simplified pattern string. So, dp[i][j] will be true if jth character of simplified pattern string is '*' and can be match with ith character of text string. It can only be match if jth character of pattern string is '.' or same as ith character of text string.

Time Complexity: 

O(len_simpl_pat * len_text + (len_simpl_pat + len_text)).

O(len_simpl_pat + len_text) for reading input and simplifying pattern string.

Auxiliary Space Used:

O(len_simpl_pat * len_text + (len_pat + len_text)).

As we are using boolean array of size (len_simpl_pat * len_text) to store dp values, Char array of size len_pat to get simplified pattern string, boolean array of size len_pat to store is that toStarChar or not.

Space Complexity:

O(len_simpl_pat * len_text + (len_pat + len_text)).

As input is O(len_pat + len_text) and auxiliary space used is O(len_simpl_pat * len_text + (len_pat + len_text)). So, O(len_pat + len_text) + O(len_simpl_pat * len_text + (len_pat + len_text)) -> O(len_simpl_pat * len_text + (len_pat + len_text)).

Attend our Free Webinar on How to Nail Your Next Technical Interview

Recommended Posts

All Posts