About usWhy usInstructorsReviewsCostFAQContactBlogRegister for Webinar

Longest Substring Without Repeating Characters Problem

Given a string, find the length of its longest substring with no repeating characters.

Example One

Input: abcdab

Output: 4

The given string has three substrings of length 4 with no repeating characters. They are the substrings: “abcd”, “bcda” and “cdab”. It also does not have any substring of a greater length consisting of unique characters.

Example Two

Input: aaaaa

Output: 1

Notes:
Constraints:

• 1

• The string may contain the following ASCII characters only: a..z, A..Z, 0..9

Solutions

We've provided three solutions. 

We will first start with a brute force idea and will then try improving upon it using some general observations. Throughout this editorial, we are going to use str to name the input string and n for its length.

For all the solutions, we will need a container to track whether a given character has occurred in the substring or not. We will need this to make sure that no character is repeated. For this, we can use a boolean array to store the occurrences of each character in the substring (other alternatives could be unordered_map/unordered_set) as it will facilitate the O(1) look-up and update for each character.

1) check_all_substrings_solution.cpp

We can one-by-one consider all the substrings of the given string. Let us keep track of the length of the longest substring with no repetitions and call it max_len. 

If the length of the input string is n, then there will be n - start number of substrings starting from each index start.

Then, for every substring starting at an index start and ending at an index end (start

For example, suppose we have been given a string str = “abcdab”. Then, we will try generating all of its substrings “a”, “ab”, abc”, “abcd”, “abcda” and so on. Then, for each substring we will check if all the characters are unique in the current substring and if they are, we will check and update the value of max_len.

Time Complexity: 

O(n^3).

As you might have noted, there are O(n^2) numbers of substrings for any given string of length n and traversing through each substring and checking for the unique characters using a boolean array takes O(n) time. Hence, the overall time complexity of this approach is O(n^3).

Auxiliary Space Used:

O(1).

Although we used a boolean array to store the occurrences of characters in the string, the size of that array is restricted to 256 irrespective of the size of the input string.

Space Complexity:

O(n).

Space used for input: O(n).

Auxiliary space used: O(1).

Space used for output: O(1). 

So, total space complexity: O(n).


int length_of_longest_substring(string str) {
    int n = str.length();
    int max_len = 0;
    vector < bool > character_set(256);
    bool valid_substring;

    for (int start = 0; start < n; ++start) {
        for (int end = start; end < n; ++end) {
            // Checking the substring from index start to index end

            // Keeps track of unique characters in the current substring
            // clearing the character_set for the current substring.
            fill(character_set.begin(), character_set.end(), false);
            valid_substring = true;
            for (int ind = start; ind <= end; ++ind) {
                // If the current character was found earlier in the same substring,
                // then the current substring is not valid.
                if (character_set[(int) str[ind]]) {
                    valid_substring = false;
                    break;
                }
                character_set[(int) str[ind]] = true;
            }
            
            if (valid_substring) {
                max_len = max(max_len, end - start + 1);
            }
        }
    }
    return max_len;
}


2) limit_the_max_length_solution.cpp

We can improve this O(n^3) solution to O(n) by observing the fact that the answer cannot be greater than 256. Reason being, the number of unique characters in any substring cannot be greater than 256. The end index of any substring will now be less than or equal to min (n - 1, start + 256 - 1). So, starting from any index start, we will just have to iterate 256 characters in the worst case. Note that we are not using the index end here. Thus, from any index start, we will do 256 iterations at most. Here we can again use a boolean array to store the occurrences of each character while iterating through any substring.

Time Complexity:

O(n).

Starting at any index start, we will iterate through 256 characters in the worst case. Therefore, the time complexity of this approach is O(n*256) = O(n).

Auxiliary Space Used:

O(1).

Space used by the Boolean array: O(1).

Space Complexity:

O(n).

Space used for input: O(n).

Auxiliary space used: O(1).

Space used for output: O(1).

So, total space complexity: O(n).


int length_of_longest_substring(string str) {
    int n = str.length();
    int max_len = 0;
    vector < bool > character_set(256);

    for (int start = 0; start < n; ++start) {
        // Keeps track of unique characters in the current substring.
        // Clearing the character_set for the current substring.
        fill(character_set.begin(), character_set.end(), false);
        int end = start;

        // This inner loop will not get executed more than 256 times.
        while (end < n) {
            // If the current character has already been seen in the current substring, we break out.
            if (character_set[(int) str[end]]) {
                break;
            }
            character_set[(int) str[end]] = true; // Including the current character in the character_set.
            max_len = max(max_len, end - start + 1); // Updating the result if possible.
            end++;
        }
    }
    return max_len;
}


3) sliding_window_solution.cpp

We can also use the sliding window approach to solve this problem in O(n) time. The previous solution had a constant factor of around 256 in the time complexity. We can reduce this constant factor considerably using the sliding window technique.  

We maintain two endpoints start and end of the substring under consideration (start = end = 0 initially). 

We move the index end to its right while there is no repetition (we will check for the repetition through the boolean array that we have been maintaining similar to the first two solutions).

Once there is a repetition, we will move the start index until that repetition is removed. Basically, here, str[end] will point to a character that is found elsewhere in the current substring. Therefore, to move this repetition out of our substring, we will squeeze the substring from the left until we remove the first occurrence of str[end] from our substring. Reason being, without removing this repetition, we cannot move ahead as we consider only the substring with all unique characters.

Let us now see a simulation of this idea for str = “abcdab”.

For the sake of simplicity, we will not show the complete boolean array of size 256. Let us call that array character_set. Here, we will only denote the characters for which the character_set holds a true value.

Step 0: Initially, we have start = 0 and end = 0. Also, the character set is initially empty as it has all the values as false. Hence, character_set = {}. Also, the result max_len = 0 initially. 

Step 1: We check that str[end] is not present in the character set, hence we include it in the character set and widen up our window from the right. So, we increment end by 1. 

Currently, we have:

[ab]cdab

character_set = {‘a’}

max_len = 1

Here, we represent the sliding window with a set of matching brackets. The starting bracket is placed before the index start and ending bracket is placed after index end.

Step 2: 

[abc]dab

character_set = {‘a’, ‘b’}

max_len = 2

Again, str[end] is not present in the character_set, hence we include it in the character_set and widen up our window from the right.

Step 3:

[abcd]ab

character_set = {‘a’, ‘b’, ‘c’}

max_len = 3

Again, str[end] is not present in the character_set, hence we include it in the character_set and widen up our window from the right.

Step 4:

[abcda]b 

character_set = {‘a’, ‘b’, ‘c’, ‘d’}

max_len = 4

Again, str[end] is not present in the character_set, hence we include it in the character_set and widen up our window from the right.

Step 5:

a[bcdab] 

character_set = {‘a’, ‘b’, ‘c’, ‘d’}

max_len = 4

Here, str[end] = ‘a’ was already present in the character_set. So, we squeeze our window from the left and its first occurrence is removed from the string. After that, we again widen up our window from the right to check for the next character.

Step 6:

ab[cdab] 

character_set = {‘a’, ‘b’, ‘c’, ‘d’}

max_len = 4

Here, str[end] = ‘b’ was already present in the character_set. So, we squeeze our window from the left and its first occurrence is removed from the string. 

Now, end = 6 and thus, we have considered the complete string as we are no longer left with a different ending index of the substring. Therefore, we stop and report the final result as max_len = 4. 

Time Complexity:

O(n).

As we will iterate through any character at most twice. Once, via start and once via end.

Auxiliary Space Used:

O(1).

As we used only some additional integer variables.

Space Complexity:

O(n).

Space used for input: O(n).

Auxiliary space used: O(1).

Space used for output: O(1).

So, total space complexity: O(n). 


int length_of_longest_substring(string str) {
    int n = str.length();
    // Keeps track of characters inside the current substring.
    vector < bool > character_set(256, false);
    // Starting and ending index of the substring under consideration.
    int start = 0, end = 0;
    int max_len = 0;

    while (start < n and end < n) {
        // If the current character isn't present in the current substring
        // then, include it.
        if (character_set[(int) str[end]] == false) {
            character_set[(int) str[end]] = true;
            end++;

            max_len = max(max_len, end - start);
        }
        // Else, we remove the characters from the start of the substring
        // until we remove the current repeating character.
        else {
            while (start < n and str[start] != str[end]) {
                character_set[(int) str[start]] = false;
                start++;
            }
            // Finally, start will point to the first occurrence of repeated character
            // and end will point to its second occurrence.
            // Hence, we update both.
            start++;
            end++;
        }
    }
    return max_len;
}


Attend our Free Webinar on How to Nail Your Next Technical Interview

Recommended Posts

All Posts