Given a dictionary of words and a string, find the number of ways the string can be broken down into the dictionary words. Return the answer modulo 10^9 + 7.

Input: Dictionary: ["kick", "start", "kickstart", "is", "awe", "some", "awesome"]. String: "kickstartisawesome".

Output: 4

Here are all four ways to break down the string into the dictionary words:

1) kick start is awe some

2) kick start is awesome

3) kickstart is awe some

4) kickstart is awesome

4 % 1000000007 = 4 so the correct output is 4.

Input Parameters: The function has two parameters: 1) an array of strings, the dictionary, and 2) a string to be broken down into dictionary words.

Output: Return an integer, the number of distinct ways to break down the string into the dictionary words modulo 10^9 + 7.

● 1 <= number of words in the dictionary <= 200000

● 1 <= length of any dictionary word <= 100

● 1 <= length of the string <= 2000

● Dictionary words and the string all consist of lowercase latin characters (no whitespace, in particular).

**Solutions:**

We've provided three solutions.

1) brute_force_solution:

The brute force approach is pretty simple. Start at index 0 of the given string and check all of its prefixes (of lengths 1, 2, and so on) to see whether they are found in the dictionary. Once we find one that is, we take note of it and do the same thing on the remaining part of the string (recursion can be used to "do the same thing"). If doing so eventually leaves us with an empty string, we have found one valid segmentation; we then backtrack and keep counting the other segmentations the same way.
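As a sketch of this idea (in Python; not part of the original solutions), the recursion below tries every prefix of the remaining string and searches the dictionary linearly, exactly as the analysis that follows assumes:

```python
MOD = 10**9 + 7

def count_breaks_brute(dictionary, txt):
    """Count the segmentations of txt into dictionary words by plain recursion."""

    def rec(start):
        # An empty remaining string means the whole txt was segmented: success.
        if start == len(txt):
            return 1
        total = 0
        # Try every prefix of txt[start:] as the next word; `in` on a list
        # is a linear search through the dictionary, as in the analysis below.
        for end in range(start + 1, len(txt) + 1):
            if txt[start:end] in dictionary:
                total += rec(end)
        return total % MOD

    return rec(0)
```

Running it on the sample input returns 4, matching the four segmentations listed above.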

Time Complexity: O(dictionaryCount * len * 2^len) where dictionaryCount is the number of words in the dictionary and len is the length of the given input string txt.

To visualize this upper bound on the time complexity, think of it this way. In the worst case there are O(2^len) arrangements possible. For each arrangement, the total cost is the cost of building the arrangement plus the cost of checking that all of its substrings are present in the dictionary.

The length of any arrangement is O(len), so the cost of building any arrangement is O(len).

Now let's see the cost of checking whether the substrings are present in the dictionary. Suppose the ith arrangement is txt1 + " " + txt2 + " " + txt3 + ... + txtx. For each substring txt1, txt2, ..., txtx we check whether it exists in our dictionary by searching for it linearly. For txt1 this takes O(dictionaryCount * len(txt1)) time, ..., and for txtx it takes O(dictionaryCount * len(txtx)) time. So in total it takes O(dictionaryCount * len(txt1)) + ... + O(dictionaryCount * len(txtx)) time. Since len(txt1) + ... + len(txtx) = len, the total length of txt, the total time for checking whether the substrings are present in the dictionary is O(dictionaryCount * len).

So, the total processing time is O(dictionaryCount * len * 2^len).

Auxiliary Space: O(len) where len is the length of the input string txt.

Regarding the auxiliary space used by this approach: we aren't using any temporary container to store the dictionary words, but the recursive backtracking algorithm will use O(len) stack space in the worst case. Hence, the auxiliary memory used is O(len).

Space Complexity: O(len + dictionaryCount * max_len) where dictionaryCount is the number of words in the dictionary, len is the length of the given input string txt, and max_len is the maximum length of a word in the dictionary.

The initial input space of the program is dominated by the dictionary array, i.e. O(dictionaryCount * max_len), where max_len is the maximum length of a word in our dictionary. So, the total space complexity is the auxiliary space plus the input space: O(len + dictionaryCount * max_len).

2) other_solution:

Just by thinking through the brute force approach, we can see that we were recalculating many results over and over again, which tells us that the problem has the overlapping subproblems property. Consider the following representation for a better visualization of the overlapping cases:

{ hellomynameisjack }

hello { mynameisjack }

hello my { nameisjack }

hello my name { isjack }

hello my name is { jack }

hello my name is jack { }

We can see that at every step we are solving the same problem over again for the string in the curly braces.

Also, one thing worth noticing is that any segmentation of the text in the curly braces can be appended to its preceding prefix to form a segmentation of the whole string. So, this also tells us that the problem has optimal substructure.

Hence, we can use dynamic programming to solve this problem.

So, let’s first calculate the number of different segmentations possible for the given string txt.

We will deal with the DP solution in steps:

Step 1: Confirm that the problem can be solved with DP. We have already done this above, by identifying the overlapping subproblems and optimal substructure.

Step 2: Define the State of the DP

Let's start thinking about the states. It is clear that we need to uniquely identify a substring through our state. So, we can take two integers denoting the start and end indices of our substring.

Hence, now our DP state will be a 2-dimensional array.

So, our DP is something like DP[start][end]. Here, DP[start][end] tells us the number of segmented arrangements possible for the substring starting at index start and ending at index end. Hence, if we want to know the total number of arrangements possible for our txt string, we simply ask the DP[0][len-1] state (len is the length of the txt string).

Wait, let's take a break and re-examine our state. Do we really need the end index in our state? No, we can simply omit it and redefine our DP state as DP[start], where DP[start] tells us the total number of segmented arrangements possible for the substring starting at index start and ending at index length(txt)-1. So, if we now want to know the total number of arrangements for the string txt, we simply go and ask the DP state DP[0]. Take a breath and re-read this paragraph until you understand its significance.

By doing this, we reduced our state space from two dimensions to one. This process is called state-space reduction.

Step 3: Deciding the State Transition

So far we have a state DP[start] which tells us the number of all segmented arrangements possible for the substring starting at index start and ending at index length(txt)-1.

Now, let's see how we can compute the value of state(idx).

Here, state(idx) means the same as DP[idx].

You can consider state(idx) as the computation and DP[idx] as the result of the computation.

Like the brute force approach, we keep generating all the substrings that start at idx and check whether each exists in the dictionary. If it exists in the dictionary, we solve the same problem again for the remaining string.

*Pseudo code:*

```
state(idx):
    result = 0
    for end = idx to length(txt)-1:
        if substring[idx, end] exists in Dictionary:
            result += state(end + 1)
    return result
```

Above is the pseudocode for the state transition. The variable result stores the total number of arrangements for state(idx).

Do you notice anything fishy in our state transitions? Yes: we are recalculating each state over and over again. Hence, we apply memoization and store the computed value of each state, so that the next time we need that state we do not recalculate it but look it up efficiently.

We do that by simply storing the value of the result variable in our DP[idx] array. The next time the state's value is required, we first check our DP[] array: if it already holds the result we simply return it; otherwise we compute it, memoize it, and then return it.

Here we store all the dictionary words in an unordered_set (HashSet) for fast lookup of whether a string is part of the given dictionary or not.
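Putting the pieces together, a memoized version of the transition above might look like this in Python (a sketch, not the original implementation; it uses `functools.lru_cache` as the DP[] table):

```python
from functools import lru_cache

MOD = 10**9 + 7

def count_breaks_memo(dictionary, txt):
    """Top-down DP: state(idx) = number of segmentations of txt[idx:]."""
    words = set(dictionary)          # HashSet for fast substring lookups
    n = len(txt)

    @lru_cache(maxsize=None)         # memoization table, i.e. DP[idx]
    def state(idx):
        if idx == n:
            return 1                 # empty suffix: exactly one (empty) segmentation
        result = 0
        for end in range(idx, n):    # end is the last index of the candidate word
            if txt[idx:end + 1] in words:
                result = (result + state(end + 1)) % MOD
        return result

    return state(0)
```

Each of the len states is computed once; slicing and hashing each of the O(len) candidate substrings costs O(len), matching the O(len^3) bound derived below.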

Time Complexity: O(len^3 + dictionaryCount * max_len) where dictionaryCount is the number of words in the dictionary, len is the length of the given input string txt, and max_len is the maximum length of a word in the dictionary.

There are only len states, and because of memoization we visit each state only once. At each state, we generate all the substrings starting at that state's index and check each one's presence in the dictionary. This process is O(len^2): there are O(len) such substrings, and the length of each substring is O(len). So, the time complexity of this DP solution is the number of states times the computation time per state, i.e. len * len^2 = O(len^3), plus the extra time taken to insert the dictionary words into the HashSet, i.e. O(dictionaryCount * max_len).

Auxiliary Space: O(len + dictionaryCount * max_len) where dictionaryCount is the number of words in the dictionary, len is the length of the given input string txt, and max_len is the maximum length of a word in the dictionary.

The auxiliary space complexity is the space taken by the DP memoization table, i.e. O(len), plus the space taken by the dictionary hash set, i.e. O(dictionaryCount * max_len).

Space Complexity: O(len + dictionaryCount * max_len) where dictionaryCount is the number of words in the dictionary, len is the length of the given input string txt, and max_len is the maximum length of a word in the dictionary.

The total space complexity is the auxiliary space plus the input space: O(len + dictionaryCount * max_len) + O(dictionaryCount * max_len) = O(len + dictionaryCount * max_len).

3) optimal_solution:

Here we use the same approach as in the other solution, but instead of using a HashSet to store all the dictionary words, we use a prefix tree, or trie. The trie guides us in picking the next character while building the substring, and it prunes the search early when no dictionary word matches the current prefix. Looking up a substring s (indices 0 to length(s)-1) in the HashSet takes time proportional to the length of s, whereas with the trie we already sit at the node matching s[0..length(s)-2] and only need to check a single additional character to find out whether the complete substring exists in the trie (i.e. in the given dictionary) or not.
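A possible Python sketch of this trie-based DP (the node layout, using nested dicts with a '$' end-of-word marker, is one of several reasonable choices, not necessarily the original implementation):

```python
import sys

MOD = 10**9 + 7

def count_breaks_trie(dictionary, txt):
    """Top-down DP over txt, walking a trie instead of hashing substrings."""
    # Build the trie: each node is a dict of children; '$' marks a word end.
    root = {}
    for word in dictionary:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True

    n = len(txt)
    # len can be up to 2000, which may exceed Python's default recursion limit.
    sys.setrecursionlimit(max(sys.getrecursionlimit(), n + 100))
    dp = [None] * (n + 1)
    dp[n] = 1                        # empty suffix: one (empty) segmentation

    def state(idx):
        if dp[idx] is not None:
            return dp[idx]
        result = 0
        node = root
        # Extend the current word one character at a time; stop as soon as
        # no dictionary word starts with the prefix txt[idx..end].
        for end in range(idx, n):
            node = node.get(txt[end])
            if node is None:
                break
            if '$' in node:          # txt[idx..end] is a complete word
                result = (result + state(end + 1)) % MOD
        dp[idx] = result
        return result

    return state(0)
```

Because the trie walk advances one node per character, each state costs O(len) in the worst case, giving the O(len^2) bound derived below.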

Time Complexity: O(len^2 + dictionaryCount * max_len) where dictionaryCount is the number of words in the dictionary, len is the length of the given input string txt, and max_len is the maximum length of a word in the dictionary.

Same explanation as for other_solution, except for one thing: each state's calculation takes O(len) instead of O(len^2). Hence, the total complexity of this solution is the number of states times the computation time per state, i.e. len * O(len) = O(len^2), plus the extra time taken to insert the dictionary words into the trie, i.e. O(dictionaryCount * max_len).

Auxiliary Space: O(len + dictionaryCount * max_len).

The auxiliary space complexity is the space taken by the DP memoization table, i.e. O(len), plus the space taken by the dictionary trie, i.e. O(dictionaryCount * max_len).

Space Complexity: O(len + dictionaryCount * max_len).

The total space complexity is the auxiliary space + input space + output space: O(len + dictionaryCount * max_len) + O(dictionaryCount * max_len) + O(1) = O(len + dictionaryCount * max_len).