About usWhy usInstructorsReviewsCostFAQContactBlogRegister for Webinar

Indices Of Words In Text String Problem

Given some text and a bunch of words, find where each of the words appear in the text. Text may contain spaces, words may not.

For every given word you need to return a list of (zero-based) indices of where that word starts in the text. If a word isn’t found in the text, return [-1] for that word.


Example

Input:

text = “you are very very smart”

words = [“you”, “are”, “very”, “handsome”]


Output:

[ [0], [4], [8, 13], [-1] ]

“you” is found in the given text starting at the index 0.

“are” is found in the given text starting at the index 4.

“very” is found in the given text two times, starting at the indices 8 and 13.

“handsome” isn’t found in the given text.


Notes

Input Parameters:

Function has two arguments: string text and list of strings words.


Output: Function must return a list of lists of integers. One list for each one of the words. i-th list must contain all indices of characters in text where i-th word starts, in the ascending order. If i-th word isn’t found in the text at all, then i-th list must be [-1].


Constraints:

• Text and words may contain a-z, A-Z, 0-9, “$”, “#”, “@”, “?”, “;”.

• Text may contain spaces, too, but never two or more spaces consecutively. Spaces separate words in the text string.

• Text won’t start or end with a space.

• Indexing of characters in text is zero-based.

• Words list will contain unique strings.

• 1

• 1

• 1


Solution

We provided three solutions.


1) brute_force_solution.java

We literally compare each word from words with every word from the text. When the two are equal we take note of the start index of the word in the text.

Time Complexity:

O(n*w*l).

Let n be the number of words in text,

w be the number of words and

l be the maximum length of a word.

Processing each pair of words takes O(l) time as we compare them character by character. We compare each of n words with every one of w words for the total time complexity of O(n*w*l).

Auxiliary Space Used:

O(n+w).

The list of lists that we return takes O(w+n) space. Since words are unique, any given word from text can match at most one word from words so the total number of indexes in the returned list of lists won’t exceed n and we know that the outer list has exactly w lists, giving us a total of O(w + n).

Space Complexity:

O((n+w)*l).

Adding up O(w*l) of words, O(n*l) of text and O(w+n) of the auxiliary space we get O((n+w)*l).


// -------- START --------
    static ArrayList> find_words(String text, List words) {
        ArrayList> answer = new ArrayList<>();
        String[] wordsInText = text.split(" ");
        for (String word : words) {
            ArrayList indexes = new ArrayList<>();
            int index = 0;
            for (String s : wordsInText) {
                if (s.compareTo(word) == 0) {
                    indexes.add(index);
                }
                index += s.length() + 1;
            }
            if (indexes.isEmpty()) {
                indexes.add(-1);
            }
            answer.add(indexes);
        }
        return answer;
    }
}
// -------- END --------
	


2) optimal_solution1.java

In this solution we preprocess text and create its index, see “HashMap textIndex” variable. By the time that’d done, each word from the text has the list of its starting indices pre-compiled. All that’s left is to look up those lists of indexes for every word from words.


Time Complexity:

O((n+w)*l).

It takes O(I) time to calculate hashcode or to compare two strings up to l characters long. Thus populating the hashmap with n words will take O(n*l), making w searches in that hashmap will take O(w*l). Total time is the sum of those: O(n*l) + O(w*l) = O((n+w)*l)

Auxiliary Space Used:

O((n*l)+w).

Hashmap which we pre-compute takes O(n*l) space.

The list of lists that we return takes O(w+n) space (see explanation in Auxiliary space section for brute_force_solution.java). Summing up the two gives O(n*l) + O(w+n) = O((n*l)+w).

Space Complexity:

O((n+w)*l). 

Text input takes O(n*l) and words take O(w*l). Adding up those two and the auxiliary space we get the total space complexity: O(n*l) + O(w*l) + O((n*l)+w) = O((n+w)*l).


// -------- START --------
    static ArrayList> find_words(String text, List words) {
        String[] wordsInText = text.split(" ");

        // {word -> [index1, index2]}
        HashMap> textIndex = new HashMap<>();
        int currentIndex = 0;
        for (String word : wordsInText) {
            ArrayList indexes = textIndex.get(word);
            if (indexes == null) {
                indexes = new ArrayList<>();
            }
            indexes.add(currentIndex);
            textIndex.put(word, indexes);
            currentIndex += word.length() + 1;
        }

        ArrayList> answer = new ArrayList<>();
        for (String word : words) {
            ArrayList indexes = textIndex.get(word);
            if (indexes == null) {
                indexes = new ArrayList<>(Collections.singleton(-1));
            }
            answer.add(indexes);
        }

        return answer;
    }
}
// -------- END --------
	


3) optimal_solution2.java

In this solution we use a trie (prefix tree), First we insert all words from the text into the trie. Then we look up every word from words in the trie.

Although this solution has the same worst case time and space complexity as the hashmap based optimal_solution1.java, it will utilize less space when many words share common prefixes.

In the actual interview many interviewers will prefer to hear the trie based solution to the hashmap based one.


Time Complexity:

O((n+w)*l).

Insert and search operations in the trie take O(l) time each.

The algorithm makes n insertions and w searches.

Auxiliary Space Used:

O(n*l + w).

Trie takes O(n*l) space and the list of lists that we return takes O(w+n) (see explanation in Auxiliary space section for brute_force_solution.java). Adding that up we get O(n*l + w).

Space Complexity:

O((n+w)*l).

Text input takes O(n*l) and words take O(w*l). Adding auxiliary space to that we get the total space complexity: O(n*l) + O(w*l) + O(n*l + w) = O((n+w)*l).


// -------- START --------

    /**
     * See https://en.wikipedia.org/wiki/Trie a.k.a. prefix tree.
     *
     * Note that this class does not have a character field.
     * Character associated with any given {@link TrieNode} is the one
     * associated with that node in {@link #children} map in the parent node.
     *
     * Also note that the root node (instantiated in {@link #find_words})
     * does not have a character associated with it. Root ~ empty string.
     */
    static class TrieNode {
        final HashMap children = new HashMap<>();

        /**
         * For every word in the text that *ends* in this node,
         * index of where that word *starts* in the text will be in this list.
         */
        final ArrayList indexes = new ArrayList<>();
    }

    static ArrayList> find_words(String text, List words) {
        TrieNode root = new TrieNode(); // Root ~ empty string => no associated character.
        String[] wordsInText = text.split(" ");
        int index = 0;
        for (String word : wordsInText) {
            insertIntoTrie(root, word, index);
            index += word.length() + 1;
        }

        ArrayList> answer = new ArrayList<>();
        for (String word : words) {
            answer.add(findWord(root, word));
        }
        return answer;
    }

    static void insertIntoTrie(TrieNode root, String word, int index) {
        TrieNode head = root;
        for (char c : word.toCharArray()) {
            TrieNode node = head.children.get(c);
            if (node != null) {
                head = node;
            } else {
                node = new TrieNode();
                head.children.put(c, node);
                head = node;
            }
        }
        head.indexes.add(index);
    }

    static ArrayList findWord(TrieNode root, String word) {
        TrieNode head = root;
        for (char c : word.toCharArray()) {
            TrieNode node = head.children.get(c);
            if (node != null) {
                head = node;
            } else {
                return new ArrayList<>(Collections.singleton(-1));
            }
        }
        return head.indexes.isEmpty() ?
                new ArrayList<>(Collections.singleton(-1)) : head.indexes;
    }
}
// -------- END --------
	


Attend our Free Webinar on How to Nail Your Next Technical Interview

Recommended Posts

All Posts