Given a sorted dictionary of an alien language, find the order of characters in the alphabet.
Input: ["baa", "abcd", "abca", "cab", "cad"]
Input: words = ["caa", "aaa", "aab"]
Input Parameters: Function has one argument, array of strings, the dictionary sorted in the lexicographical order of the alien language.
Output: Return a string consisting of all the characters of the alien alphabet, in order.
● Input will consist of lowercase latin characters only.
● Input will be such that it is possible to determine the alphabet order uniquely.
● The dictionary may contain duplicate words.
One simple but suboptimal solution is:
Collect all the unique characters. Generate all permutations of those characters. Validate each permutation against the given dictionary.
Now let's think about an optimal solution.
If you're familiar with topological sort and what it means, however, you'll see that this is a simple application of topological sort.
Quick recap - Topological sort is ordering the vertices in a directed graph such that vertex A appears before vertex B for all directed edges A->B. One way to look at it is that you're given a graph of dependencies and you want to order the vertices such that no dependencies are broken.
So, what is the graph in this question anyway?
Lexicographically sorted dictionary gives us *some* information about order of characters in the alphabet. In other words, from dictionary we can find out that certain characters precede certain other characters in the alphabet.
How do we determine these relationships between characters in practice?
We know a few things about the dictionary ordering.
In a dictionary, between two adjacent words, one of the following is true:
1. there is at least one character different. E.g. "abcx" and "abcyz" different fourth characters: "x" and "y".
2. first word is a prefix of the second one. E.g. "abc" and "abcd".
[We assume that repeated words aren't found in the dictionary. Though in an interview it's worth establishing such assumption explicitly, especially if your algorithm depends on it. Our algorithm won't really care though.]
If (1) is true, we know from the dictionary property that the character at mismatch index in the left word appears before its counterpart in right word in the alphabet. ("mismatch index" is the lowest index for which word1[index] != word2[index] is true; for example for "abcx" and "abcyz" the mismatch index is 3 (zero based) and the different characters are "x" and "y")
If (1) is false but (2) is true, there is no meaningful information with respect to
the alphabet. Because the shorter word will always be earlier in the dictionary regardless
what characters the longer word has in the end.
It is also necessarily true between two adjacent words, that the characters after the first mismatch do not convey any relationship between the characters
We can compare two adjacent words and try to find a mismatch. First mismatch denotes that, letter in first word will come before letter in second word in that alien language. Let's take one example to understand this:
Then we compare:
1) "c" and "aaaa", here the first mismatch is between 'c' and 'a' hence we are sure that 'c' will come before 'a'.
2) "aaaa" and "aaaa", there is no mismatch so we cannot conclude anything!
3) "aaaa" and "aabc", here the first mismatch is between 'a' and 'b' hence we are sure that 'a' will come before 'b'. Also note that we should only consider the first mismatch. So from the second mismatch concluding that 'a' will come before 'c' is wrong!
Now total information we have collected is:
1) 'c' comes before 'a'
2) 'a' comes before 'b'
Combining them we can figure out the order of characters is 'cab' in the given alien language.
Here we can use a directed graph to combine the information collected by comparing words. Add a directed edge between first mismatched characters! Our directed graph will be a directed acyclic graph! Now on DAG we can use topological sort to get the order of characters!
O(total number of characters).
In the solution one word will be compared maximum two times. With 1) previous word and 2) next word. So comparing words and finding edges will take O(2 * total number of characters) = O(total number of characters).
Also an edge is added when a mismatch is found. Maximum number of mismatches will be
Overall time complexity is O(total number of characters + number of different characters + number of words) = O(total number of characters).
O(total number of characters).
Input takes O(total number of characters) space and the graph we build takes O(number of different characters + number of words).