Register for our webinar

How to Nail your next Technical Interview

1 hour
Loading...
1
Enter details
2
Select webinar slot
*Invalid Name
*Invalid Name
By sharing your contact details, you agree to our privacy policy.
Step 1
Step 2
Congratulations!
You have registered for our webinar
check-mark
Oops! Something went wrong while submitting the form.
1
Enter details
2
Select webinar slot
*All webinar slots are in the Asia/Kolkata timezone
Step 1
Step 2
check-mark
Confirmed
You are scheduled with Interview Kickstart.
Redirecting...
Oops! Something went wrong while submitting the form.
close-icon
Iks white logo

You may be missing out on a 66.5% salary hike*

Nick Camilleri

Head of Career Skills Development & Coaching
*Based on past data of successful IK students
Iks white logo
Help us know you better!

How many years of coding experience do you have?

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Iks white logo

FREE course on 'Sorting Algorithms' by Omkar Deshpande (Stanford PhD, Head of Curriculum, IK)

Thank you! Please check your inbox for the course details.
Oops! Something went wrong while submitting the form.
closeAbout usWhy usInstructorsReviewsCostFAQContactBlogRegister for Webinar
Our June 2021 cohorts are filling up quickly. Join our free webinar to Uplevel your career
close

Find the Order of Characters From an Alien Dictionary Problem

Given a sorted dictionary of an alien language, find the order of characters in the alphabet.

Example One


Input: ["baa", "abcd", "abca", "cab", "cad"]

Output: "bdac"

Example Two


Input: words = ["caa", "aaa", "aab"]

Output: "cab"

Notes

Input parameters: Function has one argument, an array of strings, and the dictionary sorted in the lexicographical order of the alien language.

Output: Return a string consisting of all the characters of the alien alphabet in order.

Constraints

  • 1 <= characters in all the words in the dictionary <= 10^5
  • Input will consist of lowercase Latin characters only
  • Input will be such that it is possible to determine the alphabet order uniquely
  • The dictionary may contain duplicate words

Solution

One simple but suboptimal solution is:

  1. Collect all the unique characters. 
  2. Generate all permutations of those characters.
  3. Validate each permutation against the given dictionary.

Now, let's think about an optimal solution.

If you're familiar with topological sort and what it means, you'll see that the solution to this problem is a simple application of topological sort.

Here’s a quick recap: Topological sort is ordering the vertices in a directed graph such that vertex A appears before vertex B for all directed edges A->B. One way to look at it is that you're given a graph of dependencies, and you have to order the vertices such that no dependencies are broken.

So, what is the graph in this question?

A lexicographically sorted dictionary gives us some information about the order of characters in the alphabet. In other words, from the dictionary, we can figure out that certain characters precede certain other characters in the alphabet.

How do we determine these relationships between characters in practice?

We know a few things about the order of the dictionary. In a dictionary, between two adjacent words, one of the following is true:

  1. There is at least one character different. E.g., between "abcx" and "abcyz," the fourth characters are different — "x" and "y."
  2. The first word is a prefix of the second one. E.g., "abc" and "abcd."

(We assume that words do not repeat in the dictionary. In an interview, it's worth establishing such assumptions explicitly, especially if your algorithm depends on it. Our algorithm won't really care, though.)

If (1) is true, we know from the dictionary property that the character at the mismatch index in the left word appears before its counterpart in the right word in the alphabet. 

("mismatch index" is the lowest index for which word1[index] != word2[index] is true; for example, for "abcx" and "abcyz," the mismatch index is 3 (zero-based) and the different characters are "x" and "y")

Note: If (1) is false but (2) is true, we get no meaningful information because the shorter word will always appear earlier in the dictionary regardless of what characters the longer word ends with. Also, between two adjacent words, the characters after the first mismatch do not convey any relationship between the characters.

So, to solve our problem, we can start by comparing two adjacent words and try to find a mismatch. The first mismatch denotes that the letter in the first word will come before the respective letter in the second word in the alien language. 

Let's look at an example to understand this:

words =

[

"c",

"aaaa",

"aaaa",

"aabc

]

Then we compare:

  1. "c" and "aaaa" — here, the letters at the first mismatch are “c” and “a”; hence, we are sure that “c” will appear before “a.”
  2. "aaaa" and "aaaa" — there is no mismatch, so we cannot conclude anything
  3. "aaaa" and "aabc" — here, the letters at the first mismatch are “a” and “b”; hence, we are sure that “a” will appear before “b.” Also, note that we should only consider the first mismatch. Concluding that “a” will come before “c” based on the second mismatch is wrong!

So, here’s the information we’ve got from this comparison:

  1. “c” comes before “a”
  2. “a” comes before “b”

Combining them, we can figure out that the order of characters is “cab” in the given alien language.

Next, we can use a directed graph to combine the information collected by comparing words. Add a directed edge between the first mismatched characters. Our directed graph will be a directed acyclic graph. Now, we can use topological sort on the DAG to get the order of characters.

Time Complexity:

O(total number of characters).

In the solution, one word will be compared not more than twice — once with the previous word and second with the next word. So comparing words and finding edges will take O(2 * total number of characters) = O(total number of characters).

Also, an edge is added when a mismatch is found. The maximum number of mismatches will be:

Overall time complexity is O(total number of characters + number of different characters + number of words) = O(total number of characters).

Space Complexity:

O(total number of characters).

Input takes O(total number of characters) space, and the graph we build takes O(number of different characters + number of words).


// -------- START --------
/*
Note that we are passing adj_list, topological_order and visited by reference. Either use pass by 
reference or use global variables. 
*/
void dfs(char from, unordered_map> &adj_list, 
	string &topological_order, unordered_set &visited)
{	
	visited.insert(from);
	for (auto it = adj_list[from].begin(); it != adj_list[from].end(); it++)
	{
		if (visited.find(*it) == visited.end())
		{
			dfs(*it, adj_list, topological_order, visited);
		}
	}
	topological_order = from + topological_order;
}

// Topological sort. 
string topological_sort(unordered_map> &adj_list)
{
	string topological_order = "";
	unordered_set visited;
	for (auto it = adj_list.begin(); it != adj_list.end(); it++)
	{
		// If current node is not visited then visit it and visit nodes reachable from that node. 
		if (visited.find(it->first) == visited.end())
		{
			dfs(it->first, adj_list, topological_order, visited);
		}
	}
	return topological_order;
}

string find_order(vector words)
{
	int n = words.size();
	// Contains adjacent nodes. 
	unordered_map> adj_list;
	
	/*
	Initialize nodes with no edges. This is imp. Otherwise testcases having only one type of 
	character will fail.
	*/
	for (int i = 0; i < n; i++)
	{
		for (int j = 0; j < words[i].length(); j++)
		{
			adj_list[words[i][j]] = vector(0);
		}
	}

	// Find mismatch and add edges. 
	for (int i = 0; i < n - 1; i++)
	{
		int min_len = min(words[i].length(), words[i + 1].length());
		for (int j = 0; j < min_len; j++)
		{
			if (words[i][j] != words[i + 1][j])
			{
				adj_list[words[i][j]].push_back(words[i + 1][j]);
				break;
			}
		}
	}

	return topological_sort(adj_list);
}

// -------- END --------

Try yourself in the Editor

Note: Input and Output will already be taken care of.

Find the Order of Characters From an Alien Dictionary Problem

Given a sorted dictionary of an alien language, find the order of characters in the alphabet.

Example One


Input: ["baa", "abcd", "abca", "cab", "cad"]

Output: "bdac"

Example Two


Input: words = ["caa", "aaa", "aab"]

Output: "cab"

Notes

Input parameters: Function has one argument, an array of strings, and the dictionary sorted in the lexicographical order of the alien language.

Output: Return a string consisting of all the characters of the alien alphabet in order.

Constraints

  • 1 <= characters in all the words in the dictionary <= 10^5
  • Input will consist of lowercase Latin characters only
  • Input will be such that it is possible to determine the alphabet order uniquely
  • The dictionary may contain duplicate words

Solution

One simple but suboptimal solution is:

  1. Collect all the unique characters. 
  2. Generate all permutations of those characters.
  3. Validate each permutation against the given dictionary.

Now, let's think about an optimal solution.

If you're familiar with topological sort and what it means, you'll see that the solution to this problem is a simple application of topological sort.

Here’s a quick recap: Topological sort is ordering the vertices in a directed graph such that vertex A appears before vertex B for all directed edges A->B. One way to look at it is that you're given a graph of dependencies, and you have to order the vertices such that no dependencies are broken.

So, what is the graph in this question?

A lexicographically sorted dictionary gives us some information about the order of characters in the alphabet. In other words, from the dictionary, we can figure out that certain characters precede certain other characters in the alphabet.

How do we determine these relationships between characters in practice?

We know a few things about the order of the dictionary. In a dictionary, between two adjacent words, one of the following is true:

  1. There is at least one character different. E.g., between "abcx" and "abcyz," the fourth characters are different — "x" and "y."
  2. The first word is a prefix of the second one. E.g., "abc" and "abcd."

(We assume that words do not repeat in the dictionary. In an interview, it's worth establishing such assumptions explicitly, especially if your algorithm depends on it. Our algorithm won't really care, though.)

If (1) is true, we know from the dictionary property that the character at the mismatch index in the left word appears before its counterpart in the right word in the alphabet. 

("mismatch index" is the lowest index for which word1[index] != word2[index] is true; for example, for "abcx" and "abcyz," the mismatch index is 3 (zero-based) and the different characters are "x" and "y")

Note: If (1) is false but (2) is true, we get no meaningful information because the shorter word will always appear earlier in the dictionary regardless of what characters the longer word ends with. Also, between two adjacent words, the characters after the first mismatch do not convey any relationship between the characters.

So, to solve our problem, we can start by comparing two adjacent words and try to find a mismatch. The first mismatch denotes that the letter in the first word will come before the respective letter in the second word in the alien language. 

Let's look at an example to understand this:

words =

[

"c",

"aaaa",

"aaaa",

"aabc

]

Then we compare:

  1. "c" and "aaaa" — here, the letters at the first mismatch are “c” and “a”; hence, we are sure that “c” will appear before “a.”
  2. "aaaa" and "aaaa" — there is no mismatch, so we cannot conclude anything
  3. "aaaa" and "aabc" — here, the letters at the first mismatch are “a” and “b”; hence, we are sure that “a” will appear before “b.” Also, note that we should only consider the first mismatch. Concluding that “a” will come before “c” based on the second mismatch is wrong!

So, here’s the information we’ve got from this comparison:

  1. “c” comes before “a”
  2. “a” comes before “b”

Combining them, we can figure out that the order of characters is “cab” in the given alien language.

Next, we can use a directed graph to combine the information collected by comparing words. Add a directed edge between the first mismatched characters. Our directed graph will be a directed acyclic graph. Now, we can use topological sort on the DAG to get the order of characters.

Time Complexity:

O(total number of characters).

In the solution, one word will be compared not more than twice — once with the previous word and second with the next word. So comparing words and finding edges will take O(2 * total number of characters) = O(total number of characters).

Also, an edge is added when a mismatch is found. The maximum number of mismatches will be:

Overall time complexity is O(total number of characters + number of different characters + number of words) = O(total number of characters).

Space Complexity:

O(total number of characters).

Input takes O(total number of characters) space, and the graph we build takes O(number of different characters + number of words).


// -------- START --------
/*
Note that we are passing adj_list, topological_order and visited by reference. Either use pass by 
reference or use global variables. 
*/
void dfs(char from, unordered_map> &adj_list, 
	string &topological_order, unordered_set &visited)
{	
	visited.insert(from);
	for (auto it = adj_list[from].begin(); it != adj_list[from].end(); it++)
	{
		if (visited.find(*it) == visited.end())
		{
			dfs(*it, adj_list, topological_order, visited);
		}
	}
	topological_order = from + topological_order;
}

// Topological sort. 
string topological_sort(unordered_map> &adj_list)
{
	string topological_order = "";
	unordered_set visited;
	for (auto it = adj_list.begin(); it != adj_list.end(); it++)
	{
		// If current node is not visited then visit it and visit nodes reachable from that node. 
		if (visited.find(it->first) == visited.end())
		{
			dfs(it->first, adj_list, topological_order, visited);
		}
	}
	return topological_order;
}

string find_order(vector words)
{
	int n = words.size();
	// Contains adjacent nodes. 
	unordered_map> adj_list;
	
	/*
	Initialize nodes with no edges. This is imp. Otherwise testcases having only one type of 
	character will fail.
	*/
	for (int i = 0; i < n; i++)
	{
		for (int j = 0; j < words[i].length(); j++)
		{
			adj_list[words[i][j]] = vector(0);
		}
	}

	// Find mismatch and add edges. 
	for (int i = 0; i < n - 1; i++)
	{
		int min_len = min(words[i].length(), words[i + 1].length());
		for (int j = 0; j < min_len; j++)
		{
			if (words[i][j] != words[i + 1][j])
			{
				adj_list[words[i][j]].push_back(words[i + 1][j]);
				break;
			}
		}
	}

	return topological_sort(adj_list);
}

// -------- END --------

Worried About Failing Tech Interviews?

Attend our free webinar to amp up your career and get the salary you deserve.

Ryan-image
Hosted By
Ryan Valles
Founder, Interview Kickstart
blue tick
Accelerate your Interview prep with Tier-1 tech instructors
blue tick
360° courses that have helped 14,000+ tech professionals
blue tick
100% money-back guarantee*
Register for Webinar
All Blog Posts