# Kahn’s Algorithm for Topological Sorting

If you are a software engineer looking to land your dream job, sorting algorithms are an indispensable part of your technical interview prep. One of these sorting algorithms is topological sort, or top sort, which is defined only for directed acyclic graphs (DAGs).

Topological order is the result of a linear ordering of a DAG’s vertices such that for every directed edge (U, V) present in it, U comes before V in the topological ordering. This means there may be multiple valid topological orderings possible for the same DAG. To learn about this in more detail, check out our article on topological sort.

As you may know, topological sorting can be done using algorithms like Kahn’s algorithm, depth-first search, and some parallel algorithms. This article focuses on one of these algorithms: Kahn’s algorithm (Kahn algorithm or Kahn topological sort), which can help people working as software developers/coding engineers solve some complicated-looking questions with more ease.

• The Basis of Kahn’s Algorithm
• Outline of Kahn’s Algorithm
• How Kahn’s Algorithm Works
• Kahn’s Algorithm Example
• Kahn’s Algorithm
• Kahn’s Algorithm Pseudocode
• Calculation of indegree
• Kahn’s Algorithm Code
• Kahn’s Algorithm Complexity
• Applications of Kahn’s Algorithm

## The Basis of Kahn’s Algorithm

The basis of Kahn’s Algorithm is the following assertion:

A directed acyclic graph (DAG) G has at least one vertex with the indegree zero and one vertex with the out-degree zero.

Let's try and prove this assertion and see if it’s accurate:

For any directed acyclic graph G, we can say that it does not contain a cycle. Since there is no cycle, there’s no endless loop that exists as a path in the graph. In other words, all paths in G are of finite length.

Let us now consider a path P that represents the longest path in G. We know that:

• P is directed and acyclic, so it has a beginning and an end. Let us say it starts from source vertex u and ends at destination vertex v.
• For P to meet the definition of the longest path that is finite or acyclic, the source vertex u can’t have any incoming edge, and destination vertex v can’t have any outgoing edge. So, P is the longest path only if the indegree of source vertex u = 0 and outdegree of destination vertex v = 0.
• Since a DAG has only finite paths, there will always exist a longest path P. This means any DAG will always have at least one vertex with indegree 0 and at least one vertex with outdegree 0.

## Outline of Kahn’s Algorithm

Essentially, Kahn’s algorithm works by keeping track of the number of incoming edges into each node (indegree). It repeatedly:

1.  Finds nodes with no incoming edge, that is, nodes with zero indegree (no dependency).
2.  Stores the nodes with zero indegree in a stack/queue and deletes them from the original graph.
3.  Deletes the edges originating from the nodes stored in step 2. It is achieved by decrementing the indegree of each node connected to the nodes removed in step 2.

This process continues until no element with zero indegree can be found. As mentioned earlier, this can happen when the topological sorting is complete and also when a cycle is encountered. For this reason, the algorithm checks if the result of topological sorting contains the same number of elements that were supposed to be sorted:

• If the numbers match, no cycle is encountered, and we print the sorted order.
• If they don’t, a cycle was encountered, which means the topological sorting was not possible. This is conveyed in such a case as appropriate: by returning null or printing a message.

## How Kahn’s Algorithm Works

We’ll now learn in detail what this really means and how such an algorithm can be implemented.

As we’ve outlined in Kahn’s algorithm, we iteratively remove nodes with zero directed edges pointing towards them. However, these zero indegree nodes may have outgoing directed edges (originating at them and pointing towards other nodes).

These outgoing edges get removed when their source nodes are deleted. This reduces the indegrees of their destination nodes and creates zero or more new nodes with zero dependencies (for a DAG). (The case of creating zero new nodes with indegree zero is only possible if the DAG already contains one or more than one node with in-degree zero. As we’ve seen, it’s a property of a DAG that it will always have at least one node with in-degree zero.)

This process continues till no nodes are left to remove, and we get a topological ordering of the directed acyclic graph. We can condense this process into the following steps:

• Step 1: Store the current indegree of each node and initialize the count of visited nodes to zero.
• Step 2: Take all the nodes with indegree 0 and add them to a queue (Enqueue) or a stack (Push)
• Step 3: Remove a node from the queue (Dequeue)/stack (Pop) and:
- Increment the count of visited nodes (by one).
- Reduce indegree by one for all its neighboring nodes.
- If the indegree of a neighboring node is reduced to zero by the above operation, then add it to the queue/stack.
• Step 4: Continue repeating Step 3 until the queue/stack is empty.
• Step 5: If the count of visited nodes does not equal the number of nodes in the graph, then a cycle is encountered, and topological sorting is not possible for the graph as it’s not a DAG. Return a message stating that. If it is equal, return the topologically sorted result.

## Kahn’s Algorithm Example

The best way to further understand how Kahn’s algorithm sorts topologically is through an illustrative example. Let’s jump right in!

We begin with a directed acyclic graph, as shown in the figure below. We also notice all the directed edges of the DAG, which gives us everything we need to know to represent the DAG.

Edges: {0, 1}, {0, 2}, {1, 3}, {1, 4}, {3, 4}, {3, 5}

Here, “In” = indegree
Order of removal: First 0, then 1 and 2, then 3, and finally, 4 and 5.
Topological order:
0,1,2,3,4,5
Note:
This is not the only possible topological order.
For example, 0,1,3,4,5,2 is also a valid topological order.

We now calculate the indegree of each node. The indegree of a node pictorially refers to the number of edges pointing towards that node. The direction of the arrow is incoming, hence the term indegree.

In terms of edges, each edge {Ui, V} where Ui is the source and V is the destination node contributes to the indegree of its destination node V as the edge points towards it.  We’re going to update the indegree value for any node V every time an edge associated with it as its destination node is deleted.

• Node 0 has zero incoming edges. In = 0
• Nodes 1,2,3, and 5 have one incoming edge. In = 1
• Node 4 has two incoming edges. In = 2

We can now begin applying Kahn’s algorithm for topological sorting:

• Step 1: The indegree of node 0 is zero. This is the only node with indegree zero at the beginning. We remove this node and its outward edges: {0, 1}, {0, 2}
• Step 2: We update the indegree of the deleted edges’ destination nodes. Directed edges contribute to the indegree of their destination nodes, as they are pointing towards them. Now the indegree of nodes 1 and 2 both are decremented by one.
Since the indegree before this decrement for both nodes was 1, this also makes node 1 and node 2 the new nodes with indegree zero. We now remove these two nodes. Node 2 has no outgoing edges. Node 1 has two outgoing edges, so we also remove those: {1, 3}, {1, 4}
• Step 3: We now decrement the indegrees of destination nodes of these edges 3 and 4. Hence the indegree of node 4 goes from 2 to 1, and the indegree of node 3 goes from 1 to 0. This makes node 3 the new node to remove. We also remove its outgoing edges: {3, 4},{3, 5}
• Step 4: We update the indegrees of the destination nodes of the deleted edges. So the indegrees of node 4 and node 5, which were both 1 before the decrement, now become zero. We are now left with only these two nodes: node 4 and node 5, both with indegree zero and no outgoing edges from either node. When multiple nodes have indegree zero at the same time, we can write either one before the other.
• Step 5: There may be multiple topological orderings possible in an acyclic graph, as is the case in this example as well. We repeat this process till all nodes are sorted and output one of these valid topological orderings. A valid topological ordering we end up with here is 0,1,2,3,4,5.

## Kahn’s Algorithm

We can now summarize the above steps in the form of an algorithm:

• Step 0: Find indegree for all nodes.
• Step 1: Identify a node with no incoming edges.
• Step 2: Remove the node from the graph and add it to the ordering.
• Step 3: Remove the node’s out-going edges from the graph.
• Step 4: Decrement the indegree where connected edges were deleted.
• Step 5: Repeat Steps 1 to 4 till there are no nodes left with zero indegree.
• Step 6: Check if all elements are present in the sorted order.
• Step 7: If the result of Step 6 is true, we have the sorted order. Else, no topological ordering exists.
• Step 8: Exit.

## Kahn’s Algorithm Pseudocode

KahnTopologicalSort()

For(vertex=0; vertex<inputSize; vertex++)

Find indegree[vertex]

Create an auxiliary storage like stack or queue

Add nodes with indegree 0 in auxiliary storage

while(node U with indegree zero exists in auxiliary storage)

{

Remove U from auxiliary storage

Remove all its outgoing edges (U, Vi) from the graph.

For destination vertices Vi which had their incoming edges removed:

indegree[vertex Vi]=indegree[vertex Vi]-1

Add new nodes with indegree 0 to the auxiliary storage

)

if(elements sorted = all elements)

Return or Print nodes in topologically sorted order

Else

Return null or Print no topological ordering exists

end topologicalSort()

## Calculation of Indegree

Let us see more closely how the calculation of indegree can be done.

### Method 1

We can use an indegree array to track indegree values. To find the indegree of all nodes initially, we first initialize all values in the indegree array to 0 and then go through all the edges and increment the indegree counter of the destination node of each edge by 1. We do this because the destination node has an edge that’s an incoming edge pointing to it, contributing to its indegree.

Pseudocode

for each node in allNodes

indegree[node] = 0;

for each edge(source,destination) in Edges

indegree[destination]++;

The upper loop runs for all nodes, and the lower loop runs for all edges. With V number of vertices or nodes and E edges, the total complexity of this method becomes O(V+E).

### Method 2

We can go through every node in the list, considering it the source node. Any node connected to it in an edge as a destination node gets its indegree incremented by 1.

Pseudocode

for each node in allNodes

{for each destinationNode in destinationNodesList[node]

indegree[destinationNode]++;}

The inner loop here will be executed E number of times where E is the total number of Edges. The outer loop will be executed V number of times, where V is the total number of vertices or nodes. The overall time complexity for this method will hence also be O(V+E).

## Kahn’s Algorithm Code

Let us now look at how we can implement Kahn’s algorithm using C++.

Kahn’s Algorithm Topological Sort C++ Code

#include <iostream>

#include <vector>

using namespace std;

// Structure to store an edge of a graph

struct Edge {

int from, to;

};

// A class that represents a graph

class Graph

{

public:

// Each element of an adjacency list contains a node and every other node it points to.

// To store indegree of a node

vector<int> indegree;

// Constructing a graph

Graph(vector<Edge> const &Edges, int allNodes)

{

// Resizing the vector to hold all nodes

// Initializing indegree of all nodes to zero

indegree.assign(allNodes, 0);

// Adding directed edges to the graph start node-->end node

for (auto &edge: Edges)

{

// Adding an edge from start node to end node

// Incrementing in-degree of end node by 1

indegree[edge.to]++;

}

}

};

bool topologicalSort(Graph const &inputGraph, vector<int> &sorted, int allNodes)

{

vector<int> indegree = inputGraph.indegree;

// Storing all the nodes with no incoming edges

vector<int> zeroIndegreeNodes;

for (int i = 0; i < allNodes; i++)

{

if (indegree[i]==0) {

zeroIndegreeNodes.push_back(i);

}

}

while (!zeroIndegreeNodes.empty())

{

// Deleting fromNode from zeroIndegreeNodes

int fromNode = zeroIndegreeNodes.back();

zeroIndegreeNodes.pop_back();

// Pushing fromNode into sorted

sorted.push_back(fromNode);

{

// Deleting from the graph the edge from fromNode to toNode

indegree[toNode] -= 1;

// If the updated in-degree of toNode is now zero, inserting toNode into zeroIndegreeNodes

if (indegree[toNode]==0) {

zeroIndegreeNodes.push_back(toNode);

}

}

}

// A cycle was encountered if the sorting is done for all zero in-degree nodes in the loop above, but not all input nodes have been sorted

if( (int) sorted.size()!=allNodes)

{

return false;

}

return true;

}

int main()

{

// Count of all the nodes in the graph

int allNodes = 6;

// All the edges of the graph

vector<Edge> edges =

{

{ 0, 1 }, { 0, 2 }, { 1, 3}, {1, 4}, { 3, 4}, { 3, 5 }

};

// A vector to store elements in the sorted order

vector<int> sorted;

// Building a graph, given the number of nodes and all it edges

Graph inputGraph(edges, allNodes);

// Topological sorting. Printing the topologically sorted order if one exists.

if (topologicalSort(inputGraph, sorted, allNodes))

{

for (int i;i< (int) sorted.size();i++)

{

cout << sorted[i] << " ";

}

}

else

{

cout << "Topological sorting is not possible as the graph isn't acyclic";

}

return 0;

}

### Output

The output is 0 2 1 3 5 4, which is another valid topological ordering of the directed acyclic graph we discussed in the example earlier. Nodes (2,1) and (5, 4) reach indegree zero at the same time; therefore, they can be written in any order relative to each other. So, 0 | 1,2 | 3 | 4,5 is just as valid as 0 | 2,1 | 3 | 5,4. Of course, there are even more valid topological orders possible in this case.

## Kahn’s Algorithm Complexity

Let us now discuss the time and space complexities of topological sorting using Kahn’s algorithm.

### Time Complexity

The time complexity is O(V+E) where V = Vertices, E = Edges.

To find out the time complexity of Kahn’s algorithm, let's try to find each step’s time complexity.

• We initialize the indegrees of all nodes to zero in O(V). Then we determine the true value of indegree of each node, for which we iterate through all the edges of the graph in O(E). So the time complexity of this complete step is O(V+E).
• Next, we try to find the nodes with indegrees equal to 0. This will require us to iterate through the entire array that stores the indegrees of each node. The size of this array is equal to V. So, the time complexity of this step is O(V).
• For each node with an indegree equal to zero, we remove it from the graph and decrement the indegrees of the nodes that it was connected to. In the entire run of the algorithm, the number of times we have to perform the decrement operation is equal to the number of edges. So, it will take O(E) time.
• Whenever we decrement the indegree of a node, we check if the node has achieved indegree = 0 status. When this happens, we move it to the stack. In the entire run of the algorithm, it can happen at max (V-1) times. So, the run time of this operation is O(V).
• In the end, we compare the size of the resultant topologically sorted array and the size of the graph to ensure that it was not a cyclic graph. This operation is done in O(1) time.

Adding all the individual run times, the time complexity of topological sort is O(V+E).

This is the best-, worst-, and average-case time complexity of this algorithm, as we perform the same steps regardless of how the elements are organized.

### Space Complexity

Auxiliary space required is O(V).

• We have to create one array to store the indegrees of all the nodes. This will require O(V) space.
• We have to store the nodes with indegree = 0 in a data structure (stack or queue) when we remove them from the graph. In the worst case, this will have to store all the graph nodes, so this will require O(V) space.
• Finally, we need an array to store all the nodes in the sorted order. This will naturally require O(V) space.

Adding all three, we arrive at the space complexity of O(V).

## FAQs on Kahn’s Algorithm

Question 1: What are the applications of Kahn’s Algorithm?

Answer: Kahn’s algorithm for topological sorting is used mainly when tasks or items have to be ordered, where some tasks or items have to occur before others can. For example:

1. Scheduling jobs, given dependencies some jobs have on some other jobs.
2. Course arrangement in educational institutions. Finding prerequisites of any job or task.
3. Detecting deadlocks in operating systems. Finding out if cycles exist in a graph.
4. Resolving symbol dependencies in linkers. Compile-time build dependencies. Deciding the appropriate order of performing compilation tasks in makefiles.

Question 2: Is it possible for a DAG to have no valid topological ordering, and why don’t we order non-DAG graphs topologically?

Answer: No. A DAG will always have at least one valid topological ordering possible. If a graph is cyclic, topological sorting is impossible, and hence, not defined for cyclic graphs. And if a graph is undirected, each edge being bidirectional creates a cycle between the two nodes it connects; hence topological ordering isn’t possible in that case either. The graph has to be a DAG for topological ordering to be possible.

If you’re looking for guidance and help with getting your prep started, sign up for our free webinar. As pioneers in the field of technical interview prep, we have trained thousands of software engineers to crack the toughest coding interviews and land jobs at their dream companies, such as Google, Facebook, Apple, Netflix, Amazon, and more!