Interview Kickstart has enabled over 3500 engineers to uplevel.
If you are preparing for a technical interview, you must know the basic graph traversal techniques, as they are quite popular in software engineering technical interviews. In this article, we will discuss one such graph traversal algorithm — depth-first search (DFS).
Many real-world problems are modeled with graphs. Hence, it is expected of software engineers and developers to have good command over graph traversal algorithms such as DFS and BFS. Graph traversal algorithms are the backbone of many other easy and complex graph algorithms.
This article will cover the following DFS concepts:
Traversing a graph means visiting all vertices and edges of a graph at least once in a well-defined order. The order in which the algorithm visits the nodes is important. Different problems may require different orders of traversals.
Depth-first search (DFS) is a recursive algorithm for traversing a graph. It uses the idea of exhaustive search — it will keep moving deeper into the graph until that particular path is entirely exhausted (in other words, a dead end is found).
It is used to solve many interesting problems, such as finding a path in a maze, detecting and breaking deadlocks in operating systems, and many others.
Here are a few (among many) applications of DFS:
DFS starts traversing the graph from a starting node (provided as input to the algorithm). It chooses a path starting from this node, and goes as deep as it can. After exhausting that path, it backtracks until it finds another unexplored path and then explores it recursively.
It keeps recursing until the complete graph has been explored or until the node we were looking for is found.
We’ll use the following graph for illustration:
We start from node 0, mark it visited, and then travel towards node 1.
On reaching node 1, we see it’s unvisited — we mark it visited and move towards its child node 4.
Node 4 is also unvisited; so, we mark it visited.
As node 4 does not have any child, we backtrack to node 1.
Node 1 does not have any child left to explore (in other words, we have exhausted node 1). Therefore, we backtrack to node 0.
Node 0 is not yet completely explored — we travel to its child, node 2, and mark it visited.
Node 2 does not have any neighbors — we backtrack to node 0.
Node 0 has a neighbor node 3, which is not visited. So, we explore it.
Node 3 has a neighbor, node 2, but since it's already visited; so, we don’t go over it again and backtrack to node 0. Node 0 does not have any unvisited neighbors left.
In this manner, we will visit the complete graph using DFS.
If we remove all edges other than the green ones, we will get a tree — this is called the DFS tree for this graph, and all the green edges are the only edges of that tree. The image below shows the DFS tree for the above graph.
We can divide the graph edges into the following kinds:
Consider the below-directed graph for a better understanding. Applying DFS on it will give the following order:
0 1 2 3 4
You can see the four types of edges in the following figure:
The DFS pseudocode is very short and concise. Go ahead and implement it in any programming language of your choice.
The graph is stored in G, and the starting node is u.
Note: Recursive and iterative DFS might traverse the graphs in different but correct orders.
We are using C++ to implement the recursive and iterative DFS algorithm. Try and implement it on your own before seeing the code!
Output for recursive code:
DFS traversal for given graph: 0 1 4 2 3
Output for iterative code:
DFS traversal for given graph: 0 3 2 1 4
Note: It might seem like one of the two codes is wrong — but both are absolutely correct as both algorithms traversed the graph in a depth-first order.
The time complexity for the DFS algorithm depends on the data structure in which the graph is stored.
If we use an adjacency list to represent the graph, its time complexity ends up as O(V+E), where V is the number of vertices in the graph and E is the number of edges in the graph.
If we use the adjacency matrix to represent the graph, its complexity equals O(V2), where V is the number of vertices in the graph.
Special case: When the graph is a tree, the number of edges E is V - 1. In this case, the time complexity becomes O(V) if we use an adjacency list to represent the tree.
The DFS algorithm’s space complexity is O(V), excluding the memory consumed by the graph representation, where V is the number of nodes in the graph.
Note: Graph representation takes O(V+E) memory when using adjacency list and O(V2) memory when using adjacency matrix
Furthermore, the depth of the recursion tree also matters as it consumes stack memory. The maximum depth for the recursion tree is the longest path from the starting node to any other node in its connected component, which can be O(V) in the worst-case.
Special case: If we know beforehand that graph is a tree, we can implement DFS without using O(V) memory for the array to keep track of the visited vertices. However, recursion will still consume some memory.
We can do this by keeping track of the parent of the current node being explored. And when searching for an unvisited neighbor node, we will skip the parent node (as we already would have visited that node).
In our implementation above, if the graph is disconnected, the code will not visit nodes that are disconnected from the starting node.
For example, consider the below graph:
If we apply the above implementation of DFS on this graph (in which we have considered node 0 as the starting node), we will end up not visiting node 3 and node 4 as they are not in the same connected component as node 0.
So, what can we do to make our code work?
We’ll assume every node to be a starting node and apply DFS only on those nodes that have not been previously visited.
DFS traversal for given graph:
0 1 2
Following are a few differences between BFS and DFS:
To learn more, check out our article on breadth-first search.
Question 1: How to detect a cycle in a directed graph using DFS?
Answer: We need to check if there exists a back edge in a graph. As we discussed earlier, the back edge is an edge from the graph that is not present in the DFS tree after applying DFS, and it connects a node “u” to one of its ancestors. We can keep track of the active nodes in another array while running the DFS. Those nodes that are placed on the stack but have not been removed are considered active.
During DFS, when a new node is visited, all of its ancestors should be active. Also, all of the active nodes must be the new node’s ancestors. So, whenever we explore any new node, if we find an edge directed from the new node to any of the active nodes, we can conclude that the given graph has a cycle.
Proof: Suppose the back edge connects node “u” with node “v” and it goes from “u” to “v.” Without loss of generality, also suppose that “v” was discovered before “u”. As “v” and “u” are both active, there must be a path in the DFS tree from “v” to “u.” That path only uses tree edges. On the other hand, we just discovered another path from “u” to “v” using a back edge. So, in this graph, there exists a path that starts from “u” and comes back to “u.” Hence, it must be a cycle.
Code snippet in C++ for DFS function for finding cycle in directed graph:
Question 2: How to detect a cycle in an undirected graph using DFS?
Answer: An undirected graph is considered acyclic if and only if it’s a tree. So, the problem boils down to finding out if the given graph is a tree. To check this, we can start a DFS from any node, and for every node, keep track of its parent node in the DFS tree. If during DFS we visit a node twice (excluding the parent), we can conclude that the given undirected graph is cyclic.
Question 3: Why does DFS use a “stack” data structure? Can it use a queue data structure?
Answer: DFS explores any path as long as possible. When it hits a dead end, it backtracks to the just-previous state. So, we need a data structure that can give us the most recent element (or call). The stack serves this purpose — Last In First Out (LIFO). The queue cannot be used to implement DFS as it is based on the opposite concept — First In First Out (FIFO).
Question 4: Can we use DFS to solve the shortest path distance between two nodes in unweighted graphs?
Let’s take an example: from the starting node, if node “u” is visited before node “v,” there’s no guarantee that u is closer to the starting node. Therefore, DFS cannot be used to find the shortest distance between two nodes in polynomial time.
Problems based on graphs and its traversal feature in almost every coding interview for software developers. If you’re looking for guidance and help to nail these questions and more, sign up for our free webinar.
As pioneers in the field of technical interview prep, we have trained thousands of software engineers to crack the toughest coding interviews and land jobs at their dream companies, such as Google, Facebook, Apple, Netflix, Amazon, and more!
Article contributed by Divyansh Gupta