Interview Kickstart has enabled over 3500 engineers to uplevel.
Learning the basics of graphs and graph theory is key for coding engineers at any stage in their career. But, graph theory can be intimidating. Even if you are an experienced software developer preparing for a technical interview for your dream job, graph theory can make you anxious.
In this article, we will cover the basics to help you with your prep. We’ll discuss:
Graph theory is a branch of mathematics that deals with a network of points that are connected via lines. Given a set of nodes that are connected in some way via edges, Graph theory is essentially the study of their relationships. Nodes and edges are a great abstraction for many real-life problems. Examples include problems involving arrangement, networking, and optimization relevant to city layouts, social media reach, navigation, and political campaigns. We’ll start by becoming acquainted with some important concepts in graph theory.
A graph is a data structure that contains a finite set of vertices/nodes and a set of directed or undirected edges connecting them. An edge is represented in the format (U, V). Here, the edge connects U and V. U is the source node, and V is the destination node.
For a directed graph, the degree of a node refers to the number of edges coming into or going out from a node. The number of incoming edges is called in-degree, and the number of outgoing edges is called out-degree. For an undirected graph, the degree of a node again refers to the number of edges associated with a node. However, in an undirected graph, the edges are bidirectional, and all edges are counted towards the degree. We can then conclude a few points based on this definition:
Path refers to a sequence of edges that connect a sequence of nodes. A path may be a single edge connecting two nodes or multiple edges that are connected through multiple nodes.
Graph theory’s origins have been traced to be in the 18th century, around the year 1735. It started as recreational puzzles in mathematics that gained traction, became important, and have now evolved into a valuable area of mathematical research.
Leonhard Euler, a Swiss mathematician, physicist, astronomer, geographer, logician, and engineer, proved the solution for one such old puzzle, the Königsberg bridge problem. This Euler’s proof also proved what is now known as the first theorem of graph theory. Many mark this proof as the origin of graph theory.
While Euler successfully proved his result, techniques, and tests that would help him establish his assertion in a robust mathematical way weren’t developed well. So for decades, there was little to no progress in graph theory. Now, with the tools and techniques we have available in modern times, we’re really starting to explore the full potential of graph theory.
Graphs can be separated into types based on various factors. Understanding the type of graph a graph problem should be represented by and understanding the graph’s properties are crucial to solving it. We’ll now discuss some of the major types of graphs.
Directed Graphs: In directed graphs, all edges are directed. This means that we can traverse in only one direction. So if node A connects node B through a directed edge from A to B, we can traverse from A to B but not B to A.
Undirected Graphs: In undirected graphs, all edges are bidirectional. So, if node A and node B are connected, we can traverse from A to B as well as B to A. Straight lines, not arrows, represent the edges. Undirected graphs can be used to represent mutual/bidirectional relationships such as bidirectional roads in a city map with roads represented as edges and locations represented as nodes.
In the above figure, for an undirected graph, each node could represent different cities, and the edges show the bidirectional roads. In the directed graph shown above, we can represent not only bidirectional roads but one-way roads as well, since here, we use directed edges instead of bidirectional edges.
Weighted Graphs: A weight is a numerical value that can be associated with an edge. These weights associated with the edges represent real criteria like the importance, cost, quantity, popularity, and distance involved in that edge traversal. Both directed and undirected graphs can be weighted. We can represent tolls on each of the roads using weights. We can also represent the distance or time required to travel between two locations as weights. GPS navigation systems use weighted graphs to do these calculations and suggest the optimized paths.
Unweighted Graphs: If the edges do not carry weights, as we saw in the plain examples of directed and undirected graphs, the graph is said to be unweighted. We can use unweighted graphs when the properties of the route or edge (barring their directional nature) are not relevant. In the case of unweighted graphs, we focus on which nodes are connected, and we may care about the directional or cyclic nature of the connections.
Cyclic Graphs: If a graph contains one or more cycles, it is deemed a cyclic graph. In other words, a cyclic graph contains at least one node, which has a path that connects it back to itself. If the cyclic graph contains only one cycle, it is called a unicyclic graph.
Acyclic Graphs: An acyclic graph, on the other hand, has no cycles. Since there are zero cycles, no node has a path to be traced back to itself. An undirected graph that’s acyclic is a forest graph.
A directed graph with no directed cycles is called a directed acyclic graph. By directed cycles, we mean that following the directions in the graph will never form a closed loop. So in the figure shown above, the first directed graph is acyclic, and the second directed graph is cyclic. Note that just the direction of one edge has been reversed to create the cyclic graph out of the acyclic graph. Just that one change leads to a directed cycle being formed in the graph.
To connect the concept of a tree and graphs, if a connected undirected graph is also acyclic, it is a tree graph. We’ll discuss both trees and forests more in the coming sections. We’ve just answered the question: “When is a graph a tree?”. We’ll now look at what this means in more detail.
We’ve just learned that graphs and trees are related to each other as data structures and that a connected undirected acyclic graph is what we call a tree. Let us unpack what that means. A graph is a tree if it:
Trees are used to represent hierarchical structures in graphical forms. A tree is an undirected graph unless mentioned otherwise. A directed graph, however, can be labeled a directed tree graph (polytree) if:
Tree plot (diagram):
Tree graph theory:
As we’ve seen, a tree is a specific type of graph. Anything true for a graph would be true for a tree, but trees would be a special case of graphs. So let us see where the differences lie.
Now that the basics are covered, let’s look at some important concepts related to graph theory and trees: subgraph, forest, spanning tree, circuit rank, and handshaking lemma.
A sub-graph is a graph all of whose nodes and edges are contained the same way, in a larger graph, which may have additional nodes and/or edges as well. For example:
An undirected acyclic graph is called a forest graph. In other words, a forest is an undirected graph whose connected components or sub-graphs are trees. The image of a forest may look like multiple sub-graphs/trees, but it represents just a single disconnected graph. We can also say that a forest is a disjoint collection of trees.
Let S be a sub-graph of a connected graph G. S is called a spanning tree of G if it’s a tree that contains all the vertices of graph G. For an undirected graph G’, if S’ is a subgraph and contains all vertices of G’ covered using the minimum possible number of edges, S’ is a spanning tree of G’. If there are N nodes in a graph, its spanning tree will always have N-1 edges. A graph may have multiple spanning trees, but if the graph is a disconnected graph, it will not have a spanning tree at all.
The circuit rank of an undirected graph refers to the minimum number of edges that must be deleted from the graph to break all its cycles and get a tree or a forest.
For a connected graph G consisting of N nodes and E edges, a spanning tree S of graph G will contain N-1 edges. So the number of edges to be deleted to get a spanning tree would be the number of edges-(N-1) or E-N+1, which is the circuit rank of graph G.
Handshaking lemma states that any finite undirected graph will always have an even number of nodes with an odd degree. So for a finite undirected graph, the statement: “Odd number of edges touch (go into/out of) this node” will always be true only for an even number of nodes in the undirected graph.
We’ll now look at some data structures used in graphs for representation.
As the name indicates, an adjacency matrix is a matrix that helps us find if any two specific nodes are adjacent, i.e., directly connected via an edge. Given any graph G with N nodes, its adjacency matrix A is a two-dimensional array of size N x N.
The adjacency matrix A[N][N] can be filled by going to each position A[i][j] in it, seeing if nodes i and j have a direct edge connecting them and storing a value appropriately indicating the result at A[i][j].
The adjacency matrix of the graph:
For simple graphs, adjacency matrix A is a Boolean matrix. In that case, if i and j are connected via a direct edge, A[i][j] stores 1 (edge contributes +1 and loop contributes +2). If they aren’t connected, it stores 0. For weighted graphs, if a direct edge is found, the value stored is the weight of that edge. If an edge isn’t found in the case of weighted graphs, it’ll contain a specific value that can clearly indicate that. A very large or negative number can be used for this purpose, so it can’t match with and hence, doesn’t get confused with a weight stored of the same value.
An adjacency list represents a graph as an array of linked lists. It works by storing the neighboring nodes for each node. An adjacency list is created by making an array of size N, where N is the number of nodes in the graph. And each element A[i] of that array is a linked list containing all the nodes that are directly connected to the node with index i. For weighted graphs, the linked list will additionally store the edge weight along with the neighboring nodes.
Edges list represents a graph as a list of edges. In the edges list, all the edges are stored inside a linked list. Each object holds two nodes (U, V), indicating that they are directly connected by an edge. For weighted graphs, each object will additionally store the weight of the edge between U and V along with the nodes U and V. If the graphs have many nodes but few edges, it can be a more compact, clearer representation.
To learn more, visit our interview questions and problems pages.
Question 1: When can a directed graph be a tree?
A tree must be an undirected graph. However, if a directed graph is also acyclic (is a DAG), and its underlying undirected graph is a tree, the graph will be called a directed tree or polytree.
Question 2: What is the order and size of a graph?
The number of nodes is known as the order of the graph, and the number of edges in the graph represents the size of a graph.
If you’re looking for guidance and help to nail your next technical interview, sign up for our free webinar. As pioneers in the field of technical interview prep, we have trained thousands of software engineers to crack the toughest coding interviews and land their dream jobs at Google, Facebook, Apple, Netflix, Amazon, and other Tier-1 tech companies.
Article contributed by Tanya Shrivastava