Whether you are preparing for a role as a software developer, coding engineer, software engineer, or any similar position in the IT industry, heap sort is an essential part of technical interview prep.
In fact, it’s almost as if its primary use is cracking job interviews! It is rarely used in real-world scenarios, despite being one of the most interesting sorting algorithms.
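In this article, we’ll discuss the basics of binary heaps, how heapification works, how heap sort uses a heap to sort an array, the heap sort algorithm and its implementation, its time and space complexity, and some frequently asked questions.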
To understand how heap sort works, we first need to understand some basic concepts related to binary heaps. Feel free to skip them if you are already familiar with these concepts.
A heap is a tree-based data structure in which all the tree nodes are in a particular order, such that the tree satisfies the heap properties (that is, there is a specific parent-child relationship that is followed throughout the tree).
A heap data structure where the tree is a complete binary tree is referred to as a binary heap.
A complete binary tree is a binary tree in which all levels except the bottom-most level are completely filled, and all nodes in the bottom-most level are as far left as possible. (The last level may or may not be completely filled.)
A full binary tree is a binary tree where every node has 0 or 2 children.
1. Binary heaps are complete binary trees: This means all levels are totally filled (except maybe the last level), and the nodes in the last level are as far left as possible. This property makes arrays a suitable data structure for storing binary heaps.
We can easily calculate the indices of a node's children. So, for parent index i, the left child will be found at index 2*i+1, and the right child will be found at index 2*i+2 (for indices that start with 0). Similarly, for a child at index i (other than the root), its parent can be found at index (i-1)/2, using integer division.
2. Heaps are typically of two types — max heap and min heap: In a max heap, the value of a node is always greater than or equal to the value of each of its children. Conversely, in a min heap, the value of a parent is always <= the value of each of its children.
3. In a max heap, the element at the root will always be the maximum. In a min heap, the element at the root will always be the minimum. The heap sort algorithm takes advantage of this property to sort an array using heaps.
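The index arithmetic and the max heap property described above can be sketched in C++ as follows. The function names (`leftChild`, `rightChild`, `parent`, `isMaxHeap`) are illustrative choices, not part of any library, and the sketch assumes a 0-indexed array of integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index arithmetic for a binary heap stored in a 0-indexed array.
std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }

// The root (index 0) has no parent, so callers must pass i > 0.
std::size_t parent(std::size_t i) {
    assert(i > 0);
    return (i - 1) / 2;
}

// Returns true if every node is <= its parent (the max heap property).
bool isMaxHeap(const std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[parent(i)] < a[i]) return false;
    }
    return true;
}
```

For example, the array {91, 77, 46, 15, 6, 21} satisfies `isMaxHeap`, while {6, 15, 21} does not, because its root is smaller than its children.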
Heap sort is an efficient comparison-based sorting algorithm that creates a heap from the input array and then sorts the array by taking advantage of a heap's properties.
Please keep in mind that since the heap is a tree-based data structure, knowledge of arrays, trees, binary trees, and heaps is key to understanding the heap sort algorithm.
Before going into the workings of heap sort, we’ll visualize the array as a complete binary tree. Next, we turn it into a max heap by using a process we call heapification.
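The brilliance of heapification lies in the fact that if the subtree rooted at every node of a binary tree is itself a MaxHeap, the whole tree is a MaxHeap. One way to implement this idea would be to start at the last node of the tree and work backward toward the root, turning the subtree rooted at each node into a MaxHeap as we go.
If we successfully do that, we will have transformed the whole binary tree into a valid MaxHeap after processing all the nodes. One way to optimize this process is by ignoring all the leaf nodes since they don't have any children: we start at the last non-leaf node (index n/2 - 1, using integer division, in a 0-indexed array of n elements) and move toward the root from there.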
This journey ends when we eventually reach the topmost node and process it.
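Let’s see this in more detail. To heapify a node in a max heap:
1. Compare the node's value with the values of its children and pick the largest of them.
2. If the node itself holds the largest value, stop: the subtree rooted at this node is already a max heap.
3. Otherwise, swap the node's value with that of its largest child, and heapify the child position that received the smaller value.
If this sounds like a recursive method, that's because it is! We keep calling this method recursively for the child node that got updated until we reach a stage where that node is either a leaf or has a value no smaller than each of its children's values.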
You might have wondered why we decided to traverse bottom to top and not top to bottom. That's because steps 1-3 for heapifying a node only work if the child nodes are heapified already.
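The heapify procedure for a single node can be sketched in C++ like this. It is a minimal sketch, assuming a max heap of integers in a 0-indexed `std::vector`; the name `heapify` and its parameters (`n` for the heap size, `i` for the node index) are choices made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores the max heap property for the subtree rooted at index i,
// assuming both child subtrees are already max heaps.
// Only the first n elements of a are treated as part of the heap.
void heapify(std::vector<int>& a, std::size_t n, std::size_t i) {
    assert(n <= a.size() && i < n);
    std::size_t largest = i;
    std::size_t left = 2 * i + 1;
    std::size_t right = 2 * i + 2;

    // Find the largest value among the node and its (existing) children.
    if (left < n && a[left] > a[largest]) largest = left;
    if (right < n && a[right] > a[largest]) largest = right;

    // If the node already holds the largest value, nothing needs to change.
    if (largest == i) return;

    // Otherwise move the larger child up and continue in that child's subtree.
    std::swap(a[i], a[largest]);
    heapify(a, n, largest);
}
```

Calling `heapify(a, 5, 0)` on {6, 77, 46, 15, 21} moves 6 down two levels and leaves {77, 21, 46, 15, 6}.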
At the end of this process, a max heap is fully formed. We can also make a min heap simply by changing the condition to “parent value should be <= each of its children’s values” (swap values if the condition isn’t met).
Have a look at the following example:
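As a rough sketch of the whole build step, the code below heapifies every non-leaf node from bottom to top. The comparator parameter `higher` is an assumption of this sketch (it is not from the article); it lets the same code build either a max heap or a min heap, and the sample array is a hypothetical input.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Sifts a[i] down until the subtree rooted at i satisfies the heap order
// defined by `higher`: higher(x, y) is true when x must sit above y.
template <typename Compare>
void heapify(std::vector<int>& a, std::size_t n, std::size_t i, Compare higher) {
    assert(n <= a.size() && i < n);
    std::size_t top = i;
    std::size_t left = 2 * i + 1;
    std::size_t right = 2 * i + 2;
    if (left < n && higher(a[left], a[top])) top = left;
    if (right < n && higher(a[right], a[top])) top = right;
    if (top != i) {
        std::swap(a[i], a[top]);
        heapify(a, n, top, higher);
    }
}

// Builds a heap in place by heapifying every non-leaf node, bottom to top.
template <typename Compare>
void buildHeap(std::vector<int>& a, Compare higher) {
    // Indices n/2 .. n-1 are leaves, so the first node visited is n/2 - 1.
    for (std::size_t i = a.size() / 2; i-- > 0;) {
        heapify(a, a.size(), i, higher);
    }
}
```

With the input {21, 6, 91, 15, 77, 46}, `buildHeap(a, std::greater<int>())` produces the max heap {91, 77, 46, 15, 6, 21}, and `buildHeap(a, std::less<int>())` produces the min heap {6, 15, 46, 21, 77, 91}.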
When sorting in-place, a max heap can be used to sort the array in ascending order, and a min heap can be used to sort the array in descending order. If sorting doesn’t have to be in-place, we can use an auxiliary array to place the extracted element from the heap’s top in its correct position, whether we use a min heap or a max heap for the sorting.
But even when sorting is not the aim, a min/max heap in itself is a useful construction. The root element of a max heap always contains the maximum element, and that of a min heap always has the minimum element. This quality of heaps can come in handy when we want to extract only the largest or smallest element from an array without keeping the remaining items in the sorted order.
Because algorithms like merge sort and quicksort are often faster in practice, heap sort has limited usage. Heaps themselves, however, are extensively used for problems like getting the largest or smallest elements in an array, sorting an almost sorted array, etc.
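To illustrate this kind of use, here is a small sketch that finds the k largest values of an array with the standard library's `std::priority_queue` configured as a min heap. The function name `kLargest` is hypothetical, and keeping only k elements in the heap is one common approach among several.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the k largest values of `data` in ascending order, using a min heap
// that never holds more than k elements after each insertion.
std::vector<int> kLargest(const std::vector<int>& data, std::size_t k) {
    assert(k <= data.size());
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
    for (int x : data) {
        minHeap.push(x);
        if (minHeap.size() > k) minHeap.pop();  // drop the smallest seen so far
    }
    std::vector<int> result;
    while (!minHeap.empty()) {
        result.push_back(minHeap.top());
        minHeap.pop();
    }
    return result;
}
```

For the input {21, 6, 91, 15, 77, 46} and k = 3, this returns {46, 77, 91} without sorting the rest of the array.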
Now that we’ve learned how to create a heap from an array using the heapify method, we will look into using the heap to sort the array.
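After the heap formation using the heapify method, the sorting is done by repeatedly swapping the root element with the last element of the heap, reducing the heap size by 1 so that the swapped element is excluded from the heap, and heapifying the new root.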
On a max heap, this process will sort the array in ascending order. On a min heap, it will sort in descending order.
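This process can be best illustrated using an example. Starting from the max heap [91, 77, 46, 15, 6, 21], the first pass swaps 91 with 21, shrinks the heap to five elements, and heapifies the root, leaving [77, 21, 46, 15, 6 | 91], where the part after the bar is already sorted.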
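The process above ends when the heap size reaches 1, because a single-element heap is trivially sorted. (A two-element max heap still needs one last swap, since its larger element sits at the root.)
So basically, the heap sort algorithm has two parts that are repeated as long as the heap size is >= 2: swapping the root with the last element of the heap and shrinking the heap by one, and then heapifying the new root.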
Here’s the algorithm for heap sort:
Step 1: Build Heap. Build a heap from the input data. Build a max heap to sort in increasing order, build a min heap to sort in decreasing order.
Step 2: Swap Root. Swap the root element with the last item of the heap.
Step 3: Reduce Heap Size. Reduce the size of the heap by 1.
Step 4: Re-Heapify. Heapify the remaining elements into a heap of the new heap size by calling heapify on the root node.
Step 5: Repeat. Repeat steps 2, 3, and 4 as long as the size of the heap is greater than 1.
Each time, the last array position is discarded from the heap once it contains the right element. The process is repeated until all the input array elements are sorted. This happens when the heap size is reduced to 1, since the single remaining element is then already in its correct position.
Following is the pseudocode for heap sort. Please have a look and try to implement this in a programming language of your choice.
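One possible pseudocode rendering of the steps above, assuming a 0-indexed array A and a max heap (the names heapSort and heapify are illustrative):

```
heapSort(A):
    n = length(A)
    // Build a max heap, from the last non-leaf node up to the root
    for i = floor(n / 2) - 1 down to 0:
        heapify(A, n, i)
    // Move the current maximum to the end and shrink the heap
    for size = n down to 2:
        swap A[0] and A[size - 1]
        heapify(A, size - 1, 0)

heapify(A, n, i):
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and A[left] > A[largest]: largest = left
    if right < n and A[right] > A[largest]: largest = right
    if largest != i:
        swap A[i] and A[largest]
        heapify(A, n, largest)
```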
We have implemented the heap sort algorithm to sort in ascending order in C++:
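A minimal sketch of such an implementation is given here, assuming a 0-indexed `std::vector<int>` and a recursive `heapify`. The function names and the sample input order {21, 6, 91, 15, 77, 46} are assumptions of this sketch; only the sorted values are taken from the article.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores the max heap property for the subtree rooted at index i,
// considering only the first n elements of a.
void heapify(std::vector<int>& a, std::size_t n, std::size_t i) {
    assert(n <= a.size() && i < n);
    std::size_t largest = i;
    std::size_t left = 2 * i + 1;
    std::size_t right = 2 * i + 2;
    if (left < n && a[left] > a[largest]) largest = left;
    if (right < n && a[right] > a[largest]) largest = right;
    if (largest != i) {
        std::swap(a[i], a[largest]);
        heapify(a, n, largest);
    }
}

// Sorts a in ascending order using a max heap.
void heapSort(std::vector<int>& a) {
    const std::size_t n = a.size();

    // Step 1: build a max heap by heapifying every non-leaf node, bottom to top.
    for (std::size_t i = n / 2; i-- > 0;) {
        heapify(a, n, i);
    }

    // Steps 2-5: move the maximum to the end, shrink the heap, and re-heapify,
    // as long as the heap has more than one element.
    for (std::size_t size = n; size > 1; --size) {
        std::swap(a[0], a[size - 1]);
        heapify(a, size - 1, 0);
    }
}
```

Calling `heapSort` on a vector holding 21, 6, 91, 15, 77, 46 and printing its elements in order gives the values in ascending order.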
Output: 6 15 21 46 77 91
The time complexity of heap sort is O(N log N), which is better than quadratic, and it comes out the same in the best, worst, and average cases.
Let’s see how.
(Note: The following sections are based on working with MaxHeaps)
The heapify method is run on a node whose child nodes are already heapified.
The worst-case run time will be experienced when the heapify method is run on a node that is smaller than all of its descendants. This means the node has to be swapped through all of its levels to position it at the leaf level. So the worst-case run time will be a function of the height of the subtree, h.
Thus, the worst-case time complexity of each heapify method invocation is O(h). This height h is not a constant. At the bottom of the tree, h is 0, and at the top of the tree, h is equal to ⌊log2 N⌋.
The time complexity for calling the heapify method for all the nodes of the tree (from bottom to top):
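One standard way to write this sum is sketched here. It assumes that an N-node heap has at most ⌈N/2^(h+1)⌉ nodes at height h, a known property of complete binary trees that is not stated in the text above; T(N) is a symbol introduced for this sketch to denote the total cost.

```latex
\begin{aligned}
T(N) &= \sum_{h=0}^{\lfloor \lg N \rfloor} \left\lceil \frac{N}{2^{h+1}} \right\rceil O(h) \\
     &= O\!\left( N \sum_{h=0}^{\lfloor \lg N \rfloor} \frac{h}{2^{h}} \right) \\
     &= O\!\left( N \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
\end{aligned}
```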
Taking advantage of the properties of Big-O notation, in the last step, we raised the upper limit of the summation from lg(N) to ∞. This will help us simplify the calculation. We’ll do so with the help of known mathematical properties involving the summation of numeric expressions from 0 to ∞.
We will use the following mathematical property:
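The identity usually applied at this point, and presumably the one meant here, follows from differentiating the geometric series and holds for |x| < 1:

```latex
\sum_{h=0}^{\infty} h\,x^{h} = \frac{x}{(1-x)^{2}}, \qquad |x| < 1
```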
We can notice that in our equation, we can use the above property by replacing x with 1/2. So, our equation now becomes:
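Worked out under that substitution (a sketch of the arithmetic, assuming the series identity for the sum of h·x^h over h ≥ 0):

```latex
\sum_{h=0}^{\infty} \frac{h}{2^{h}}
  = \frac{1/2}{\left(1 - 1/2\right)^{2}}
  = 2
\quad\Longrightarrow\quad
O\!\left( N \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right) = O(2N) = O(N)
```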
Thus, the first step of heap sort, which is building a heap out of a randomly arranged array, can be done in O(N).
This step involves swapping the root (the leftmost value in the array) with the last element of the heap (the rightmost value in the part of the array occupied by the heap), followed by reheapification of the new, smaller heap.
Swapping the max element with the bottom level rightmost element and reducing the size of the heap can be done in constant time, O(1).
Now, let’s discuss reheapification. In the worst case, the new value at the root position will have to be swapped log(N) times to be sent to the bottom of the heap to achieve a MaxHeap once again. So each reheapification after the extraction costs O(logN).
We will be performing this extraction N times, so the total time complexity of getting a sorted array out of a MaxHeap is O(N*log(N)).
The total time complexity of heap sort can be calculated as:
Time for creating a MaxHeap + Time for getting a sorted array out of a MaxHeap
= O(N) + O(N log(N))
= O(N log(N))
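Heap sort sorts the array in place, so its auxiliary space complexity is a constant O(1). (A recursive heapify additionally uses O(log N) call-stack space; an iterative heapify avoids this.)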
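A recursive heapify uses call-stack space proportional to the height of the heap, so one way to keep the auxiliary space strictly constant is to sift down with a loop instead. The following is a minimal sketch of that variant; the names `siftDown` and `heapSortIterative` are chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Iterative counterpart of heapify: moves a[i] down with a loop,
// so no recursion (and no call-stack growth) is involved.
void siftDown(std::vector<int>& a, std::size_t n, std::size_t i) {
    assert(n <= a.size() && i < n);
    while (true) {
        std::size_t largest = i;
        std::size_t left = 2 * i + 1;
        std::size_t right = 2 * i + 2;
        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) return;  // heap property holds at this node
        std::swap(a[i], a[largest]);
        i = largest;               // continue from the child that changed
    }
}

// In-place ascending heap sort that uses only a few index variables.
void heapSortIterative(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0;) {
        siftDown(a, n, i);
    }
    for (std::size_t size = n; size > 1; --size) {
        std::swap(a[0], a[size - 1]);
        siftDown(a, size - 1, 0);
    }
}
```

Apart from the loop in `siftDown`, the structure is the same as in the recursive version, and the resulting order is identical.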
Question 1: Does the heap data structure have to be binary-tree-based?
No, a heap does not always need to be a binary tree. But in heap sort, we use arrays to represent the heap. We can easily calculate and track the relationship between a parent index, its left child index, and right child index for a binary heap using the array. And a binary heap has to be binary-tree-based.
Question 2: Can heap sort be made stable?
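While heap sort is typically not stable, it can be made stable by taking into account the original position of elements with the same value. Store each element's original index alongside it (which costs O(N) extra space) and, during heapification of a max heap, treat the element that was originally further right as greater than an equal element that was originally further left. Your sorting will then be stable.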
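One way to sketch this idea is to pair every key with its original index and let the index break ties. The example below relies on the standard library's `std::make_heap` and `std::sort_heap` rather than a hand-written heap; the `Record` type, its `tag` field, and the function name `stableHeapSort` are hypothetical and exist only to make equal keys distinguishable. Note that this approach uses O(N) extra space.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Record {
    int key;
    char tag;  // hypothetical payload, used only to tell equal keys apart
};

// Sorts records by key in ascending order, keeping equal keys in input order.
std::vector<Record> stableHeapSort(const std::vector<Record>& input) {
    // Pair each key with its original position. Pairs compare by key first
    // and by position second, so no two entries ever compare equal.
    std::vector<std::pair<int, std::size_t>> entries;
    entries.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        entries.emplace_back(input[i].key, i);
    }

    // Heap sort via the standard library: build a max heap, then sort it.
    std::make_heap(entries.begin(), entries.end());
    std::sort_heap(entries.begin(), entries.end());

    // Rebuild the records in their sorted order.
    std::vector<Record> sorted;
    sorted.reserve(input.size());
    for (const auto& entry : entries) {
        assert(entry.second < input.size());
        sorted.push_back(input[entry.second]);
    }
    return sorted;
}
```

For the records (3,a), (1,b), (3,c), (1,d), the result is (1,b), (1,d), (3,a), (3,c): records with equal keys keep their original relative order.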
Question 3: Why are arrays used to visualize and implement binary heaps?
Storing and accessing values in an array is faster and less complicated than using a more complex data structure. One of the main advantages of using more complex data structures is the use of methods provided by the standard library for common operations related to the data structure, e.g., push() and pop() methods for a stack.
However, storing a complete binary tree in an array still allows us to perform all operations relevant to the tree with much ease. We can find the left child, right child, parent node, root, and the last element of a tree with basic arithmetic operations on the index of the current node or the variable maintaining the size of the tree.
Question 4: How much time does it take to find the maximum and minimum element in a max heap?
The maximum element is present at the root and can be found in O(1) time. The minimum element will be at one of the leaf nodes, and since roughly half of the n nodes are leaves, all of them have to be checked to find it. Hence, the minimum element can be found in O(n) time.
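A short sketch of the leaf scan, assuming a non-empty max heap stored in a 0-indexed `std::vector<int>` (the function name `minOfMaxHeap` is illustrative). In such an array, the leaves occupy indices n/2 through n-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the minimum of a non-empty max heap stored in a 0-indexed array.
// Only the leaves (indices n/2 .. n-1) need to be examined.
int minOfMaxHeap(const std::vector<int>& heap) {
    assert(!heap.empty());
    const auto firstLeaf = static_cast<std::ptrdiff_t>(heap.size() / 2);
    return *std::min_element(heap.begin() + firstLeaf, heap.end());
}
```

For the max heap {91, 77, 46, 15, 6, 21}, the maximum is simply the first element, 91, while `minOfMaxHeap` scans the three leaves 15, 6, and 21 and returns 6.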
----------
Article contributed by Tanya Shrivastava