About usWhy usInstructorsReviewsCostFAQContactBlogRegister for Webinar

Online Median Problem

Online Median

Given a stream of integers, find the median of the set of integers read from the stream so far. If the median is a non-integer, round it down to the nearest integer (Math.floor of the value). 


Median of an array is defined as the middle element in sorted form of the array if the number of elements is odd, otherwise as the mean of the middle two elements.


You need to complete the constructor of the class MedianFinder as well as function findMedian.


Constructor of MedianFinder class: You can use this to initialise the data structure to be used in findMedian function.

Function findMedian: Input of this function will be integer and output will be integer. You need to consider passed integer as part of the stream so far and calculate median and return it.


Input Format:

There is a single argument in input of the function findMedian which will be a single integer denoting the number from data stream.


Output Format:

Return an integer denoting the median of the stream so far.


Constraints:

● 1 <= n <= 10^5 where n denotes the number of integers in input stream.

● 0 <= num <= 10^9 where num is the integer from input stream.

● Stream can contain duplicate elements. 


Sample Input:

stream = [3, 8, 5, 2]


Sample Output:

result = [3, 5, 5, 4]


As for given stream [3, 8, 5, 2], function findMedian will be called with each value of stream starting from index 0.

Initially data_so_far is Empty.

Add 3 → data_so_far: [3] → Median is 3.

Add 8 → data_so_far: [3, 8] → Median is (3+8)/2 = 5

Add 5 → data_so_far: [3, 8, 5] → Median is 5 as sorted array will be [3, 5, 8]

Add 2 → data_so_far: [3, 8, 5, 2] → Median is (3+5)/2 = 4 as sorted array will be [2, 3, 5, 8]


Solution

We have provided solutions which contain necessary comments to understand the approaches used:

optimal_solution.java


Description:

The idea is to use a min heap (called minHeap) and a max heap (called maxHeap). The minHeap stores the larger half of the stream and the maxHeap stores the lower half of the sorted stream. 

For every element that is added from the stream, we keep the sizes of the heaps the same or they differ maximum by a difference of 1. Without the loss of generality, we will have the extra element in minHeap whenever required. This way, if the total size of the stream is odd, the element on top of minHeap is our median else the floor of the average of the elements on the top of minHeap and maxHeap is our required value.


While adding elements to the heaps, few conditions might arise which are listed as follows:

● If the value is smaller than the top element of maxHeap, and the size of maxHeap is smaller than that of minHeap. Add the element to the maxHeap. Vice versa, if the value is larger than the top element in minHeap and the size of both the heaps are same, add the element to the minHeap

● If the value is bigger than the top value of maxHeap and smaller than top value of minHeap, if the sizes are the same, add value to the minHeap else to the maxHeap.

● If the value is bigger than the top value of maxHeap and minHeap both, if the minHeap does not have the extra element, add value to minHeap, else add the top value from minHeap to maxHeap and then add the new integer from stream to minHeap. Similarly, add the top value from maxHeap to minHeap, if the element is smaller than both the top values of the heaps and also if the sizes are the same.


Please have a look at the solution for better understanding.

Time Complexity:

O(n*logn) where n is the number of elements in the stream.


In function findMedian, we are either adding element in minHeap/maxHeap or removing of top element from minHeap/maxHeap with some more constant time function. Adding of element in minHeap/maxHeap will take O(log n) time where n is the number of elements present in heap till now and removal of top element from heap will take O(1) time.

As we are calling function findMedian n times, it will take O(n*log n) time. Hence the total time complexity of the solution will be O(n*logn).


Auxiliary Space Used:

O(n) where n is the number of elements in the stream.

The minHeap and maxHeap both are storing n/2 elements hence requires O(n/2) auxiliary space. So, total auxiliary space complexity is O(n).


Space Complexity:

O(n) where n is the number of elements in the stream.
To store input having n elements in stream will take O(n) space, auxiliary space used is O(n) and to store output it will take O(n) space hence total space complexity will be O(n).


// -------- START --------
class MedianFinder {

    PriorityQueue minHeap;
    PriorityQueue maxHeap;

    /** initialize your data structure here. */
    public MedianFinder() {
        // MinHeap is to store elements greater than equal to current median
        minHeap = new PriorityQueue<>();
        // MaxHeap is to store elements smaller than equal to current median
        maxHeap = new PriorityQueue<>((o1, o2) -> Integer.compare(o2, o1));
    }
    
    public int findMedian(int num) {
        // Add element in minHeap if minHeap is empty
        // Example: minHeap:[], num=2
        if (minHeap.isEmpty()) {
            minHeap.add(num);
            return num;
        }

        if (maxHeap.isEmpty()) {

            // If number is less than top element of minHeap
            // Example: minHeap:[2], maxHeap:[], num=1
            if (num <= minHeap.peek()) {
                maxHeap.add(num);
                return (maxHeap.peek() + minHeap.peek()) / 2;
            }

            // Example: minHeap:[2], maxHeap:[], num=4
            // Remove top element of minHeap and add that in maxHeap and add num in minHeap
            maxHeap.add(minHeap.poll());
            minHeap.add(num);
            return (maxHeap.peek() + minHeap.peek()) / 2;
        }

        // If number is less than top element of minHeap and top element of maxHeap
        if (num <= minHeap.peek() && num <= maxHeap.peek()) {

            // If size of minHeap is greater than maxHeap
            // Example: minHeap:[4, 5], maxHeap:[2], num=1
            if (minHeap.size() > maxHeap.size()) {
                maxHeap.add(num);
                return (maxHeap.peek() + minHeap.peek()) / 2;
            }

            // Example: minHeap:[4, 5], maxHeap:[2, 1], num=0
            // Remove top element of maxHeap and add that in minHeap and add num in maxHeap
            minHeap.add(maxHeap.poll());
            maxHeap.add(num);
            return minHeap.peek();
        }

        // If number is greater than top element of minHeap and top element of maxHeap
        if (num > minHeap.peek() && num > maxHeap.peek()) {

            // If size of minHeap and maxHeap are same
            // Example: minHeap:[4, 5], maxHeap:[2, 1], num=6
            if (minHeap.size() == maxHeap.size()) {
                minHeap.add(num);
                return minHeap.peek();
            }

            // Example: minHeap:[4, 5], maxHeap:[2], num=6
            // Remove top element of minHeap and add that in maxHeap and add num in minHeap
            maxHeap.add(minHeap.poll());
            minHeap.add(num);
            return (maxHeap.peek() + minHeap.peek()) / 2;
        }

        // If size of minHeap and maxHeap is same then add in minHeap and return top of minHeap
        if (minHeap.size() == maxHeap.size()) {
            minHeap.add(num);
            return minHeap.peek();
        }

        maxHeap.add(num);
        return (maxHeap.peek() + minHeap.peek()) / 2;
    }
}
// -------- END --------