
Count Basins Problem

Given the altitudes of the regions on a surface, determine the basins where water would collect if poured onto that surface.

A region whose neighbors (right, left, up and down; regions on the border have fewer than four) are all higher in altitude is called a sink. All the water collects in sinks. If a region is not a sink, it is guaranteed to have a single lowest neighbor, and water drains to that neighbor. All regions that drain to a particular sink, directly or indirectly, collectively form one basin. Partition the surface into basins and return their sizes in non-decreasing order.
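The sink and drain rules can be captured in one small helper. The C++ sketch below is illustrative only: `drainTarget` is a hypothetical name, not taken from the original solutions, and cells are addressed by the flat index row * cols + col. It relies on the problem's guarantee that a non-sink region has a single lowest neighbor, so it does not try to break ties.

```cpp
#include <cassert>
#include <vector>

// Flat index (row * cols + col) of the neighbor that cell (r, c) drains to,
// or -1 if (r, c) is a sink, i.e. every neighbor is strictly higher.
int drainTarget(const std::vector<std::vector<int>>& grid, int r, int c) {
    assert(!grid.empty() && !grid[0].empty());
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    const int dr[] = {0, 0, -1, 1};  // right, left, up, down
    const int dc[] = {1, -1, 0, 0};

    int best = -1;    // flat index of the lowest neighbor found so far
    int bestAlt = 0;  // its altitude (meaningful only once best != -1)
    for (int k = 0; k < 4; ++k) {
        const int nr = r + dr[k];
        const int nc = c + dc[k];
        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;  // off the surface
        if (best == -1 || grid[nr][nc] < bestAlt) {
            best = nr * cols + nc;
            bestAlt = grid[nr][nc];
        }
    }
    // Sink: no neighbors at all, or even the lowest neighbor is strictly higher.
    if (best == -1 || bestAlt > grid[r][c]) return -1;
    return best;
}
```

For Example One, `drainTarget(grid, 1, 1)` returns 3, the flat index of (1, 0), and `drainTarget(grid, 0, 0)` returns -1 because (0, 0) is a sink.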


Example One

Input:

[[1, 5, 2],
 [2, 4, 7],
 [3, 6, 9]]

Output: [2, 7]

There are two basins: one consists of two cells and the other of seven. They are labeled with A’s and B’s here:

B B A
B B A
B B B

The sink of basin A is cell (0, 2). The sink of basin B is cell (0, 0).


Example Two

Input:

[[0, 2, 1, 3],
 [2, 1, 0, 4],
 [3, 3, 3, 3],
 [5, 5, 2, 1]]

Output: [4, 5, 7]

There are three basins. They are labeled with A, B and C here: 

B B C C
B C C C
B C C A
B A A A

The sinks of basins A, B and C are (3, 3), (0, 0) and (1, 2) respectively.


Notes

Input Parameters: The function has one argument, a two-dimensional array of integers representing the altitudes of the regions of a rectangular surface.

Output: Return an array of integers representing the sizes of basins in non-decreasing order.


Constraints:

● 1 <= total number of regions on the given surface <= 1000000

● 0 <= altitude of any region on the surface <= 1000000


 Solution

The three solutions we provide use different techniques to mark the basins, but in the end they all sort the basin sizes in linear time with the Counting Sort algorithm (see the IK Sorting class or this foundational video). Counting Sort is linear here because every basin size is an integer between 1 and the total number of regions.

For convenience, let’s define n as the total number of regions on the given surface; in other words, n = rows * columns of the given matrix.
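The final step that all three solutions share, counting cells per basin and then sorting the counts, might look like the sketch below. The function `sortedBasinSizes` and its input format (one basin ID per cell, numbered 0 to numBasins - 1) are assumptions for illustration. Since every basin size lies in [1, n], a frequency array of length n + 1 suffices, which is what keeps the sort linear.

```cpp
#include <cassert>
#include <vector>

// Given a basin ID for every cell (IDs 0..numBasins-1), returns the basin
// sizes in non-decreasing order using Counting Sort.
std::vector<int> sortedBasinSizes(const std::vector<int>& basinId, int numBasins) {
    const int n = static_cast<int>(basinId.size());

    // Size of each basin = number of cells carrying its ID.
    std::vector<int> basinSize(numBasins, 0);
    for (int id : basinId) {
        assert(id >= 0 && id < numBasins);
        ++basinSize[id];
    }

    // Counting Sort: freq[s] = number of basins whose size is s, s in [1, n].
    std::vector<int> freq(n + 1, 0);
    for (int s : basinSize) ++freq[s];

    std::vector<int> result;
    result.reserve(numBasins);
    for (int s = 1; s <= n; ++s)
        for (int k = 0; k < freq[s]; ++k) result.push_back(s);
    return result;
}
```

For Example One, with the basin IDs flattened row by row as {0, 0, 1, 0, 0, 1, 0, 0, 0}, it returns {2, 7}.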


1) brute_force_solution.cpp

In this naive approach we assign each cell a unique ID and store those IDs in a new matrix called “basins”. We then iterate over the matrix and replace the ID of each cell with the ID of the sink it eventually drains to.

Consider this input matrix for example:

1 5 2
2 4 7
3 6 9

We start by creating a new matrix to store IDs and assign unique ID to each cell:

0 1 2
3 4 5
6 7 8

Then for each cell we change its ID as follows:

(0, 0) is a sink so its ID doesn’t change.

(0, 1) drains to its lowest neighbor (0, 0) so we set its ID to 0.

(1, 1) drains to its lowest neighbor (1, 0), which in turn drains to (0, 0), so we set the ID of (1, 1) to 0.

We repeat the process for all cells and arrive at this:

0 0 2
0 0 2
0 0 0

Now we just count the number of occurrences of each remaining ID and return those counts in non-decreasing order.
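Here is a minimal C++ sketch of the brute force idea, assuming a hypothetical `bruteForceBasins` entry point. Rather than relabeling IDs step by step, it walks each cell's drain path to the end with a plain loop and stores the flat index of the sink it reaches; with the row-major numbering used above, that flat index is exactly the sink's initial unique ID. The original brute_force_solution.cpp evidently recurses (its space analysis counts a recursion stack), so treat the loop as a substitution.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Flat index of the neighbor that (r, c) drains to, or -1 if (r, c) is a sink.
static int drainTarget(const Grid& g, int r, int c) {
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    const int dr[] = {0, 0, -1, 1}, dc[] = {1, -1, 0, 0};
    int best = -1, bestAlt = 0;
    for (int k = 0; k < 4; ++k) {
        const int nr = r + dr[k], nc = c + dc[k];
        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
        if (best == -1 || g[nr][nc] < bestAlt) {
            best = nr * cols + nc;
            bestAlt = g[nr][nc];
        }
    }
    return (best != -1 && bestAlt <= g[r][c]) ? best : -1;
}

// Brute force: follow every cell's drain path to its sink independently,
// so one cell may cost O(n) steps and the whole grid O(n^2).
std::vector<int> bruteForceBasins(const Grid& g) {
    assert(!g.empty() && !g[0].empty());
    const int cols = static_cast<int>(g[0].size());
    const int n = static_cast<int>(g.size()) * cols;

    std::vector<int> basins(n);  // basins[i] = flat index (initial ID) of i's sink
    for (int i = 0; i < n; ++i) {
        int cur = i, next;
        while ((next = drainTarget(g, cur / cols, cur % cols)) != -1) cur = next;
        basins[i] = cur;
    }

    // Count the cells of each sink, then Counting Sort the sizes (each is in [1, n]).
    std::vector<int> cellsPerSink(n, 0), freq(n + 1, 0), result;
    for (int sink : basins) ++cellsPerSink[sink];
    for (int cnt : cellsPerSink)
        if (cnt > 0) ++freq[cnt];
    for (int s = 1; s <= n; ++s)
        for (int k = 0; k < freq[s]; ++k) result.push_back(s);
    return result;
}
```

On Example One it returns {2, 7}, and on Example Two {4, 5, 7}.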


Time Complexity:

O(n^2).

Following the drain path from a single cell to its sink can visit O(n) cells in the worst case; doing that for all n cells takes O(n^2) time.

The size of the output array is O(n); it takes O(n) time to sort it using the Counting Sort algorithm.

Summing up the two we get O(n^2) + O(n) = O(n^2).
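To see that the quadratic bound can actually be reached, take a single row whose altitudes strictly decrease from left to right (an illustrative input, not one from the original). Every cell drains to its right neighbor, so only the last cell is a sink and the path from cell i has n - 1 - i steps:

```latex
% Assumed worst-case input: one row with altitudes n, n-1, ..., 1.
% Cell i drains to cell i+1; its path to the only sink (cell n-1) has n-1-i steps.
\sum_{i=0}^{n-1} (n - 1 - i) \;=\; \sum_{k=0}^{n-1} k \;=\; \frac{n(n-1)}{2} \;=\; \Theta(n^2)
```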

Auxiliary Space Used:

O(n).

The matrix with the IDs takes O(n).

The recursion stack for the process of calculating the IDs takes O(n), too.

Finally, for the Counting Sort we use another array of size O(n).

O(n) + O(n) + O(n) = O(n).

Space Complexity:

O(n).

Input, auxiliary space used and the output each take O(n) space. Summing them up we get O(n).


2) greedy_solution.cpp

One observation to make is that the cell with the lowest altitude in the entire matrix will always be a sink. Another one is that all the neighbors of the lowest cell will drain to that cell.

We can use these observations to systematically group cells into basins.

Create a new matrix and initialize all cells with -1 (unmarked).

Now iterate over the cells from lowest to highest altitude.

If the current cell is unmarked, mark it with a new unique ID.

Then give every unmarked neighbor of the current cell the current cell’s ID (this applies whether the cell was just marked or was already marked).

Finally, count how many times each unique ID occurs in the matrix and return those counts in non-decreasing order.
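The steps above can be sketched in C++ as follows. `greedyBasins` is a hypothetical name, and the sketch orders cells with `std::sort` instead of the sorted map of altitudes that greedy_solution.cpp is described as using; both orderings cost O(n*log(n)). It assumes every non-sink cell is strictly higher than the neighbor it drains to, which holds in both examples; if a cell could drain into an equal-altitude neighbor, the order in which equal altitudes are visited would start to matter.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Greedy sketch: visit cells from lowest to highest altitude. An unmarked cell
// is a sink and receives a new basin ID; every unmarked neighbor of the current
// cell then inherits the current cell's ID.
// Assumes each non-sink cell is strictly higher than the cell it drains to.
std::vector<int> greedyBasins(const Grid& g) {
    assert(!g.empty() && !g[0].empty());
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    const int n = rows * cols;

    // Flat cell indices ordered by altitude (std::sort stands in for a sorted map).
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return g[a / cols][a % cols] < g[b / cols][b % cols];
    });

    std::vector<int> basins(n, -1);  // -1 means unmarked
    int numBasins = 0;
    const int dr[] = {0, 0, -1, 1}, dc[] = {1, -1, 0, 0};
    for (int cell : order) {
        if (basins[cell] == -1) basins[cell] = numBasins++;  // nothing lower reached it: a sink
        const int r = cell / cols, c = cell % cols;
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            int& neighborId = basins[nr * cols + nc];
            if (neighborId == -1) neighborId = basins[cell];
        }
    }

    // Count cells per basin, then Counting Sort the sizes (each is in [1, n]).
    std::vector<int> basinSize(numBasins, 0), freq(n + 1, 0), result;
    for (int id : basins) ++basinSize[id];
    for (int s : basinSize) ++freq[s];
    for (int s = 1; s <= n; ++s)
        for (int k = 0; k < freq[s]; ++k) result.push_back(s);
    return result;
}
```

On the two examples it returns {2, 7} and {4, 5, 7}.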

Consider this input matrix for example:

1 5 2
2 4 7
3 6 9

The matrix with IDs in the beginning:

-1 -1 -1
-1 -1 -1
-1 -1 -1

Start with the lowest altitude, 1 at (0, 0), and mark it with a new ID, 0. Then change all its neighbors’ IDs to 0 too.

 0  0 -1
 0 -1 -1
-1 -1 -1

The next lowest regions are (0, 2) and (1, 0), both with altitude 2.

(0, 2) is unmarked, so we mark it with a new ID, 1. We skip its already marked neighbor (0, 1) and mark the remaining one, (1, 2), with the same ID of 1.

 0  0  1
 0 -1  1
-1 -1 -1

(1, 0) is already marked, so we give its unmarked neighbors, (1, 1) and (2, 0), its ID of 0.

 0  0  1
 0  0  1
 0 -1 -1

Continue this until all cells have been processed. At that point the matrix will look like this:

0 0 1
0 0 1
0 0 0

It has two unique basin IDs. We now count how many times each of them occurs, sort those counts in non-decreasing order, and return them.


Time Complexity:

O(n*log(n)).

Ordering the cells by altitude, which this solution does by populating a sorted map of altitudes, takes O(n*log(n)) time.

Processing cells afterwards takes constant time per cell, so O(n) for all cells.

The number of basins is O(n). It takes O(n) time to sort their sizes using the Counting Sort algorithm.

Summing up everything we get O(n*log(n)) + O(n) + O(n) = O(n*log(n)).

Auxiliary Space Used:

O(n).

Keeping track of the IDs of the matrix cells takes O(n) space.

The sorted map of the altitudes takes O(n) space, too.

The data structure we use for the Counting Sort takes another O(n).

Summing up everything we get O(n) + O(n) + O(n) = O(n).

Space Complexity:

O(n).

Input, auxiliary space used and the output each take O(n) space. Summing them up we get O(n).


3) optimal_solution.cpp

This is an optimized version of the brute force approach. The difference is that here we mark every cell we pass through while finding the sink of a particular cell, and we skip any cell that has already been processed.

● Matrix called “basins” is initialized with -1 (unmarked). It will store the basin IDs in the end.

● Iterate over the matrix cells, skipping cells that are already marked. For each unmarked cell, call the recursive function get_sink and mark the cell with the ID it returns.

● Function get_sink recursively calls itself on the cell the current cell drains to, until it reaches a sink or a cell that’s already marked (assigned an ID). It returns the ID of the deepest cell it reached, and, as the recursion unwinds, each cell on the path is marked with that ID:

○ If it’s reached a marked cell, it returns its ID.

○ Else it must be a sink. Mark it with a new unique ID and return that ID.
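A C++ sketch of this memoized approach is below. `BasinMarker` and `optimalBasins` are hypothetical names; `get_sink` follows the description above, marking each cell on the drain path with the returned ID as the recursion unwinds, which is what guarantees that no cell is resolved twice.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;

struct BasinMarker {
    const Grid& g;
    int rows, cols;
    std::vector<int> basins;  // basin ID per flat cell index, -1 = unmarked
    int numBasins = 0;

    explicit BasinMarker(const Grid& grid)
        : g(grid), rows(static_cast<int>(grid.size())),
          cols(static_cast<int>(grid[0].size())), basins(rows * cols, -1) {}

    // Flat index of the neighbor `cell` drains to, or -1 if `cell` is a sink.
    int drainTarget(int cell) const {
        const int r = cell / cols, c = cell % cols;
        const int dr[] = {0, 0, -1, 1}, dc[] = {1, -1, 0, 0};
        int best = -1, bestAlt = 0;
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (best == -1 || g[nr][nc] < bestAlt) {
                best = nr * cols + nc;
                bestAlt = g[nr][nc];
            }
        }
        return (best != -1 && bestAlt <= g[r][c]) ? best : -1;
    }

    // Returns the basin ID of `cell`, marking it (and, through the recursion,
    // every unmarked cell further down its drain path).
    int get_sink(int cell) {
        if (basins[cell] != -1) return basins[cell];  // reached a marked cell
        const int next = drainTarget(cell);
        const int id = (next == -1) ? numBasins++ : get_sink(next);  // a sink gets a new ID
        basins[cell] = id;
        return id;
    }
};

std::vector<int> optimalBasins(const Grid& g) {
    assert(!g.empty() && !g[0].empty());
    BasinMarker m(g);
    const int n = m.rows * m.cols;
    for (int cell = 0; cell < n; ++cell)
        if (m.basins[cell] == -1) m.get_sink(cell);

    // Count cells per basin, then Counting Sort the sizes (each is in [1, n]).
    std::vector<int> basinSize(m.numBasins, 0), freq(n + 1, 0), result;
    for (int id : m.basins) ++basinSize[id];
    for (int s : basinSize) ++freq[s];
    for (int s = 1; s <= n; ++s)
        for (int k = 0; k < freq[s]; ++k) result.push_back(s);
    return result;
}
```

Processing Example One in row-major order assigns ID 0 to (0, 0) and ID 1 to (0, 2), matching the walkthrough, and the function returns {2, 7}.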

Consider this input matrix for example:

1 5 2
2 4 7
3 6 9

Initialize the “basins” ID matrix:

-1 -1 -1
-1 -1 -1
-1 -1 -1

(0, 0) with altitude 1 happens to be a sink, so we mark it with 0.

(0, 1) with altitude 5 drains to (0, 0), which is a sink, so we mark it with that sink’s ID, 0.

(0, 2) with altitude 2 happens to be a sink, so we mark it with a new unique ID, 1.

After processing the three cells of the first row, the “basins” matrix looks like this:

 0  0  1
-1 -1 -1
-1 -1 -1

We process the rest of the cells the same way and end up with the following “basins” matrix:

0 0 1
0 0 1
0 0 0

It has two unique basin IDs. We now count how many times each of them occurs, sort those counts in non-decreasing order, and return them.


Time Complexity:

O(n).

Unlike the brute force algorithm, here no cell is processed more than once, because every cell is marked as soon as its basin ID is known, and processing any one cell still takes constant time. Therefore the main processing loop now takes O(n) time.

Counting Sort still takes O(n) time, too.

O(n) + O(n) = O(n).

Auxiliary Space Used:

O(n), same as in the brute force approach.

The matrix with the IDs takes O(n) space.

The recursion stack for the process of calculating the IDs takes O(n), too.

Finally, for the Counting Sort we use another array of size O(n).

O(n) + O(n) + O(n) = O(n).

Space Complexity:

O(n).

Input, auxiliary space used and the output each take O(n) space. Summing them up we get O(n).
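One practical caveat about the recursion stack counted in the auxiliary space: its depth equals the longest drain path, which can approach n (up to 1000000 under the constraints, for example along a single strictly decreasing row), and that many nested calls may exceed a default thread stack. A common workaround, sketched here under hypothetical names (`getSinkIterative`, `drainTarget`), keeps the path in a heap-allocated vector and marks it after the walk; the bounds stay O(n) time and O(n) auxiliary space.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Flat index of the neighbor `cell` drains to, or -1 if `cell` is a sink.
static int drainTarget(const Grid& g, int cell) {
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    const int r = cell / cols, c = cell % cols;
    const int dr[] = {0, 0, -1, 1}, dc[] = {1, -1, 0, 0};
    int best = -1, bestAlt = 0;
    for (int k = 0; k < 4; ++k) {
        const int nr = r + dr[k], nc = c + dc[k];
        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
        if (best == -1 || g[nr][nc] < bestAlt) {
            best = nr * cols + nc;
            bestAlt = g[nr][nc];
        }
    }
    return (best != -1 && bestAlt <= g[r][c]) ? best : -1;
}

// Iterative stand-in for the recursive get_sink: walk forward until reaching a
// sink or an already marked cell, remembering the path, then mark the whole
// path with the basin ID found at its end.
int getSinkIterative(const Grid& g, std::vector<int>& basins, int& numBasins, int start) {
    assert(start >= 0 && start < static_cast<int>(basins.size()));
    std::vector<int> path;  // unmarked cells visited on the way down
    int cur = start;
    while (basins[cur] == -1) {
        path.push_back(cur);
        const int next = drainTarget(g, cur);
        if (next == -1) {
            basins[cur] = numBasins++;  // cur is a sink: give it a new basin ID
            break;
        }
        cur = next;
    }
    const int id = basins[cur];
    for (int cell : path) basins[cell] = id;  // mark every cell passed through
    return id;
}
```

Calling `getSinkIterative` for every cell in row-major order reproduces the same “basins” matrix as the recursive version, without deep call chains.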