Algorithm challenge. Find the running median of an array of integers.

This is a famous algorithm challenge, and it's also a tough one. I've seen this problem around so many times and never took the time to actually solve it. Practicing this stuff is good for keeping the mind sharp, so this time I invested some time in solving it (more time than expected, actually). I'm writing this to help the future seekers of knowledge who come by.
Here’s the challenge:

You are given a set of integers coming from user input. The first number you get is the number of integer values in the set. The numbers that follow are the integers themselves. You have to find the running median of the set, that is, the median recomputed every time a new element arrives. The median is the midpoint value of the sorted set, or the average of the two middle values if the number of elements in the set is even.

So if your data set is [3, 2, 4, 1, 5] you first need to sort it into [1, 2, 3, 4, 5] and then find the element positioned so that an equal number of elements lie to its left and to its right; in this case that element is 3.
If the set contains an even number of elements, such as [1, 2, 3, 4], then there is no such position. In that case the median is the average of the two middle elements that together satisfy the property, so here the median is 2.5.

Naive Approach

What makes this challenge interesting is not how to find the median of a sorted array; everybody can do that. The real issue is how to solve the problem efficiently in terms of execution time, given that some sorting work has to happen for every element received. I'm going to show the worst possible approach first and then explain why something better is needed.

The idea is straightforward: read a new element, insert it into a List, sort the List, and finally read the median out of it. But it is also highly inefficient: there is a call to Collections.sort() for every new element in the data set. This means the total time complexity is roughly N (the number of elements in the set) multiplied by N*log(N), the complexity of Collections.sort() and the best you can do with a comparison-based sort. In other words, the whole thing takes N*N*log(N) time.
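Something along these lines, as a sketch of the naive version (the class name and the input-reading loop are just illustrative):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Scanner;

    public class NaiveRunningMedian {
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            int n = in.nextInt();          // first number: how many integers follow
            List<Integer> values = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                values.add(in.nextInt());  // read the new element
                Collections.sort(values);  // re-sort the whole list: N*log(N) every single time
                int size = values.size();
                double median = (size % 2 == 1)
                        ? values.get(size / 2)
                        : (values.get(size / 2 - 1) + values.get(size / 2)) / 2.0;
                System.out.println(median);
            }
        }
    }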


Can this problem be solved with a faster algorithm? Yes.

A better approach using Heaps

To find the median we need to keep some information about where the center of the data set is, ideally without sorting the set every time a new element is added. Enter heaps, a data structure with an interesting property: the first element of a heap, called the head, is always either the biggest or the smallest value in the whole heap. See it as a magical array that always has the smallest or the biggest element at position 0. A min heap always has the smallest element as its head, a max heap always has the biggest element as its head. There is plenty of documentation on heaps out there, so I won't explain the internals here. Adding an element to a heap takes log(N) time, so if we combine this property with our loop that reads the elements from the user, we end up with a time complexity of N*log(N), which is a lot better than N*N*log(N).
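For reference, Java's PriorityQueue (used later in the implementation) behaves as a min heap by default, and a max heap can be obtained by passing a reversed comparator. A quick sketch:

    import java.util.Collections;
    import java.util.PriorityQueue;

    public class HeapHeads {
        public static void main(String[] args) {
            // Min heap: the head is always the smallest element.
            PriorityQueue<Integer> minHeap = new PriorityQueue<>();
            // Max heap: the head is always the biggest element.
            PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
            for (int n : new int[] {3, 1, 4, 1, 5}) {
                minHeap.add(n);  // insertion takes log(N) time
                maxHeap.add(n);
            }
            System.out.println(minHeap.peek());  // prints 1
            System.out.println(maxHeap.peek());  // prints 5
        }
    }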

The algorithm

We need two heaps, a left heap and a right heap.

  • The left heap is a max heap. It contains roughly half of the elements in the set, the half that falls to the left of the median. The left heap’s head is its biggest element.
  • The right heap is a min heap. It contains roughly half of the elements in the set, the half that falls to the right of the median. The right heap’s head is its smallest element.

The idea is to update left and right heap in such a way that we can always figure out the median value from the two heads. This is the algorithm step by step:

  1. When the first element is received, add it to the left heap.
    The median is the element.
    Wait for the next element.
  2. If the new element is bigger than the left heap’s head, add the new element to the right heap.
    Otherwise add the new element to the left heap.
    With this logic we try to keep the elements that are smaller than the median separate from those that are bigger. However, there is a problem: consider the data set [1, 2, 3, 4, 5]. Here our left heap would contain only 1 while the right heap would contain everything else, because every element after 1 is bigger than 1. We need a way to keep the same number of elements in both heaps, and the next step addresses this.
  3. If one heap now holds two more elements than the other, take the head of the bigger heap and move it to the other heap. If the right heap has two more elements than the left one, remove the head from the right heap and add it to the left heap; if the left heap has two more elements, remove the head from the left heap and add it to the right heap. Doing this check after every insertion keeps the two heaps balanced.
  4. At this point the two heaps are balanced: their sizes differ by zero after receiving an even number of elements, and by one after receiving an odd number of elements. To find the median with an even number of elements we simply average the two heads. To find the median with an odd number of elements we take the head of the bigger heap.

That’s our efficient algorithm. Here is my implementation in Java; it uses PriorityQueue, which is Java’s built-in implementation of a heap.
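A minimal sketch of that implementation, using two PriorityQueue instances (the class name and the stdin-reading main method are illustrative choices):

    import java.util.Collections;
    import java.util.PriorityQueue;
    import java.util.Scanner;

    public class RunningMedian {

        // Max heap: the smaller half of the elements, head = biggest of them.
        private final PriorityQueue<Integer> left = new PriorityQueue<>(Collections.reverseOrder());
        // Min heap: the bigger half of the elements, head = smallest of them.
        private final PriorityQueue<Integer> right = new PriorityQueue<>();

        public double add(int value) {
            // Steps 1 and 2: route the new element to the correct heap.
            if (left.isEmpty() || value <= left.peek()) {
                left.add(value);
            } else {
                right.add(value);
            }
            // Step 3: rebalance when one heap holds two more elements than the other.
            if (left.size() - right.size() > 1) {
                right.add(left.poll());
            } else if (right.size() - left.size() > 1) {
                left.add(right.poll());
            }
            // Step 4: the median comes from the heads.
            if (left.size() == right.size()) {
                return (left.peek() + right.peek()) / 2.0;
            }
            return left.size() > right.size() ? left.peek() : right.peek();
        }

        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            int n = in.nextInt();  // first number: how many integers follow
            RunningMedian rm = new RunningMedian();
            for (int i = 0; i < n; i++) {
                System.out.println(rm.add(in.nextInt()));
            }
        }
    }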

Hope you liked the explanation. Please let me know if something still sounds foggy so I can improve it.


Codility and the “K complementary pairs in array” challenge

Recently I had to face Codility. For those who don’t know it, Codility is an online tool that provides coding challenges; it is often used by companies during hiring, when they want to test candidates’ coding skills.

But that’s not all: Codility also provides a great training section where you can take on a ton of coding challenges and improve your understanding of algorithms and your problem-solving skills.

Yes, someone was testing my coding skills.

One of the tests I had to solve involved the concept of “K complementary pairs” in an array. Basically, what I was asked to do was something like this:

Given an integer array A, find all the complementary pairs in A that sum up to a provided value K. For example, given the array [2, 5, -1, 6, 10, -2] and an integer K = 4, the result must be 5, because the input array contains the pairs [2, 2], [5, -1], [-1, 5], [6, -2], [-2, 6].

The solution is expected to run with a total time complexity of
O(N log N)

The first approach that comes to mind: write two nested for loops, sum each element with every other element, and count how many times we get the input value K:
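A sketch of that brute-force idea (the class and method names are just illustrative):

    public class KComplementaryBruteForce {

        // Brute force: check every ordered pair (i, j), N*N sums in total.
        public static int countKComplementaryPairs(int[] a, int k) {
            int count = 0;
            for (int i = 0; i < a.length; i++) {
                for (int j = 0; j < a.length; j++) {
                    if (a[i] + a[j] == k) {
                        count++;
                    }
                }
            }
            return count;
        }

        public static void main(String[] args) {
            // The example from the challenge: expected result is 5.
            System.out.println(countKComplementaryPairs(new int[] {2, 5, -1, 6, 10, -2}, 4));
        }
    }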

This works! Oh wait… it is not O(N log N); this approach runs in quadratic time or something like that… shit.
This is the solution that I submitted, however, because I had other tests to do and a limited amount of time. Since I could not think of a better solution at the moment, I just sent this one: better slow than nothing at all.

But then a few hours later I had the right idea, and I came up with an algorithm that solves the same problem in linear time.
The idea is the following:

Let’s use a HashMap to store the occurrences of each item in the input array, so each entry will be (key = item, value = occurrences).

Now traverse the HashMap‘s keySet() and, using the input value K, find out which value we need in order to produce K starting from the current item, that is, K minus the item.

At this point just multiply the occurrences of the current key by the occurrences of the required value, and add these products together: the total is the number of K complementary pairs.

And that’s it, here’s the code.
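What follows is a minimal sketch of that idea (class and method names are mine); it counts ordered pairs, consistent with the example above:

    import java.util.HashMap;
    import java.util.Map;

    public class KComplementary {

        // Counts the ordered pairs (i, j) such that a[i] + a[j] == k.
        public static long countKComplementaryPairs(int[] a, int k) {
            // Occurrences of each item: (key = item, value = occurrences).
            Map<Integer, Long> occurrences = new HashMap<>();
            for (int item : a) {
                occurrences.merge(item, 1L, Long::sum);
            }

            long pairs = 0;
            for (Map.Entry<Integer, Long> entry : occurrences.entrySet()) {
                int required = k - entry.getKey();  // the value needed to reach k
                long requiredCount = occurrences.getOrDefault(required, 0L);
                pairs += entry.getValue() * requiredCount;  // multiply the occurrences
            }
            return pairs;
        }

        public static void main(String[] args) {
            // The example from the challenge: expected result is 5.
            System.out.println(countKComplementaryPairs(new int[] {2, 5, -1, 6, 10, -2}, 4));
        }
    }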

Please note that this implementation still has a (very bad) worst-case time complexity, which is bound to the internal implementation of the HashMap, as Alexey pointed out in the comments:

Hash map search operation has O(n) worst case complexity. So your implementation has still O(n^2) complexity (worst case)

Thanks Alexey for spotting this 🙂


Merge Sort algorithm in Java

This is a very simple Java implementation of the popular merge sort algorithm.
I decided to put this here because I needed to refresh my knowledge of sorting algorithms, and the problem was that I could not find an easy-to-understand implementation online, so I decided to implement my own version and share it in case someone is struggling to understand how to implement this algorithm. I think it is also one of the shortest implementations around.


Merge sort works by splitting the original array into halves several times until many subarrays of size 1 are obtained.
The algorithm then merges back every subarray in such a way that the result of merging two subarrays is a single, sorted array twice as big as the subarrays.
The process is repeated until a sorted array of the same size as the input array is obtained.

Here is an example:

              [24, 3, 17, 109, 6, 72, 11, 0]                  Unsorted input

      [24, 3, 17, 109]              [6, 72, 11, 0]            Split #1

   [24, 3]       [17, 109]      [6, 72]        [11, 0]        Split #2

  [24]  [3]     [17]  [109]    [6]  [72]      [11]  [0]       Split #3

   [3, 24]       [17, 109]      [6, 72]        [0, 11]        Merge #1
       [3, 17, 24, 109]             [0, 6, 11, 72]            Merge #2
              [0, 3, 6, 11, 17, 24, 72, 109]                  Merge #3 : Sorted output

This is a very simple introduction to the algorithm; you can find more details on Wikipedia.

Give me the code!

Here’s the code I wrote that implements this merge sort. It can be improved a lot, especially regarding the space it requires to run: you’ll see that at every recursive call two new subarrays are created, which is not optimal if we need to save memory.
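A sketch of such an implementation (method names are mine; it deliberately allocates two new subarrays at every recursive call, as described above):

    import java.util.Arrays;

    public class MergeSort {

        public static int[] mergeSort(int[] array) {
            // Base case: an array of size 0 or 1 is already sorted.
            if (array.length <= 1) {
                return array;
            }
            // Split the array into two halves (two new subarrays are created here).
            int middle = array.length / 2;
            int[] left = Arrays.copyOfRange(array, 0, middle);
            int[] right = Arrays.copyOfRange(array, middle, array.length);
            // Sort each half recursively, then merge the two sorted halves.
            return merge(mergeSort(left), mergeSort(right));
        }

        private static int[] merge(int[] left, int[] right) {
            int[] result = new int[left.length + right.length];
            int l = 0, r = 0, i = 0;
            // Repeatedly take the smallest remaining element of the two halves.
            while (l < left.length && r < right.length) {
                result[i++] = (left[l] <= right[r]) ? left[l++] : right[r++];
            }
            // Copy whatever remains in the half that is not exhausted yet.
            while (l < left.length) {
                result[i++] = left[l++];
            }
            while (r < right.length) {
                result[i++] = right[r++];
            }
            return result;
        }

        public static void main(String[] args) {
            int[] input = {24, 3, 17, 109, 6, 72, 11, 0};
            System.out.println(Arrays.toString(mergeSort(input)));
            // prints [0, 3, 6, 11, 17, 24, 72, 109]
        }
    }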