Algorithm challenge. Find the running median of an array of integers.

This is a famous algorithm challenge, it’s also a very tough one. I’ve seen this problem so many times around and never took the time to actually solve it. Practicing this stuff is good to keep the mind sharp so this time I invested some time to solve it, actually more time than expected. I’m writing this to help future seekers of knowledge that will come by.
Here’s the challenge:

You are given a set of integers coming from user input. The first number you get is the amount of integer values in the set. The next numbers you get are the integers in the set. You have to find the running median of the set of integers. The running median is the midpoint value of the sorted set, or the average of the two mid points if the number of elements in the set is even.

So if your data set is [3, 2, 4, 1, 5] you first need to sort it into [1, 2, 3, 4, 5] and then find the element with such a position that an equal number of elements are present on both left and right of that position.
If the set contains an even amount of elements, such as [1, 2, 3, 4] then there is no such position, in this case the median is the average of the two elements that, when taken together, satisfy the property, in this case the median will be 2.5.

Naive Approach

What makes this challenge interesting is not how to find the median of a sorted array, everybody can do that. The real issue is how to solve the problem in the most efficient way when it comes to execution time, note that there is a sorting action that needs to be done for every element received. I’m going to provide here the worst possible approach and then explain why something better is needed.

The idea is straightforward, read a new element, insert the element in a List, sort the List and finally get the median out of it. But the idea is also highly inefficient, there is a call to Collections.sort() for every new element in the data set. This means that the time complexity of that code is something like N, number of elements in the set multiplied by Nlog(N) which is probably the complexity of Collections.sort(), or at least is the best possible complexity for the sorting.


Can this problem be solved with a faster algorithm? Yes.

A better approach using Heaps

To find the median we need to keep some sort of information about where the center of the data set is, possibly without sorting the set every time a new element is added. Introducing Heaps, a data structure that has an interesting property, the first element of a heap, called the head, is always the biggest or the smallest value in the whole heap. See it as a magical array that always has the smallest or biggest element at position 0. A min heap will always have the smallest element as head, a max heap will always have the biggest element as head. There is plenty of documentation on heaps out there so I won’t be explaining the internals here. Adding elements to a heap takes log(N) time, if we combine this property with our for loop that reads the elements from the user then we end up with a time complexity of N*log(N), which is a lot better than N*N*log(N).

The algorithm

We need two heaps, a left heap and a right heap.

  • The left heap is a max heap, it contains roughly half of the elements in the set, the half that falls to the left of the median. The left heap’s head is its biggest element.
  • The right heap is a min heap, it contain roughly half of the elements in the set, the half that falls to the right of the median. The right heap’s head is its smallest element.

The idea is to update left and right heap in such a way that we can always figure out the median value from the two heads. This is the algorithm step by step:

  1. When the first element is received, add it to the left heap.
    The median is the element.
    Wait for the next element.
  2. If the new element is bigger than the left heap’s head then add the new element to the right heap.
    Otherwise add the new element to the left heap.
    With this logic we try to split elements which are smaller than the median from those which are bigger. However there is a problem, consider the dataset [1,2,3,4,5], in this case our left heap would contain only 1 while the right heap would contain everything else because all the elements after 1 are bigger than 1. We need a way to keep the same amount of elements in both heaps. The next step addresses this.
  3. Take the head from the heap that has more elements and move it to the other heap. If the right heap has more elements then remove the head from the right heap and add it to the left heap. Otherwise, if left heap has more elements then remove the head from the left heap and add it to the right heap. Doing this at each step guarantees the balancing.
  4. At this point the two heaps are balanced, their sizes will differ by zero after receiving an even number of elements, and by one after receiving an odd number of elements. To find the median with an even number of elements we simply average the two heads. To find the median with an odd number of elements we take the head of the longest heap.

That’s our efficient algorithm, here is my implementation in java. It uses PriorityQueue which is a java implementation of heap.

Hope you liked the explanation, please let me know if something still sounds foggy so I can improve it.


Codility and the “K complementary pairs in array” challenge

Recently I had to confront myself with Codility, for those who don’t know it Codility is an online tool that provides coding challenges and it is often used by companies in the process of hiring new people when they want to test people’s coding skills.

But that’s not all, Codility also provides a great training section where you can confront yourself with a ton of coding challenges and improve your understanding of algorithms and your problem solving skills.

Yes, someone was testing my coding skills.

One of the tests I had to solve involved the concept on “K complementary pairs” in an array, basically what I was requested to do was something like this:

Given an integer array A find all the complementary pairs in A that sums up to a provided value K, so for example given the array [2, 5, -1, 6, 10, -2] and an integer K = 4 the result must be 5 because the input array contains [2, 2] [5, -1] [-1, 5] [6, -2] [-2, 6]

The solution is expected to run with a total time complexity of
O(N logN)

First approach that comes to mind, let’s write two nested for loops, sum up each element with each other and count how many times we get the input value K:

This works! Oh wait… it is not an O(N logN), this approach runs in quadratic time or something like that… shit.
This is the solution that I submitted however, because I had other tests to do and a limited amount of time so since I could not think of a better solution at the moment I just sent this one, better slower than nothing at all.

But then a few hours later I had the right idea, and I came up with an algorithm that solves the same problem in linear time.
The idea is the following:

Let’s use an HashMap to store the occurrences of each item in the input array, so each entry will be (key=item, value=occurrencies).

Now traverse the HashMap‘s keySet() and using the input value K find out which value we need in order to produce K starting from the current item.

At this point just multiply the occurrences of the current key by the number of the occurrences of the required value.

And that’s it, here’s the code.

Please note that this implementation still has a (very) worst case time complexity which is bound to the internal implementation of the HashMap, as Alexey suggested in the comment:

Hash map search operation has O(n) worst case complexity. So your implementation has still O(n^2) complexity (worst case)

Thanks Alexey for spotting this ­čÖé


Merge Sort algorithm in Java

This is a very simple Java implementation of the popular merge sort algorithm.
I decided to put this here because I needed to refresh my knowledge of sorting algorithms, problem is I could not find an easy-to-understand implementation online, so I decided to implement my own version and share it in case someone is struggling on understanding how to implement this algorithm. I think this is the shortest implementation ever.


Merge sort works by splitting the original array into halves several times until many subarrays of size 1 are obtained.
The algorithm then merges back every subarray in such a way that the result of merging two subarrays is a single, sorted array twice as big as the subarrays.
The process is repeated until an array of the same size of the input array is obtained.

Here is an example:

              [24, 3, 17, 109, 6, 72, 11, 0]                  Unsorted input

      [24, 3, 17, 109]              [6, 72, 11, 0]            Split #1

   [24, 3]       [17, 109]      [6, 72]        [11, 0]        Split #2

  [24]  [3]     [17]  [109]    [6]  [72]      [11]  [0]       Split #3

   [3, 24]       [17, 109]      [6, 72]        [0, 11]        Merge #2
       [3, 17, 24, 109]             [0, 6, 11, 72]            Merge #3
              [0, 3, 6, 11, 17, 24, 72, 109]                  Merge #4 : Sorted output

This is a very simple introduction to the algorithm, more details on Wikipedia.

Give me the code!

Here’s the code I wrote that implements this merge sort, it can be improved a lot especially on the space it requires to run, you’ll see that at every iteration two new subarrays are created, this is not optimal if we need to save memory.


How to get the SHA1 hash sum of a String in Android

Here’s one piece of code that comes very handy in some situations, just copy and paste this to get the SHA1 hash sum from a text String. The Hash is returned as an hex encoded human readable String, see below for an example of use:

Here’s how to use this:

The code above will print this output:

String: Hello World
SHA1 HASH: 0a4d55a8d778e5022fab701977c5d840bbc486d0

Enjoy ­čśë


Never synchronize on Java boxed primitives

This is another pitfall that can easily compromise a whole application if not understood, it involves thread safety, Java objects immutability and boxed primitives.
Standard Java classes like Integer and Long are immutable. A class is said to be immutable if it is implemented in such a way that it’s value cannot be changed once the class has been instantiated.

Integer value = 10;

The Integer object declared above is immutable, it’s value can not be changed.
But wait a second, you say, what if I do this?


Oh, I say, you changed the object’s value with the increment operator!
Actually this is not what happened, when it comes to running this code the statement above is pretty much equivalent to this one:

value = new Integer(value + 1);

Since this class is immutable, as soon as you change it’s value a new Integer instance with the new value is created and assigned to value. This is what we call immutability and applies also to String, Boolean, Long, Double, BigInteger and so on. This is the first step in understanding what follows.
The second step is understanding boxed primitives, but you may already know what they are.

Boxed primitives can be really useful sometimes, and can be very tricky too. You can use them to represent values such as Integer, Long, Boolean, Double and so on. They are called boxed primitives, or wrapped primitives, because they wrap a primitive Java type such as int and long inside a class allowing a more flexible approach to a strong object oriented environment such as Java.

If you work in a multi threading environment you’ll often need to synchronize parts of your code to make sure that no thread except yours operates on an object that you are working on. Let’s say you have an Integer value that can be incremented by one thread and decremented by another thread, your code may look like this:

private Integer value = 0;
synchronized(value) {

But this innocent looking piece of code hides an horrible monster, can you spot it?
The synchronized block is synchronized against the value object which will no more be itself as soon as we increment it, let’s see it under a different light using what we learned about immutability:

private Integer value = 0;
synchronized(value) {
  value = new Integer(value + 1);

The synchronized block works by locking the object on which we want to synchronize, but the object will change inside the block! The value reference will point to another Integer object after the increment!
The new Integer object with value 11 won’t be locked and as a result if some other thread tries to synchronize on the value object after our increment operation it will succeed.
As a result we’ll have two or more threads running at the same time even if they are supposed to be waiting to execute critical code sections one at a time.
This is dangerous as hell, this is the kind of bug that starts showing up on release day..


How not to compare Strings in Java

In my previous post I wrote about how not to compare Integer and Long objects in Java in order to avoid undefined behaviours in your code caused by the Java interning mechanism. If not well understood interning can lead to catastrophic failures. You may think that your code is working fine while in fact it is not working at all.

Interning is used for String objects too so it’s important to know how the Java virtual machine can fool you. Let’s have a look at this code:

public static void main(String[] args) {
  String a = "test01";
  String b = "test01";
  System.out.println("Comparing \"" + a + "\" and " + "\"" + b + "\"");

  if (a == b) System.out.println("Matches");
  else System.out.println("Doesn't match");

Every Java developer knows that String objects cannot be compared with the == operator, because it only compares object instances and not objects values, the equals() method must be used instead. The code above creates two different String objects with the same value and then compares them with the == operator, thus the comparison should fail even if the values are the same.
However if you run it you’ll notice that it works perfectly, the output is:

Comparing "test01" and "test01"

This is the tricky part. You may now be fooled into believing that your code works fine while in fact it does not.

Interning is at work here, just like for Integers and Longs in the previous post. In Java all literal strings and string-valued constant expressions are interned. This means that both a and b in the code above are referencing the same instance of String that was created and interned when a was created with a literal value.

String a = "test01";
/** A String instance with value "test01" is now interned */

The code above will stop working as soon as it gets a little more complex:

public static void main(String[] args) {
  String a = "test01";
  String b = "test";
  b += "01";
  System.out.println("Comparing \"" + a + "\" and " + "\"" + b + "\"");
  if (a == b) System.out.println("Matches");
  else System.out.println("Doesn't match");

This time two String instances with values “test01” and “test” are interned, and when we append “01” to b a completely new String object is created (because of objects immutability) with value “test01” which will not refer to the interned one with the same value.
The output now is:

Comparing "test01" and "test01"
Doesn't match

Always use the equals() method to compare objects values in Java.