0 comments on “HackerRank “Almost Sorted” challenge”

HackerRank “Almost Sorted” challenge

I received an email from HackerRank a few days ago, “Can you solve the ‘Almost sorted’ challenge?”. Well, I thought, probably.. let me see.
The problem is about figuring out how to complete the sorting of an array of integers, we receive an input like the following:

1, 2, 8, 4, 5, 6, 7, 3, 9

The goal is to figure out how the input array can be transformed into a sorted array, using only one of two allowed operations:

  1. Swap two elements
  2. Reverse a continuous subsequence of the array

The challenge requires to print out the operation that can complete the sorting. If two elements can be swapped then we have to print swap l r where l and r are the positions of the two items to be swapped. If, on the other hand, a subsequence can be reversed to complete the sorting, then we print reverse s e where s an e are the initial and final positions of the subsequence. When printing the positions we have to use 1 based numbers.
Using the array above we should therefore be able to print swap 3 8

At this point I should say that the email came on a Friday evening, and for some reason I am usually not in my best programming mood on a Friday evening, especially in Amsterdam. However I tried to think through how to do this while drinking my (~third) beer, it didn’t work out. So I tried again the day after and I solved it in 10 minutes, because the solution is actually pretty straightforward, once you look at the problem from the right (sober) perspective.

If you struggle with this challenge I suggest you go check the quick sort algorithm again, because the idea here is kind of similar. What we have to do is start looking at the array from both ends, just keep two pointers l and r, then keep moving l and r towards the center of the array until an element is found which should be moved in order to sort the array. We have to find two such elements, otherwise the array is either already sorted or cannot be sorted using the allowed operations, consider this array:

1, 2, 3, 12, 5, 6, 7, 8, 9

In this case l and r will end up both at element 12 There are no items which can be swapped in order to sort the array, reversing is also not an option.

If on the other hand, we can find two distinct elements which can be swapped, then we have the first hint that we can potentially sort the array using one of the two allowed operations. What we could do at this point is swapping the two items, check if the array is sorted, and we would have solved the first part of the challenge, detecting swap operations.

But how about reverse? Well, reverse is just swap done more than once. We can detect reverse by keeping advancing the l and r pointers towards each other, swap the items if needed, that is to say, only if they need to be swapped, and count how many swaps we did. If the number of swaps is equal to half of the total elements that we walked through then we can sort the array using reverse. If the number of swaps is one, then we can sort the array using swap.

So that’s the basic idea, find the Python 3 implementation below, I hope this post helped those of you who got stuck at this, as you can see it’s easier than it seems 🙂

Advertisements
0 comments on “Algorithm challenge. Find the running median of an array of integers.”

Algorithm challenge. Find the running median of an array of integers.

This is a famous algorithm challenge, it’s also a very tough one. I’ve seen this problem so many times around and never took the time to actually solve it. Practicing this stuff is good to keep the mind sharp so this time I invested some time to solve it, actually more time than expected. I’m writing this to help future seekers of knowledge that will come by.
Here’s the challenge:

You are given a set of integers coming from user input. The first number you get is the amount of integer values in the set. The next numbers you get are the integers in the set. You have to find the running median of the set of integers. The running median is the midpoint value of the sorted set, or the average of the two mid points if the number of elements in the set is even.

So if your data set is [3, 2, 4, 1, 5] you first need to sort it into [1, 2, 3, 4, 5] and then find the element with such a position that an equal number of elements are present on both left and right of that position.
If the set contains an even amount of elements, such as [1, 2, 3, 4] then there is no such position, in this case the median is the average of the two elements that, when taken together, satisfy the property, in this case the median will be 2.5.

Naive Approach

What makes this challenge interesting is not how to find the median of a sorted array, everybody can do that. The real issue is how to solve the problem in the most efficient way when it comes to execution time, note that there is a sorting action that needs to be done for every element received. I’m going to provide here the worst possible approach and then explain why something better is needed.

The idea is straightforward, read a new element, insert the element in a List, sort the List and finally get the median out of it. But the idea is also highly inefficient, there is a call to Collections.sort() for every new element in the data set. This means that the time complexity of that code is something like N, number of elements in the set multiplied by Nlog(N) which is probably the complexity of Collections.sort(), or at least is the best possible complexity for the sorting.

O(N*N*log(N))

Can this problem be solved with a faster algorithm? Yes.

A better approach using Heaps

To find the median we need to keep some sort of information about where the center of the data set is, possibly without sorting the set every time a new element is added. Introducing Heaps, a data structure that has an interesting property, the first element of a heap, called the head, is always the biggest or the smallest value in the whole heap. See it as a magical array that always has the smallest or biggest element at position 0. A min heap will always have the smallest element as head, a max heap will always have the biggest element as head. There is plenty of documentation on heaps out there so I won’t be explaining the internals here. Adding elements to a heap takes log(N) time, if we combine this property with our for loop that reads the elements from the user then we end up with a time complexity of N*log(N), which is a lot better than N*N*log(N).

The algorithm

We need two heaps, a left heap and a right heap.

  • The left heap is a max heap, it contains roughly half of the elements in the set, the half that falls to the left of the median. The left heap’s head is its biggest element.
  • The right heap is a min heap, it contain roughly half of the elements in the set, the half that falls to the right of the median. The right heap’s head is its smallest element.

The idea is to update left and right heap in such a way that we can always figure out the median value from the two heads. This is the algorithm step by step:

  1. When the first element is received, add it to the left heap.
    The median is the element.
    Wait for the next element.
  2. If the new element is bigger than the left heap’s head then add the new element to the right heap.
    Otherwise add the new element to the left heap.
    With this logic we try to split elements which are smaller than the median from those which are bigger. However there is a problem, consider the dataset [1,2,3,4,5], in this case our left heap would contain only 1 while the right heap would contain everything else because all the elements after 1 are bigger than 1. We need a way to keep the same amount of elements in both heaps. The next step addresses this.
  3. Take the head from the heap that has more elements and move it to the other heap. If the right heap has more elements then remove the head from the right heap and add it to the left heap. Otherwise, if left heap has more elements then remove the head from the left heap and add it to the right heap. Doing this at each step guarantees the balancing.
  4. At this point the two heaps are balanced, their sizes will differ by zero after receiving an even number of elements, and by one after receiving an odd number of elements. To find the median with an even number of elements we simply average the two heads. To find the median with an odd number of elements we take the head of the longest heap.

That’s our efficient algorithm, here is my implementation in java. It uses PriorityQueue which is a java implementation of heap.

Hope you liked the explanation, please let me know if something still sounds foggy so I can improve it.

5 comments on “Codility and the “K complementary pairs in array” challenge”

Codility and the “K complementary pairs in array” challenge

Recently I had to confront myself with Codility, for those who don’t know it Codility is an online tool that provides coding challenges and it is often used by companies in the process of hiring new people when they want to test people’s coding skills.

But that’s not all, Codility also provides a great training section where you can confront yourself with a ton of coding challenges and improve your understanding of algorithms and your problem solving skills.

Yes, someone was testing my coding skills.

One of the tests I had to solve involved the concept on “K complementary pairs” in an array, basically what I was requested to do was something like this:

Given an integer array A find all the complementary pairs in A that sums up to a provided value K, so for example given the array [2, 5, -1, 6, 10, -2] and an integer K = 4 the result must be 5 because the input array contains [2, 2] [5, -1] [-1, 5] [6, -2] [-2, 6]

The solution is expected to run with a total time complexity of
O(N logN)

First approach that comes to mind, let’s write two nested for loops, sum up each element with each other and count how many times we get the input value K:

This works! Oh wait… it is not an O(N logN), this approach runs in quadratic time or something like that… shit.
This is the solution that I submitted however, because I had other tests to do and a limited amount of time so since I could not think of a better solution at the moment I just sent this one, better slower than nothing at all.

But then a few hours later I had the right idea, and I came up with an algorithm that solves the same problem in linear time.
The idea is the following:

Let’s use an HashMap to store the occurrences of each item in the input array, so each entry will be (key=item, value=occurrencies).

Now traverse the HashMap‘s keySet() and using the input value K find out which value we need in order to produce K starting from the current item.

At this point just multiply the occurrences of the current key by the number of the occurrences of the required value.

And that’s it, here’s the code.

Please note that this implementation still has a (very) worst case time complexity which is bound to the internal implementation of the HashMap, as Alexey suggested in the comment:

Hash map search operation has O(n) worst case complexity. So your implementation has still O(n^2) complexity (worst case)

Thanks Alexey for spotting this 🙂

1 comment on “Merge Sort algorithm in Java”

Merge Sort algorithm in Java

This is a very simple Java implementation of the popular merge sort algorithm.
I decided to put this here because I needed to refresh my knowledge of sorting algorithms, problem is I could not find an easy-to-understand implementation online, so I decided to implement my own version and share it in case someone is struggling on understanding how to implement this algorithm. I think this is the shortest implementation ever.

Introduction

Merge sort works by splitting the original array into halves several times until many subarrays of size 1 are obtained.
The algorithm then merges back every subarray in such a way that the result of merging two subarrays is a single, sorted array twice as big as the subarrays.
The process is repeated until an array of the same size of the input array is obtained.

Here is an example:

             
              [24, 3, 17, 109, 6, 72, 11, 0]                  Unsorted input

      [24, 3, 17, 109]              [6, 72, 11, 0]            Split #1

   [24, 3]       [17, 109]      [6, 72]        [11, 0]        Split #2

  [24]  [3]     [17]  [109]    [6]  [72]      [11]  [0]       Split #3

   [3, 24]       [17, 109]      [6, 72]        [0, 11]        Merge #2
   
       [3, 17, 24, 109]             [0, 6, 11, 72]            Merge #3
           
              [0, 3, 6, 11, 17, 24, 72, 109]                  Merge #4 : Sorted output

This is a very simple introduction to the algorithm, more details on Wikipedia.

Give me the code!

Here’s the code I wrote that implements this merge sort, it can be improved a lot especially on the space it requires to run, you’ll see that at every iteration two new subarrays are created, this is not optimal if we need to save memory.