## Sorting algorithms: Technical interviews must-know

Sorting algorithms are a must-know for any technical interview, whether at a startup or at a large company like Facebook or Amazon. If you cannot explain which sorting algorithm you are using and why, the interview will not go well. The most important aspect of sorting algorithms is their complexity. We will go through the best-known algorithms and group them by their complexity. Another aspect worth knowing is the stability of a sorting algorithm.

Stability: does the sorting algorithm maintain the relative order of elements that are equal?

The first group has polynomial complexity, O(n²), and the second group has linearithmic complexity, O(n log n). Selection sort, bubble sort, and insertion sort fall into the first group; quicksort, merge sort, and heap sort fall into the second.

The worst-case complexity of quicksort is O(n²); however, that is limited to very specific cases, such as already sorted input. In all other cases, if the pivot is selected properly, quicksort gives O(n log n).

## Selection sort

The fundamental idea of this sort is: find the minimum element in the remaining unsorted array and put it at the current position. We start at index 0 and go all the way to the last index; at each step, we find the minimum in the remaining array and put it at the current index.

```
public static void selectionSort(int[] arr){
    for (int i = 0; i < arr.length - 1; i++){
        int index = i;
        //Select the minimum in the remaining array
        for (int j = i + 1; j < arr.length; j++){
            if (arr[j] < arr[index]){
                index = j;
            }
        }
        //Swap the smallest number with the number at the current index
        int smallestNumber = arr[index];
        arr[index] = arr[i];
        arr[i] = smallestNumber;
    }
}
```

If you are wondering why the complexity of selection sort is O(n²) even though the inner loop goes from i+1 to the end, I would suggest reading this post on calculating the time complexity of nested loops. The space complexity is O(1).
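To see it concretely, the inner loop performs n-1-i comparisons for each value of i, so the total number of comparisons is:

```latex
\sum_{i=0}^{n-2} (n-1-i) = (n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2} = O(n^2)
```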

## Bubble sort

The fundamental idea of bubble sort is: compare two adjacent indices and, if they are out of order, put them in order. We start at index 0 and compare each pair of neighbors up to the end of the array. After each pass, one more element is in its final sorted position.

```
void bubbleSort(int arr[]){
    int n = arr.length;
    for (int i = 0; i < n; i++){
        for (int j = 1; j < n - i; j++){
            if (arr[j-1] > arr[j]){
                // swap arr[j-1] and arr[j]
                int temp = arr[j];
                arr[j] = arr[j-1];
                arr[j-1] = temp;
            }
        }
    }
}
```

The time complexity is O(n²) and the space complexity is O(1). Bubble sort is a stable sorting algorithm.
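A common refinement, not shown above, is to stop early once a full pass makes no swaps; the array is then already sorted, which gives O(n) behaviour on sorted input. A sketch of this variant (class name is illustrative):

```java
public class BubbleSortOptimized {
    // Bubble sort that exits early once a pass makes no swaps,
    // giving O(n) behaviour on an already sorted array.
    public static void bubbleSort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            boolean swapped = false;
            for (int j = 1; j < n - i; j++) {
                if (arr[j - 1] > arr[j]) {
                    int temp = arr[j];
                    arr[j] = arr[j - 1];
                    arr[j - 1] = temp;
                    swapped = true;
                }
            }
            if (!swapped) break; // no swaps in this pass: already sorted
        }
    }
}
```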

## Insertion sort

The third algorithm in the polynomial complexity class is insertion sort. Its fundamental idea is: take an element and put it in its correct position by shifting all larger elements one position to the right. We keep a prefix of the array sorted, examine each remaining element, and insert it at the correct position in that sorted prefix.

```
void insertionSort(int arr[]){
    int n = arr.length;
    for (int i = 1; i < n; ++i) {
        int key = arr[i];
        int j = i - 1;

        /* Move elements of arr[0..i-1] that are
           greater than key one position ahead
           of their current position */
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}
```

Time complexity is O(n²) and space complexity is O(1). Insertion sort is a stable sort as well.

## Quicksort

The entire idea of quicksort revolves around a pivot. The pivot is an element of the input around which the input is arranged such that all elements on the left side are smaller and all elements on the right side are greater than the pivot. The question is how to select the pivot and put it into its correct position.
To put the pivot at its correct position in the input, start with the element right after the pivot and move right to find an element that is greater than the pivot. Let that be the ith position.

At the same time, start from the end of the array and find an element that is smaller than the pivot. Let it be the jth position.

If i and j have not crossed each other, i.e. `i < j`, then swap the elements at the ith and jth positions, and continue moving right to find an element greater than the pivot and moving left to find an element smaller than the pivot. Once i and j cross each other, swap the pivot with the element at the jth position.
After this step, the pivot will be at its correct position and the array will be divided into two parts: all elements on the left side will be less than the pivot and all elements on the right side will be greater than the pivot.

Recursively apply quicksort on the left and right subarrays until there is only one element to sort.

```
private void swap(int[] a, int i, int j){
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}

private int partition(int[] a, int start, int end){
    int pivot = a[start];
    int i = start;
    int j = end;

    while(i < j){
        while(i <= end && a[i] <= pivot) i++;
        while(j >= start && a[j] > pivot) j--;

        if(i < j) {
            swap(a, i, j);
        }
    }

    swap(a, start, j);
    return j;
}

public void quickSort(int[] a, int start, int end){
    if(start < end){
        int p = partition(a, start, end);
        quickSort(a, start, p - 1);
        quickSort(a, p + 1, end);
    }
}
```

The time complexity of this quicksort implementation depends on the selection of the pivot. If the pivot splits the array into two equal parts (which is the intention), the complexity of quicksort is O(n log n) (by the Master theorem).

The worst case of quicksort happens when the array is sorted or reverse sorted. The array is partitioned into two subarrays, one of size 1 and the other of size n-1. The subarray with n-1 elements is again divided into subarrays of size 1 and n-2. To completely sort the array, it will be split n-1 times, and each split requires traversing up to n elements to find the correct position of the pivot. Hence the overall worst-case complexity of quicksort comes out as O(n²).

What is the space complexity of this algorithm? No auxiliary memory is used explicitly. However, the recursive implementation puts stack frames on the call stack for the partition indices, function return addresses, and so on. In the worst case there can be n stack frames, so the worst-case space complexity of quicksort is O(n).
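Since the worst case is triggered by sorted input when the first element is always chosen as the pivot, one common mitigation (not shown in the implementation above) is to randomize the pivot choice. Below is a self-contained sketch that uses the same partition scheme, but first swaps a randomly chosen element into the pivot position; the class name is illustrative:

```java
import java.util.Random;

public class RandomizedQuickSort {
    private static final Random random = new Random();

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    private static int partition(int[] a, int start, int end) {
        // Move a randomly chosen element into the pivot position first,
        // which makes the already-sorted worst case unlikely.
        swap(a, start, start + random.nextInt(end - start + 1));
        int pivot = a[start];
        int i = start, j = end;
        while (i < j) {
            while (i <= end && a[i] <= pivot) i++;
            while (j >= start && a[j] > pivot) j--;
            if (i < j) swap(a, i, j);
        }
        swap(a, start, j);
        return j;
    }

    public static void quickSort(int[] a, int start, int end) {
        if (start < end) {
            int p = partition(a, start, end);
            quickSort(a, start, p - 1);
            quickSort(a, p + 1, end);
        }
    }
}
```

With a random pivot, sorted input is no longer a special case: the expected running time is O(n log n) regardless of the initial order.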

## Merge sort

Merge sort falls in the divide-and-conquer class of algorithms, where the input space is divided into subproblems, and these subproblems are then solved individually and combined together to solve the original problem.
A question arises: what is the smallest subproblem? The smallest subproblem is one where the input cannot be divided further, i.e. a subproblem with one item to sort. Once the split is down to the smallest size, we start merging. Going from the bottom, start with two one-item subproblems and combine them to create a two-item subproblem, then combine two two-item subproblems to create a four-item subproblem. Go on until you get back to a problem of the original size.
In the merge operation, scan both sorted arrays element by element and, based on the comparison, put the smaller of the two items into the output array, until both arrays are fully scanned.

```
void merge(int a[], int start, int mid, int end){
    int i, j, k;
    int temp[end - start + 1];

    i = start; j = mid + 1; k = 0;
    /* Compare elements of the two sorted halves and store
       them in sorted order in the temporary array. */
    while(i <= mid && j <= end){
        if(a[i] <= a[j]){
            temp[k++] = a[i++];
        }
        else {
            temp[k++] = a[j++];
        }
    }
    while(i <= mid){
        temp[k++] = a[i++];
    }
    while(j <= end){
        temp[k++] = a[j++];
    }
    //Copy the temporary array back into the original array
    for(i = 0; i < k; i++){
        a[i + start] = temp[i];
    }
}

void mergeSort(int a[], int start, int end){
    if(start < end){
        int mid = (start + end) / 2;
        //sort the left part
        mergeSort(a, start, mid);
        //sort the right part
        mergeSort(a, mid + 1, end);

        //merge them together
        merge(a, start, mid, end);
    }
}
```

For n elements in the array, there will be O(log n) such merge levels, each doing O(n) work. Hence, the complexity of the merge sort algorithm is O(n log n) (refer to the Master theorem).

Every time we merge two subarrays, an auxiliary array is needed to temporarily store the elements, which incurs O(n) space complexity on the merge sort algorithm.

## Heap Sort

This one depends on the concept of heaps, which is covered on this website here: Heaps. The fundamental idea is to put all the elements into a priority queue or heap, a min-heap to be precise, then take the elements out one by one and add them back into the array. In Java this is one of the simplest sorting algorithms to write, given that we get a built-in implementation of the priority queue.

```
// requires java.util.PriorityQueue, java.util.Arrays
// and java.util.stream.Collectors
public void heapSort(int[] a){

    PriorityQueue<Integer> pq = Arrays.stream(a)
            .boxed()
            .collect(Collectors.toCollection(PriorityQueue::new));

    int i = 0;
    while(!pq.isEmpty()){
        a[i] = pq.poll();
        i++;
    }

    Arrays.stream(a)
            .forEach(e -> System.out.print(e + " "));
}
```

The time complexity of heap sort is O(n log n) and, with this priority-queue based implementation, the space complexity is O(n).

There is another class of sorting algorithms that sort input in linear time, but those algorithms depend on certain conditions being true of the input; for example, for radix sort the input must consist of integers.
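As an illustration of this class, here is a minimal counting sort sketch (names are illustrative), assuming the input holds non-negative integers no larger than a known `maxValue`:

```java
public class CountingSort {
    // Sorts non-negative integers in O(n + k) time,
    // where k is the largest value in the input.
    public static void countingSort(int[] arr, int maxValue) {
        int[] count = new int[maxValue + 1];
        for (int value : arr) {
            count[value]++;            // count occurrences of each value
        }
        int index = 0;
        for (int value = 0; value <= maxValue; value++) {
            while (count[value]-- > 0) {
                arr[index++] = value;  // write values back in order
            }
        }
    }
}
```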

Prepare your sorting algorithm basics and understand time and space complexity and stability. It is very good to know these fundamentals even if you are using a built-in library or function to do the sorting.

Always account for the complexity of the sorting algorithm in your solution; most of the time it dominates the overall complexity.

If you are preparing for an interview and need some help with preparation, please reach out to us or book a session with us to understand more.

## Arrays With Elements In The Range 1 to N

When an array has all its elements in the range of 1 to N (where N is the length of the array), we can use the indices to store the ordered state of the elements in the array. This ordered state can in turn be used to solve a variety of problems, which we'll explore soon. First, a very simple demonstration of this property.

Here is an array which has unique elements in the range of 1 to N:

```
Given array (A): 5, 3, 1, 4, 2
Indices:         0, 1, 2, 3, 4
```

### Sort in Linear Time

The first use-case of this property is being able to sort in O(N) time, i.e. a special case (all unique elements) of counting sort. The crux of this sort is to check whether an element is at its corresponding index and swap it to its correct index if it's not. The following is a demonstration of this logic:

```
Given array (A): 5, 3, 1, 4, 2
Indices:         0, 1, 2, 3, 4
```

For each A[i], check whether A[A[i] - 1] equals A[i]. If they are not equal, then swap the element at A[A[i] - 1] with A[i]. Basically, index i holds its correct value when A[i] contains i+1.

A[A[0] - 1] is A[5-1], i.e. A[4], which is 2, while A[0] = 5. This means that A[A[i] - 1] is not equal to A[i], so 5 is not in its correct position. We swap to put A[0] -> 5 in its correct position, which is index 4; A[0] will hold 2 after the swap. Similarly, we repeat this check & swap for all the elements.

What if we cancel out the common terms and modify the check from `A[i] != A[A[i] - 1]` to `i != A[i] - 1`?

### Find The Missing Integer

A similar approach can help us find the smallest missing positive-integer in a given array. By smallest missing positive-integer, we just mean the smallest positive integer that does not exist in the given list of numbers. For example:

Given Array: `-2, 3, 0, 1, 3`
In the above case, the smallest missing positive integer is 2.

If we were to apply the usual sorting techniques and then scan the array for the smallest absent positive integer, it would imply a time complexity of O(NlogN) + O(N). We can definitely do better than this! At first glance, it seems that this problem does not fit the unique property of elements being in the range of 1 to N, since the numbers in the given array are well outside that range; but to solve this problem, we still only need to figure out whether the elements 1 to N are present in the given array or not.

How do we know whether the given array has elements from 1 to N? We can use the counting-sort style placement discussed earlier to put each element in its "correct position", i.e. index 0 should hold 1, index 1 should hold 2, and so on. The smallest index that does not hold its correct element gives the missing integer.

If we sort the given array using counting sort described above, we will get: `1, 0, 3, -2, 3`. And the smallest index `i` to not hold its correct value i.e. `i+1` will give us the answer to the smallest missing positive integer. In this case, that index is 1 since it does not hold 2, thus the smallest positive missing integer is 2.
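Putting the placement pass and the final scan together, a sketch of this approach might look like the following (class and method names are illustrative):

```java
public class FirstMissingPositive {
    // Places each value v in 1..n at index v-1, then scans for the
    // first index that does not hold its correct value.
    public static int firstMissingPositive(int[] nums) {
        int n = nums.length;
        for (int i = 0; i < n; ) {
            // Only values in 1..n can be placed; skip everything else.
            if (nums[i] > 0 && nums[i] <= n && nums[nums[i] - 1] != nums[i]) {
                int v = nums[i];
                nums[i] = nums[v - 1];
                nums[v - 1] = v;   // v is now at its final position
            } else {
                i++;
            }
        }
        for (int i = 0; i < n; i++) {
            if (nums[i] != i + 1) return i + 1;
        }
        return n + 1; // array holds exactly 1..n
    }
}
```

Each swap puts one value into its final slot, so the placement loop does at most n swaps and the whole method runs in O(N) time with O(1) extra space.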

### Find The Duplicate Element

The third use-case of this property is figuring out a duplicate element without using any extra space. We can iterate over the array A and mark the corresponding index of each encountered element as negative, unless it has already been marked negative! For example: if A[1] = 3 (or -3), then mark `A[Abs(3) - 1]` as negative; this way, whenever we encounter 3 (or -3) again at any A[i], we will know that the value 3 has been visited before, since A[3-1] will be negative.

```
Given array (A): 5, 3, 1, 4, 3
Indices:         0, 1, 2, 3, 4
```

When we encounter A[0] i.e. 5, we make A[5-1] i.e. A[4] negative, so the array becomes:
`5,3,1,4,-3`
Next, we encounter A[1] i.e. 3, we make A[3-1] i.e. A[2] negative, so the array becomes:
`5,3,-1,4,-3`
Next, we encounter A[2] i.e. -1, we make A[1-1] i.e. A[0] negative, so the array becomes:
`-5,3,-1,4,-3`
Next, we encounter A[3] i.e. 4, we make A[4-1] i.e. A[3] negative, so the array becomes:
`-5,3,-1,-4,-3`
Next, we encounter A[4] i.e. -3, we want to make A[3-1] i.e. A[2] negative, but in this case, A[2] is already negative thus we know that A[2] has been visited before! Which means `Abs(A[4])` i.e 3 is the duplicate element.
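The marking scheme traced above can be sketched as follows (assuming, as in the example, that all values lie in the range 1 to N; the class name is illustrative):

```java
public class FindDuplicate {
    // Marks a visit by negating the value at index |v|-1; the first value
    // whose slot is already negative is the duplicate.
    public static int findDuplicate(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int v = Math.abs(a[i]);   // the value may already be negated
            if (a[v - 1] < 0) {
                return v;             // index v-1 was visited before
            }
            a[v - 1] = -a[v - 1];     // mark value v as seen
        }
        return -1; // no duplicate found
    }
}
```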

Here is a snippet demonstrating the code for sorting an array in linear time as per the above approach. The exact same approach can be used to solve the other two applications, i.e. finding the duplicate and finding the missing integer.

```
int swap = 0;

for(int i = 0; i < nums.length; ){
    // Only values in the range 1..N can be placed at their index
    if(nums[i] > 0 && nums[i] <= nums.length) {
        if(nums[nums[i] - 1] != nums[i]){
            swap = nums[i];
            nums[i] = nums[nums[i] - 1];
            nums[swap - 1] = swap;
        } else {
            i++;
        }
    } else {
        i++;
    }
}
```


# Inversions in array

Let A[0…n-1] be an array of n distinct positive integers. If `i < j` and `A[i] > A[j]`, then the pair (i, j) is called an inversion of A. Given n and an array A, find the number of inversions in array A. For example, the array [2, 5, 1] has two inversions, (2,1) and (5,1), whereas the array [2, 4, 1, 3] has three inversions, (2,1), (4,1) and (4,3).

How many inversions can a sorted array have? There are no inversions in a sorted array, and there are nC2 = n(n-1)/2 inversions in a completely reversed array.

## Inversions in array : Thought process

What is the first thing that comes to mind? For each index i, check all j where j > i and see if A[j] < A[i].
If the current element A[i] is greater than A[j], increase the inversion count. The implementation is given below.

```
package com.company;

/**
 * Created by sangar on 6.4.18.
 */
public class Inversions {
    public static int findInversions(int[] a){
        int count = 0;
        for(int i = 0; i < a.length; i++){
            for(int j = i + 1; j < a.length; j++){
                if(a[i] > a[j]) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int a[] = new int[]{90, 20, 30, 40, 50};
        int inversions = findInversions(a);
        System.out.print("Inversions : " + inversions);
    }
}
```

The worst-case complexity of this method to find inversions in an array is O(n²).

Can we use the information that a sorted array does not have any inversions? Let's ask the question again with a tweak: how many inversions will there be if only one element is out of place in a completely sorted array? There will be one. What if two elements are out of place? There will be two inversions. Are we getting somewhere? Effectively, we have to count how many swaps are needed to completely sort the array from its original state. If you noticed, the brute force algorithm above has the same structure as selection sort, except that instead of swapping elements it counts inversions.

What if we use another sorting algorithm, like merge sort? The complexity of merge sort is much better than selection sort, so it may be possible to identify inversions much more efficiently.
Let's use the merge step of merge sort with a small modification. The divide part remains the same. While merging two sorted subarrays, count the number of times an element from the right part is put into the result array before elements remaining in the left part.
Every time `A[i]` is appended to the output of the merge, no new inversions are encountered, since `A[i]` is smaller than everything remaining in the right subarray B. If `B[j]` is appended to the output, it is smaller than all the remaining items in A, so we increase the inversion count by the number of elements remaining in A.
The overall number of inversions is the inversions in the left part, plus the inversions in the right part, plus the inversions counted in the merge step.

Let’s see how inversions are counted in the merge step; overall, the algorithm looks like below.

## Algorithm to count inversions in array

First, let’s write the algorithm to count inversions in the merge step. At the merge step, we have two sorted arrays: A and B.

1. Initialize i as the start position of A and j as the start position of B. These pointers reference the currently compared indices in both arrays.
2. While there are elements in both arrays, i.e. i < length(A) && j < length(B):
   1. If B[j] < A[i], all elements from i to length(A) are greater than B[j], so count += number of elements remaining in A, and increment j.
   2. Else, increment i.
3. Return count.

Replace the merge part of merge sort with this algorithm and, from the function, return the sum of the inversions in the left part, the inversions in the right part, and the inversions counted in the merge step.

MergeSortAndCount(L):

1. If L has one element, return 0.
2. Divide L into A, B.
   1. inversionsLeft = MergeSortAndCount(A)
   2. inversionsRight = MergeSortAndCount(B)
   3. inversionsMerge = MergeAndCount(A, B)
3. Return inversionsLeft + inversionsRight + inversionsMerge.

## Inversions in array implementation

```
package com.company;

/**
 * Created by sangar on 6.4.18.
 */
public class Inversions {
    public static int mergeAndCount(int[] a, int low, int mid, int high){
        int count = 0;
        int[] temp = new int[high - low + 1];

        int i = low;
        int j = mid + 1;
        int k = 0;
        /* There are elements on both sides of the array */
        while(i <= mid && j <= high){
            if(a[i] > a[j]){
                //Number of elements remaining on the left side.
                count += (mid - i + 1);
                temp[k++] = a[j++];
            }
            else{
                temp[k++] = a[i++];
            }
        }
        while(i <= mid){
            temp[k++] = a[i++];
        }
        while(j <= high) {
            temp[k++] = a[j++];
        }

        for(i = low; i < k + low; i++){
            a[i] = temp[i - low];
        }

        return count;
    }

    public static int countInversions(int[] a, int low, int high){
        if(low >= high) return 0;

        int mid = low + (high - low) / 2;

        int inversionsLeft = countInversions(a, low, mid);
        int inversionsRight = countInversions(a, mid + 1, high);
        int inversionsMerge = mergeAndCount(a, low, mid, high);

        return inversionsLeft + inversionsRight + inversionsMerge;
    }

    public static void main(String[] args) {
        int a[] = new int[]{90, 20, 30, 40, 50};
        int inversions = countInversions(a, 0, a.length - 1);
        System.out.print("Inversions : " + inversions);
    }
}
```

The complexity of finding inversions in an array using the merge sort method is O(n log n).


# Dutch National Flag Problem

Given an array of 0s, 1s and 2s, sort the array in increasing order. Another way to phrase this problem: sort balls of three different colors, red, white and blue, such that balls of each color are placed together. This is typically known as the Dutch national flag problem, and the algorithm that solves it is called the Dutch national flag algorithm. Example:
``` A = [0,1,0,1,1,2,0,2,0,1] Output = [0,0,0,0,1,1,1,1,2,2] ```
``` A = [R,B,B,W,B,R,W,B,B] Output = [R,R,W,W,B,B,B,B,B] ```
This problem can also be asked as a design question: let's say you have to design a robot. All this robot does is look at three empty buckets and one bucket full of colored balls (red, blue and white), and fill each empty bucket with balls of a single color. Design the instruction set for this robot. It's the same problem as the Dutch national flag problem.

## Count to sort an array of 0s, 1s and 2s
We have already seen a similar problem before: segregate 0s and 1s in an array, where we explored how to count the elements and rewrite them back into the array.

Let’s apply the same method to this problem. Take an array of three integers, where each index stores the count for the corresponding number, e.g. `count[0]` stores the count of 0s and `count[1]` stores the count of 1s. Scan through the array and count each element. At the end, rewrite those numbers back onto the array starting from index 0, according to their counts. For example, if there are 4 zeros, then starting from index 0, write 4 zeros, followed by the 1s and then the 2s.

The complexity of the counting method is `O(n)`; notice, though, that we scan the array twice: the first time for counting and the second time for writing back onto the array.

## Dutch national flag problem : algorithm

1. Start with three pointers: `reader`, `low` and `high`.
2. `reader` and `low` are initialized to `0`, and `high` is initialized to the last index of the array, `size - 1`.
3. `reader` is used to scan the array, while `low` and `high` are used to swap elements to their desired positions.
4. While `reader` is less than or equal to `high`, follow the steps below (keep in mind we need the 0s at the start of the array):
   1. If the element at index `reader` is 0, swap the element at `reader` with the element at `low` and increment `low` and `reader` by 1.
   2. If the element at `reader` is 1, do not swap; increment `reader` by 1.
   3. If the element at `reader` is 2, swap the element at `reader` with the element at `high` and decrease `high` by 1; `reader` stays put, because the swapped-in element has not been examined yet.

Effectively, the three pointers divide the array into four parts: red, white, unknown and blue. Every element is taken from the unknown part and put into its right place, so the other three parts expand while the unknown part shrinks.

Let’s take an example and see how the Dutch national flag algorithm works. Initialize `reader` and `low` to 0 and `high` to the last index. Then repeatedly examine the element at `reader`: a 0 is swapped with the element at `low` (and both `low` and `reader` advance), a 1 simply advances `reader`, and a 2 is swapped with the element at `high` (and `high` moves left). Once `high` becomes less than `reader`, we can stop, as the array is sorted.

#### Dutch national flag problem implementation

```
package com.company;

/**
 * Created by sangar on 5.1.18.
 */
public class DutchNationalFlag {

    public static void swap(int[] input, int i, int j){
        int temp = input[i];
        input[i] = input[j];
        input[j] = temp;
    }

    public static void dutchNationalFlagAlgorithm(int[] input){
        //initialize all variables
        int reader = 0;
        int low = 0;
        int high = input.length - 1;

        /*
          input always holds a permutation of the original data with
          input[0..low-1] = 0, input[low..reader-1] = 1,
          input[reader..high] untouched, and input[high+1..input.length-1] = 2
        */
        while(reader <= high){
            if(input[reader] == 0){
                /* When the element at reader is 0, swap the
                   element at reader with the element at index
                   low and increment reader and low */
                swap(input, reader, low);
                reader++;
                low++;
            }
            else if(input[reader] == 1){
                /* If the element at reader is 1, just move reader ahead */
                reader++;
            }
            else{
                /* If the element at reader is 2, swap the element at
                   reader with the element at high and decrease high by 1 */
                swap(input, reader, high);
                high--;
            }
        }
    }

    public static void main(String[] args) {
        int[] input = {2,2,1,1,0,0,0,1,1,2};

        dutchNationalFlagAlgorithm(input);

        for(int i = 0; i < input.length; i++){
            System.out.print(" " + input[i]);
        }
    }
}
```

The complexity of the Dutch national flag algorithm is `O(n)`, and unlike the counting method, we scan the array only once.

Please share if you have some suggestions or comments. Sharing is caring.

## 3-way quicksort (Quick sort on non unique elements)

In the post on the Dutch national flag algorithm, we discussed how to sort an array in linear time by partitioning it into four parts: red, white, unknown and blue. Can we apply the same technique to quicksort? As we all know, quicksort relies on partitioning the input into two parts. What if we divide the input into three parts? Then it becomes a 3-way quicksort.

The 3-way partition variation of quick-sort has a slightly higher overhead compared to the standard 2-way partition version. Both have the same best, typical, and worst-case time bounds, but the 3-way version is highly adaptive in the very common case of sorting with many non-unique keys.

## Quicksort basics and limitation

Before going ahead with 3-way partition method, I would strongly recommend that you go through the usual quick sort algorithm: Quicksort algorithm in C

A big limitation of quicksort is that it has O(n²) worst-case running time. Some improvements that have been suggested are given below:

• Cutoff to insertion sort. Switch to insertion sort for subarrays with size below a pre-decided limit. We follow quicksort and partition the array; once the size of a partition goes below the limit, apply insertion sort on it. The limit varies from system to system and is typically between 5 and 15.
• Median-of-three partitioning. Use the median of a small sample of items taken from the array as the partitioning item. Doing so gives a slightly better partition, at the cost of computing the median.

There is another optimization, called entropy-optimal sorting or 3-way partition quicksort. This method is useful when the input array contains many duplicate values, which is frequently seen in the real world. The idea is to divide the array into three parts rather than two. Let’s say P is the pivot value: the first part contains all numbers less than P, the second part contains numbers equal to P, and the last part contains numbers greater than P.

## 3-way quicksort algorithm

Some of the properties of 3-way quicksort are:

• It is not stable; avoid using it in cases where stability is essential.
• It uses O(log(n)) extra space. Why? Because of the recursion.
• Its worst-case running time is the same as classic quicksort's, O(n²), but it typically runs in O(nlog(n)) time.
• The best part is that it takes O(n) time when there are O(1) unique keys.

This algorithm partitions the array into three parts:
1. one with elements less than the pivot
2. one with elements equal to the pivot
3. one with elements greater than the pivot

## 3-way quicksort Implementation

```
import java.util.Arrays;

public class ThreeWayQuickSort {

    private static void swap(int[] a, int i, int j){
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void quickSort(int[] a, int low, int high){
        if(low >= high) return;

        int pivot = a[low];
        int lt = low;    // a[low..lt-1] < pivot
        int gt = high;   // a[gt+1..high] > pivot
        int i = low + 1; // a[lt..i-1] == pivot

        while(i <= gt){
            if(a[i] < pivot){
                swap(a, lt++, i++);
            }
            else if(a[i] > pivot){
                swap(a, i, gt--);
            }
            else{
                i++;
            }
        }
        //All elements equal to the pivot are already in their final place
        quickSort(a, low, lt - 1);
        quickSort(a, gt + 1, high);
    }

    public static void main(String[] args){
        int[] a = {4, 9, 4, 4, 1, 9, 4, 4, 9, 4, 4, 1, 4};

        quickSort(a, 0, a.length - 1);

        Arrays.stream(a).forEach(e -> System.out.print(e + " "));
    }
}
```

The worst-case complexity of 3-way quicksort is still O(n²), but it typically runs in O(nlog(n)) time, and in O(n) time when there are only a constant number of distinct keys.

Please share if something is wrong or any other suggestion or query. We would love to hear what you have to say.

# Merge Sort Algorithm

We can classify sorting algorithms based on their complexity: selection sort, bubble sort, and insertion sort have a complexity of O(n²), while heap sort, merge sort, and quicksort (with some assumptions) have a complexity of O(nlogn), and counting and radix sorts are linear, O(n), algorithms. In this post, we will concentrate on the merge sort algorithm.

### Key points about merge sort algorithm

• class of algorithm : divide and conquer
• Complexity : Time O(nlogn)
• Space : O(n)
• Stable : Yes
• In place : NO

Merge sort falls in the divide-and-conquer class of algorithms, where the input space is divided into subproblems, and these subproblems are then solved individually and combined to solve the original problem. The figure below explains the divide and conquer paradigm.

In the merge sort algorithm, the input space is repeatedly divided into two subproblems until we reach the smallest subproblem; the smallest subproblems are then sorted and combined back up until the original input is sorted. The question arises: what is the smallest subproblem? The smallest subproblem is one where the input cannot be divided further, i.e. a subproblem with one item to sort.

Once the split is down to the smallest size, start merging. Going from the bottom, start with two one-item subproblems and combine them to create a two-item subproblem, then combine two two-item subproblems to create a four-item subproblem. Go on until you get back to a problem of the original size.

## Implementation of Merge sort algorithm

As is the case with all divide-and-conquer problems, the same processing is applied to each subproblem as was applied to the original problem, which calls for a recursive solution. Recursion needs a termination condition; the base condition to stop recursion in merge sort is when a subproblem has only one element to be sorted. How do you sort a single element? You don't; you just return it.

Now, how do we combine two subproblem solutions? This step is the conquer part: sort the smaller parts and combine them with a merge operation. In the merge operation, scan both sorted arrays element by element and, based on the comparison, add the smaller of the two items to the output array, until both arrays are scanned. If one array finishes before the other, copy all remaining elements from the other array to the output array. Copy this output array back to the original input array so that it can be combined with bigger subproblems until the solution to the original problem is derived.

```
#include <stdio.h>

void merge(int a[], int start, int mid, int end){
    int i, j, k;
    int temp[end - start + 1];

    i = start; j = mid + 1; k = 0;
    /* Compare elements of the two sorted halves and store
       them in sorted order in the temporary array. */
    while(i <= mid && j <= end){
        if(a[i] <= a[j]){
            temp[k++] = a[i++];
        }
        else {
            temp[k++] = a[j++];
        }
    }
    while(i <= mid){
        temp[k++] = a[i++];
    }
    while(j <= end){
        temp[k++] = a[j++];
    }
    //Copy the temporary array back into the original array
    for(i = 0; i < k; i++){
        a[i + start] = temp[i];
    }
}

void mergeSort(int a[], int start, int end){
    if(start < end){
        int mid = (start + end) / 2;
        //sort the left part
        mergeSort(a, start, mid);
        //sort the right part
        mergeSort(a, mid + 1, end);

        //merge them together
        merge(a, start, mid, end);
    }
}

int main(){
    int a[] = {2,3,4,1,8,7,6,9};
    int size = sizeof(a)/sizeof(a[0]);

    mergeSort(a, 0, size - 1);
    for(int i = 0; i < size; i++){
        printf("%d ", a[i]);
    }
    return 0;
}
```

Let’s analyze the complexity of the merge sort algorithm. It can be divided into two parts:
1. The split step, which takes a constant amount of time: all it does is divide the array into two halves, irrespective of the size of the array. So, the complexity of this step is O(1).

2. The combine (merge) step, whose complexity is O(n), where n is the number of elements.
The complexity of each divide-and-conquer level is therefore O(n). So, how many such levels are there?
For n elements in the array, there will be O(logn) such levels. Hence, the complexity of the merge sort algorithm is O(nlogn).

However, there is one caveat: every time we merge two subarrays, an auxiliary array is needed to temporarily store the elements, which incurs O(n) space complexity on the merge sort algorithm.

There are some improvements which can be done on this algorithm.
1. When the number of elements is less than some threshold, one can use insertion or selection sort to sort those numbers. Why? Because when n is small, O(n²) is as good as O(nlogn), and it saves the extra overhead of splitting and merging. All in all, using insertion sort on small subarrays can give better performance.

2. Before calling merging, check if all the elements in right array are greater then left array, if yes, no need of merging. This can be easily checked by comparing a[mid] with a[mid+1]. If a[mid] is less than a[mid+1], two sub arrays are already sorted and we don’t need to perform merge.

Please let us know what you think of this article and if there is something wrong or missing.

## Group Anagrams

Many a time we need to sort things which are not integers, the foremost being strings. How do we go about it with quicksort? The C standard library has a function for exactly this: qsort(). This function is very powerful and can easily sort anything, given a compare function. So in today’s post, we will be using that function and learn how it can be used to solve our problem statement.

Given a list of words, group all words together which are anagrams. For example, if the input is like :

`cat tac act dog god gdo`

First of all, we want to understand how two strings can be identified as anagrams. There are more efficient ways to do it, but the simplest one is to sort both strings by character; if the two resulting strings match, the strings are anagrams.
Now the problem boils down to sorting each individual string and then sorting the entire array of strings. The anagram strings will automatically be grouped together.
If we look closely, there is another problem: when we sort the original strings individually, the original strings are lost. So, the idea is to keep a duplicate array of strings and perform the sorting operation on that. Also, after sorting the duplicate array, we lose the original position of each string, so we need to store that information too. Let’s figure out what we need to store and how. We have an array of strings; let’s say there can be a maximum of 100 such strings. So we create an array of character pointers to store these strings. We also need a duplicate array where we will sort the strings, and alongside each duplicate we store its position in the original array.
Next step is to sort individual string in the duplicate array, where we will use the library function qsort().
This function is very easy to use, there are four parameters you have to pass:
1. The buffer or the array you want to have sorting performed on.
2. The size of the buffer
3. Size of an individual element of the buffer.
4. And last and most important is the compare function, which has to be written in code and passed as function pointer to this function. For further details of qsort() function usage follow : qsort
Once all the strings are individually sorted, sort the entire array of strings using the same qsort(); the only difference is the compare function, which now runs on an entire string instead of a character. We are almost done! Now we have all anagrams placed next to each other; however, we need to print the original words. That’s where the index stored in the duplicate array helps: we print words from the original array in the order given by the duplicate array.

```class Solution {
    public List<List<String>> groupAnagrams(String[] strs) {

        Map<String, List<String>> map = new HashMap<>();

        for(int i = 0; i < strs.length; i++){
            char[] temp = strs[i].toCharArray();
            Arrays.sort(temp);

            String keyString = String.valueOf(temp);

            if(!map.containsKey(keyString)){
                map.put(keyString, new ArrayList<>());
            }
            map.get(keyString).add(strs[i]);
        }

        return new ArrayList<List<String>>(map.values());
    }
}
```

This approach performs sorting on M strings, each of size N characters, so the complexity of the algorithm to print all anagrams together is O(MlogM + MNlogN): NlogN to sort each string of size N, done for all M strings, plus MlogM to sort the M strings in the array.

## Merge two sorted linked lists

The problem statement is simple: merge two sorted linked lists without using extra space. For the basics of linked lists, please follow the post: Linked list data structure. This problem is commonly asked in a telephonic round at Amazon and Microsoft. Let’s take an example and understand what is required as a solution. Given two linked lists as follows,

Result should be like this:

Consider the following two steps to merge sorted linked lists. First, figure out which node should be the head of the result list: compare the head nodes of the two given lists; whichever is smaller should be the head of the result list.

Second, compare two nodes, one from each list and decide which should go next in result linked list.  Advance the pointer to next node of the node which is added to result list.

As no new node is allocated during this merge, we have to make sure that all the references are maintained when nodes are added to merged linked list.

We can start with one list as merge list and add nodes from second list at appropriate place in that list. Let’s say L1 is our merged list and we always compare node on L2 to see if it should be placed in L1 at current position. L1 grows as more nodes are sorted in merge list.

We compare the first two nodes, L1 and L2, and decide that `node(2)` has to go into the merged list as head. If it were the head of L2, we would have swapped the L1 and L2 heads, so L1 would still be the head of the merged list. Why? Because we want L1 to always point to the last node of the merged list, representing the sorted merged list so far, while L2 switches between the two input lists.

As L1 always points to the last node of merged linked list, next node to compare should be L1.next i.e `node(4)` and L2 i.e `node(3)`.

As L1 follows the merged linked list, we will move L1.next to point to node(3); however, doing it directly would lose the entire linked list following it. So we do it in four steps: store L1.next in temp; point L1.next to L2; point L2 to temp; and then move L1 to L1.next.

```Node temp = L1.next;
L1.next = L2;
L2 = temp;
L1 = L1.next;
```

The next nodes to be compared are `node(5)`, which is L1.next, and `node(4)`, which is L2.

Of course `node(4)` has to be added to the merged linked list, so what do we do? First, save L1.next in temp; temp now points to `node(5)`. Second, point L1.next to L2; then point L2 to temp; and at last, move L1 to L1.next. The state of the two sorted linked lists now looks as follows.

By this time you must have noticed that L1 and L2 do not point to the same list all the time: L1 always points to the last node of the merged list, and L2 points to the first node of the remaining part which still needs to be merged.

Now, L1.next which is `node(7)` and L2 which is `node(5)` will be compared.

`node(5)` is to be added to the merged sorted list. Again, the same set of steps: L1.next is stored in temp, L1.next points to L2 i.e. `node(5)`, and then L2 points to temp i.e. `node(7)`.

Again, `node(9)`, which is L1.next, will be compared to L2 i.e. `node(7)`. L1.next should point to L2. The state will now be as follows.

At this point, L1.next i.e. node(8) is less than L2; this is the simple case, where we just move L1 to L1.next and L2 remains as is.

At this point, special condition occurs which is L1.next is null. In this case, point L1.next to L2 and two linked lists are merged.

## Merge 2 sorted linked lists : Implementation

```#include <stdio.h>
#include <stdlib.h>

typedef struct node{
    int data;
    struct node *next;
} Node;

Node * createNode(int val){
    Node * temp = (Node *)malloc(sizeof(Node));
    if(temp){
        temp->data = val;
        temp->next = NULL;
    }
    return temp;
}

/* This function inserts node at the head of linked list */
void push(Node **headRef, int data){
    Node * newNode = createNode(data);
    newNode->next = *headRef;
    *headRef = newNode;
}

void printList(Node *head){
    while(head){
        printf("%d->", head->data);
        head = head->next;
    }
    printf("NULL");
    printf("\n");
}

Node * MergeLists(Node *list1, Node *list2) {
    if (!list1) return list2;
    if (!list2) return list1;

    /* Make sure list1 starts with the smaller head;
       it will represent the merged list */
    if (list1->data > list2->data) {
        Node *tmp = list1;
        list1 = list2;
        list2 = tmp;
    }
    Node *head = list1;

    while(list1->next && list2) {
        if (list1->next->data > list2->data) {
            //Step 1. Save the next pointer
            Node *tmp = list1->next;
            //Step 2. Change next pointer to point L2
            list1->next = list2;
            //Step 3. Move L2 to temp
            list2 = tmp;
        }
        list1 = list1->next;
    }
    if (!list1->next) list1->next = list2;
    return head;
}

int main(){
    Node * L1 = NULL;
    Node * L2 = NULL;

    /* creating list 1 */
    push(&L1,7);
    push(&L1,6);
    push(&L1,4);
    push(&L1,3);

    /* creating list 2 */
    push(&L2,10);
    push(&L2,8);
    push(&L2,1);

    L1 = MergeLists(L1,L2);
    printList(L1);

    return 0;
}
```

The complexity of this method to merge two sorted lists into one is O(n+m), where n and m are the numbers of nodes in the two sorted linked lists. The same merge can also be written recursively, as the following implementation shows.

```#include<stdlib.h>
#include<stdio.h>

typedef struct node{
    int data;
    struct node *next;
} Node;

Node * mergeSort(Node *a, Node *b){
    Node *result = NULL;
    if(a == NULL)
        return b;
    else if(b == NULL)
        return a;

    /* For the first node, we would set the result to either a or b */
    if(a->data <= b->data){
        result = a;
        /* Result's next will point to smaller one in lists
           starting at a->next and b */
        result->next = mergeSort(a->next, b);
    }
    else {
        result = b;
        /* Result's next will point to smaller one in lists
           starting at a and b->next */
        result->next = mergeSort(a, b->next);
    }
    return result;
}

Node * createNode(int val){
    Node * temp = (Node *)malloc(sizeof(Node));
    if(temp){
        temp->data = val;
        temp->next = NULL;
    }
    return temp;
}

/* This function inserts node at the head of linked list */
void push(Node **headRef, int data){
    Node * newNode = createNode(data);
    newNode->next = *headRef;
    *headRef = newNode;
}

void printList(Node *head){
    while(head){
        printf("%d->", head->data);
        head = head->next;
    }
    printf("NULL");
    printf("\n");
}

/* Driver program to run above code */
int main(){
    Node * L1 = NULL;
    Node * L2 = NULL;

    /* creating list 1 */
    push(&L1,7);
    push(&L1,6);
    push(&L1,4);
    push(&L1,3);
    /* creating list 2 */
    push(&L2,10);
    push(&L2,8);
    push(&L2,1);

    L1 = mergeSort(L1,L2);
    printList(L1);

    return 0;
}
```

Please share if there is something wrong or missing. If you want to take personalized coaching from our expert teachers, please signup for free demo class.

# Quick sort Algorithm

Quicksort, like merge sort, is a sorting algorithm under the divide and conquer paradigm. The basic idea of the algorithm is to partition the input around a pivot, sort the two smaller parts recursively, and finally get the original input sorted.

### Selection of pivot

The entire idea of quicksort revolves around a pivot. The pivot is an element of the input around which the input is arranged in such a way that all elements on the left side are smaller and all elements on the right side are greater than the pivot. The question is how to select the pivot and put it into its correct position.

To make things simpler to start with, let’s assume that the first element of the input is pivot element.

To put this pivot at the correct position, start with the next element after the pivot and find the first element which is greater than the pivot. Let that be the ith position.

At the same time, start from the end of the array and find the first element which is smaller than the pivot. Let that be the jth position.

If i and j have not crossed each other, i.e. i < j, swap the elements at the ith and jth positions, and continue moving right to find an element greater than the pivot and moving left to find an element smaller than the pivot.
Once i and j cross each other, swap the pivot with the element at the jth position. After this step, the pivot will be at its correct position and the array will be divided into two parts: all elements on the left side less than the pivot, and all elements on the right side greater than the pivot.

### Quick sort partition example

This is a lot to process, I know! Let’s take an example and see how it works. We have an array as follows:

Let’s select first element as pivot, pivot = 3.

Start from the next element of the pivot, move towards the right of the array, till we see the first element which is greater than pivot i.e. 3.

From end of the array, move towards left till you find an element that is less than the pivot.

Now, there are two indices, i and j, where A[i] > pivot and A[j] < pivot. See that i and j have not yet crossed each other. Hence, we swap A[i] with A[j]. The array at the bottom of the picture shows the result after the swap.

Again, start with i+1 and follow the same rule: Stop when you find an element greater than the pivot. In this case, 10 is greater than 3, hence we stop.

Similarly, move left from the end again, until we find an element which is less than the pivot. In this case, we end up at index = 2 which is element 1.

Since `i > j`, the paths have crossed. This time, instead of swapping the elements at indices i and j, swap the element at index j with the pivot.

After swapping pivot with jth index, we have array divided into two parts, pivot as a boundary. All elements on the left side of the pivot are smaller (they may not be sorted) and all elements on the right side of the pivot are greater than pivot (again may not be sorted).

We apply this same partition process to the left and right subarrays, till the base condition is hit. In this case, the base condition is a subarray with at most one element.

### Quick sort algorithm

```quickSort(arr, start, end)
1. If the array has more than one element, i.e. (start < end):
   1.1 Find the correct place for the pivot:
       pivot = partition(arr, start, end)
   1.2 Apply the same function recursively to the left of the pivot index:
       quickSort(arr, start, pivot - 1)
       and to the right of the pivot index:
       quickSort(arr, pivot + 1, end)
```

### Quick sort implementation

```package AlgorithmsAndMe;

public class QuickSort {

    private void swap(int [] a, int i, int j){
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    private int partition(int [] a, int start, int end){
        int pivot = a[start];
        int i = start;
        int j = end;

        while(i < j){
            while(i <= end && a[i] <= pivot) i++;
            while(j >= start && a[j] > pivot) j--;

            if(i < j) {
                swap(a, i, j);
            }
        }

        swap(a, start, j);
        return j;
    }

    public void quickSort(int [] a, int start, int end){
        if(start < end){
            int p = partition(a, start, end);
            quickSort(a, start, p-1);
            quickSort(a, p+1, end);
        }
    }
}
```

There is another implementation based on the Lomuto partition scheme, in which we make the last element the pivot. The implementation is compact, but it typically performs more swaps than the partition method above.

```#include<stdlib.h>
#include<stdio.h>

void swap(int *a, int *b){
    int temp = *a;
    *a = *b;
    *b = temp;
}

int partition(int a[], int low, int high)
{
    // set pivot as highest element
    int x = a[high];

    //Current low points to previous of low of this part of array.
    int i = low - 1;

    for (int j = low; j <= high-1; j++)
    {
        /* Move in the array till current node data is
           less than the pivot */
        if (a[j] <= x){
            //set the current low appropriately
            i++;
            swap(&a[i], &a[j]);
        }
    }
    //Now swap the next node of current low with pivot
    swap(&a[i+1], &a[high]);

    printf("\n Pivot : %d\n", a[i+1]);
    for(int j=0; j<=high; j++){
        printf("%d ", a[j]);
    }
    //return current low as partitioning point.
    return i+1;
}

/* A recursive implementation of quicksort */
void quickSortUtil(int a[], int low, int high)
{
    if (low < high)
    {
        int p = partition(a, low, high);
        quickSortUtil(a, low, p-1);
        quickSortUtil(a, p+1, high);
    }
}

/* Driver program to run above code */
int main(){

    int a[] = {5,4,2,7,9,1,6,10,8};

    int size = sizeof(a)/sizeof(a[0]);
    quickSortUtil(a, 0, size-1);

    for(int i=0; i<size; i++){
        printf("%d ", a[i]);
    }
    return 0;
}
```

### Complexity analysis of quick sort algorithm

If the pivot splits the original array into two equal parts (which is the intention), the complexity of quicksort is O(nlogn). However, the worst case of quicksort happens when the input array is already sorted in increasing or decreasing order. In this case, the array is partitioned into two subarrays, one of size 1 and the other of size n-1. The subarray with n-1 elements is again divided into subarrays of size 1 and n-2, and so on. To completely sort the array, it splits n-1 times, and each split traverses up to n elements to find the correct position of the pivot. Hence the overall complexity of quicksort comes out as O(n2).

There is a very interesting question which tests your understanding of system basics: what is the space complexity of this algorithm? No extra memory is allocated explicitly. However, the recursive implementation internally puts stack frames on the call stack for partition indices, function return addresses, and so on. In the worst case, there can be n stack frames, hence the worst-case space complexity of quicksort is O(n).

How can we reduce that? If the partition with the fewest elements is (recursively) sorted first, it requires at most O(log n) stack frames. The other partition is then sorted using tail recursion or iteration, which doesn’t add to the call stack. This idea was described by R. Sedgewick; it keeps the stack depth bounded by O(log n), and hence the space complexity is O(log n).

## Quick sort with tail recursion

```Quicksort(A, p, r)
{
while (p < r)
{
q = Partition(A, p, r)
Quicksort(A, p, q)
p = q+1
}
}
```

### Selection of pivot

If an array is completely sorted, the worst-case behavior of quicksort is O(n2), so there comes another problem: how can we select the pivot so that the two subarrays are of almost equal size? There are many solutions proposed:
1. Take the median of the array as the pivot. How to select the median of an unsorted array is a problem we will look into separately, but yes, it guarantees two halves of the same size.
2. Select the pivot randomly. A random pivot makes the worst case very unlikely on real inputs.

Please leave your comment in case you find something wrong or you have some improved version.

# Count sort : Sorting in linear time

Are there any sorting algorithms with a worst-case complexity of O(n)? There are a few, like count sort and radix sort. In this post, we discuss count sort and a couple of problems where this counting sort algorithm can be applied.

Counting sort was invented by Harold H. Seward.
To apply counting sort, we need to keep in mind the following assumptions:

1. There should be duplicate values in the input.
2. There should be at most K different values in the input.
3. The input data ranges from 0 to K.

Count sort loses its linearity if K is much larger than n, e.g. of the order of n2, which happens when very few elements are duplicates and almost all are unique. So the above conditions are necessary for this algorithm to work in linear time.

### Count sort: approach

Let’s see how it works. There are three steps involved in the algorithm:

• Counting: count the frequency of each element in the input set.
• Accumulating: add up the frequencies to find the relative position of each element.
• Distributing: place each element at its appropriate position.

Now let’s take an example and go through the above three steps.
The input is an array A = [1,2,3,2,2,3,3,1,1,1,1,3,2,2]. Here we can see that K = 3. Let’s perform the first step of the method, i.e. count the frequency of each element. We keep a count array and track each element’s count. Since values are bounded by K, a count array of size K+1 is enough, initialized to zero. A is the input array and n is the size of the array.

```int count[K+1];

for(i=0; i<=K; i++){
    count[i] = 0;
}

for(i=0; i<n; i++){
    count[a[i]]++;
}
```
```

Count looks like count = [0, 5, 5, 4]: five 1s, five 2s and four 3s (index 0 is unused for this input). The complexity of this step is O(n). The second step accumulates the frequencies, adding the frequencies of all previous elements to the current element:
F(j) = summation of f(i) for all 0 <= i <= j

```for(i=1; i<=K; i++){
    count[i] += count[i-1];
}
```

After this step, count = [0, 5, 10, 14].

The second step runs K times, hence its complexity is O(K). The third step is distribution. Let’s create an output array B of size n. For every element in A, we check the accumulated count and place the element just before that position. So, the first 1 will be at position 4 (there are five 1s in total, and the array is indexed from 0), the first 2 will be placed at 9, and the first 3 at 13. Subsequently, lower positions will be filled as we encounter further occurrences in the input array.

```for(i=0; i<n; i++){
    B[--count[a[i]]] = a[i];
}
```
```

This step has complexity of O(n)

```#include<stdlib.h>
#include<stdio.h>

void count_sort(int a[], int n, int K){

    int count[K+1];
    int i;
    int b[n];

    for(i=0; i<=K; i++){
        count[i] = 0;
    }

    for(i=0; i<n; i++){
        count[a[i]]++;
    }

    for(i=1; i<=K; i++){
        count[i] += count[i-1];
    }

    for(i=0; i<n; i++){
        b[i] = 0;
    }

    for(i=0; i<n; i++){
        count[a[i]]--;
        b[count[a[i]]] = a[i];
    }

    //Copy the sorted output back into the input array
    for(i=0; i<n; i++){
        a[i] = b[i];
    }
}

int main(){
    int a[] = {0,1,1,1,0,0,0,3,3,3,4,4};
    int n = sizeof(a)/sizeof(a[0]);

    count_sort(a, n, 4);
    for(int i=0; i<n; i++){
        printf("%d ", a[i]);
    }
    return 0;
}
```
Tracing the distribution step on the example array A = [1,2,3,2,2,3,3,1,1,1,1,3,2,2], the output array fills up like this:

```Iter 0 :0  0  0  0  1  0  0  0  0  0  0  0  0  0
Iter 1 :0  0  0  0  1  0  0  0  0  2  0  0  0  0
Iter 2 :0  0  0  0  1  0  0  0  0  2  0  0  0  3
Iter 3 :0  0  0  0  1  0  0  0  2  2  0  0  0  3
Iter 4 :0  0  0  0  1  0  0  2  2  2  0  0  0  3
Iter 5 :0  0  0  0  1  0  0  2  2  2  0  0  3  3
Iter 6 :0  0  0  0  1  0  0  2  2  2  0  3  3  3
Iter 7 :0  0  0  1  1  0  0  2  2  2  0  3  3  3
Iter 8 :0  0  1  1  1  0  0  2  2  2  0  3  3  3
Iter 9 :0  1  1  1  1  0  0  2  2  2  0  3  3  3
Iter 10 :1  1  1  1  1  0  0  2  2  2  0  3  3  3
Iter 11 :1  1  1  1  1  0  0  2  2  2  3  3  3  3
Iter 12 :1  1  1  1  1  0  2  2  2  2  3  3  3  3
Iter 13 :1  1  1  1  1  2  2  2  2  2  3  3  3  3

Final O/P :1  1  1  1  1  2  2  2  2  2  3  3  3  3
```

The total complexity of the algorithm is O(K) + O(n) + O(K) + O(n) = O(n + K). Now, n + K remains in the order of n as long as K is O(n); if K grows to the order of n^2, the complexity of the algorithm becomes polynomial instead of linear.

There is an immediate application of this algorithm in the following problem:
Say there is an array which contains black, white and red balls; we need to arrange these balls in such a way that all black balls come first, white second, and red last. Hint: assign black 0, white 1 and red 2 and see. 🙂

In the next post, we would discuss how extra space can be saved, how initial positions of elements can be maintained. We would go through an interesting problem to discuss the above optimization.

Why this filling from the end of the span? Because, when the input is additionally scanned from its last element to its first, this placement makes count sort a stable sort. Here we have seen a sorting algorithm which can give us output in linear time, given that some particular conditions are met by the input data.