## Application of binary search

There are problems which can be easily solved using linear time, can be optimized to log N time complexity using binary search concept. Today we will a problem which applies binary search concept to solve one such problem.

## Problem statement

Given an array containing only 0s and 1s in sorted order. Find the first occurrence of 1 in array.
Same problem can be asked as find last occurrence of 0.
Other variant is to find number of ones of zeroes in given array.

## Analysis

Linear time solution is easy. Just go through array and once you hit 1, return the index. Great!
But we need to take advantage of the fact that given array is sorted. So if a given element is not 1, it is guaranteed that no element before it will be 1. So to search first instance of 1, we need to look in right sub array.
What if selected element is 1? That does not guarantee that it will be first instance of 1. But it reduces size of array to look into as first instance will be either at selected index or any index less than it.
So in both the cases we are discarding a part of given input array. Best way to split array is to split it in two parts and that leads us to binary search.
We will slightly tweak the binary search algorithm and get our answer.

Second variant can be solved in similar manner, only we need to discard different portion of array than what we did in main problem.

Third variant is extension of the first problem itself. Once we get the first instance of 1, we can subtract that from size of array and get number of occurrence of 1 in array.

## Code

First instance of a number

Last instance of a number

No of instances of a number

## Complexity analysis

Complexity of all above codes is O(log N).

# Minimum number of jumps to reach end

Given an array of integers, one can maximum jump a[i] from a given index i. Find minimum number of jumps required to reach at the end of the array.

For example,In following array, minimum number of jumps is 2.

At 2 we can either jump 0, 1 or 2 indices at a time. If we jump 2 indices, we would require two more jumps (at 1 and 1) to reach at 4.
However if we jump only one index, next time we will reach at the end of the array

 Solution with 3 jumps
 Solution with 2 jumps

## Analysis

While finding minimum jumps to read end we need to minimize something, that is jump. How can we do that? We take longest jump possible from each index. While taking jump we also take care that which among the possible landing indexes can give us maximum jump ahead. So every time we land, we have selected that index because it will give us maximum jump. Just greedy algorithm.
Mathematically,
for index i, scan through all indices e from i+1 till e = i + a[i] and calculate a value v = e +a[e]. Take e with maximum v.

let’s work out an example:
In above example, for i =0, we check e = 1 and v = 1 (e)+3 (a[e]) =4
e = 2 and v = 2 + 1 = 3
So we select e =1.

What will be the complexity of the code? It would be O(N).  Can we solve this using dynamic programming?

What we need to find minimum jumps to reach Nth index. We can reduce this problem like,
If we find minimum number of jumps for 0 and N-1 and can reach from any of them to Nth item, jumps for Nth index would be one more than that.
If i +a[i] >= N and if Jumps[i] +1 < Jump[N], Jump[N] becomes Jumps[i] +1
For starting index, jumps required would be zero.

## Complexity Analysis

Complexity of the algorithm would be O(min(k,N) *N) along space complexity of O(N), where K is maximum jump.

## Dynamic programming : Longest Common Subsequence

A subsequence of string is set of all the characters which in a left to right order and not necessarily contiguous.  For example: string ABCDEG has ABC, ADG, EG, BCDEG subsequences.

whereas BDA is not a subsequence of given string.

## Problem statement

Given two strings X and Y, find longest common subsequence Z.

For example
X = ABCDSEFGD
Y = ACFEFXVGAB
LCS Z would be ACEFG.

## Analysis

Can we divide this problem in sub problems?

Let’s say length of X is N and length of Y as M.
We start from the end of both strings. Check if X[N] == Y[M].
If yes, over problem now reduces to find the longest common subsequence in X[1…N-1] and Y[1…M-1].

What if they are not equal? Then one by one we have to exclude character from string X and Y.

So first we exclude the character from the X. hence problem reduces to X[1…N-1] and Y[1…M].
If we exclude the character from Y, problem reduces to X[1…N] and Y[1…M-1].

We would take maximum from both the cases. So the recursive relation comes up as

C[i,j]  =  1 + C[i-1, j-1] if X[i] == Y[j]

=   MAX (C[i-1,j], C[i, j-1]) if X[i] != Y[j]
=   0 if i or j is 0

Above relation can be easily implemented using recursion construct of programming language.

When we see that implementation, we notice that there are subproblems which are solved multiple times. To avoid solving those subproblems again and again, we can store values of those subproblems.

This gives us a perfect case for application of dynamic programming.

## Complexity analysis

Above implementation has time and space complexity of O(N*N).

# 0-1 Knapsack problem

0/1 Knapsack is typical problem which is used to demonstrate application of greedy algorithm as well as dynamic programming. There are cases when applying greedy algorithm does not give optimal solution.
There are many flavors in which Knapsack problem can be asked.
1. A thief enters a museum and want to steal artifacts from there. Every artifact has a weight and value associated with it. Thief carries a knapsack (bag) which can take only a specific weight.

Problem is to find the combination of artifacts thief takes so that he gets maximum value and weight of all the taken artifacts is less the capacity of the bag he has brought.
Thief cannot take any artifact partially. Either he takes it or leaves it. Hence the problem is 0-1 knapsack.

2. Second flavor is : We have N files each having a size say Si. We have a total storage capacity of W bytes. For each file to be store the re-computation cost is Vi. Problem is to store as files on storage that combined size of all files is less than W and their re-computation value is maximum.

We can either store or leave the file. We cannot store partial file. Hence this is a case of 0-1 knapsack problem.
Knapsack problem in pure mathematics terms:

### Analysis

Brute force method would try all the subsets of the set of items and see which one gives as the maximum value. This method would be of exponential order.
Can we do better?  If we consider every item, there are two possibilities associated with it.
First, given item is included in the subset which is optimal. Then we need to find out all the items in remaining N-1 items which can optimize the sub problem for weight W-wk. Value of this item is added to candidate maximum value.
Second case is the item is not included into the set. In that case, we need to find out items in remaining N-1 items which can optimize the the original problem. Value of this item is not added into candidate maximum value. Inclusion depends on two conditions : 1. Weight of the item is less than the total weight. 2. Inclusion of item increases the value which was already there with K-1 items with W-Wk weight.
When remaining allowed weight is zero (case 1 ) or we have considered all items (case 2) , we have reached the solution.

## Dynamic programming Implementation of knapsack problem

In implementation, we define a two dimensional array of size N * W.
Element in this array can be interpreted as for a given value of W  w (w<W) and for a given number of items i (i < N),   best solution would be value of that element i.e array(I, w).
For i =0 and w=0, all the values will be zero. Hence first column and first row will be filled with all zero values. We would build on top of that.
For each item in set, we would check for maximum value we can get for weight w.

As explained in the analysis, based on its two conditions, we include or exclude the item.
We can easily keep track of which items are included in optimal structure by keeping boolean two dimensional array. Each element a[i,j] is true if for weight j ith item was included.

### Code

One similar problem which can be solved with same approach is minimum number of coins to be used to get change of a particular amount.
I am skipping the whole analysis and directly pasting the code here.

### Complexity analysis

Complexity of the dynamic programming implementation of knapsack problem is O(N *W). Space complexity is again O(N*W). It is thumb rule that we trade space for time in dynamic programming.

## Problem statement

Given K sorted arrays each having N elements, we need to merge them all in a N*K size array in sorted order.

## Analysis

Since all the arrays are sorted, candidate for the first result array position will be among the first element in all arrays. How can we find the minimum among all the elements plucked from the first index of each array ? Easy, take those K elements (there are K array, hence K first elements) and build a min heap. The root of the mi heap the list element in all arrays and hence the first element in the resultant array.

Great, now what?
Now we need to get the next element from the array from which the previous element was selected.
Guess what, we need to maintain the information with the data, which array it came and what was the position of information in that particular array. So we will have a struct as heap element and not as only integers as we seen in previous examples/posts.

Now, based on the information stored in the heap structure, get the next element from the appropriate array (we have stored array no with the info in heap node).

If we have not already considered all elements of that array, put the next element at root of min heap and again heapify. This step will give us second least element in all K elements.

Follow the procedure for N*K times. Once an array elements are considered, we need to replace min heap root with INT_MAX so that it is not considered again 🙂

## Algorithm

1. Build min heap with first element of all K arrays.
2. Pick the root of min element and put it in the resultant array.
3. If there are remaining elements in the array,
Put that element at root of min heap and heapify again
4. If all elements are already considered, put INT_MAX and heapify again.
5. Repeat step 2,3,4 for N*K times.

## Complexity analysis

Complexity of code will be O(N * K * logN).

## Problem statement

Given an unsorted array of N integers, find K smallest element from it.

## Analysis

We can do this very easily in O(nlogn) complexity by just sorting the array and returning first K elements. We can even get them without actually sorting all elements of the array.
Refer to following post for detailed explanation : Find Kth smallest element in unsorted array

Here we are looking at an algorithm which is as close to O(N).Heap (min heap) data structure helps us here.
There is two step algorithm.

1. First create a min heap from the given array in place.
2. Now just perform K deletes from the created min heap and we have K smallest elements.

## Complexity analysis

Creation of heap takes O(N) time and deletion of K elements from min heap takes O(KlogN).
So overall complexity of algorithm is O(N+ KlogN).

# Reverse words in a string

A string can contain many words which are separated by space. We need to reverse words of string.
For example if string is “this is test string” , output should be “string test is this”

## Analysis

What do i get if reverse the whole string first?
In above example we get “gnirts tset si siht”. If see closely we get that the first word in the reverse string is exact reverse of the first word of the output string. Similarly second word is exact reverse of the second word of the output string.
So we can just reverse individual words in the string and get the output like string.
Hence the algorithm involves two simple steps:

1. Reverse the whole string first.
2. Reverse individual words of the output string given in step 1.

Same algorithm can be used when we want to rotate an array with a given index as pivot.
Below is code for that too.

## Code to reverse words of a string

### Test cases

1. A Normal string with multiple words
s = “This is test string”

2. A string with only one word
s =”string”

3. An empty string
s =””

4. A NULL pointer is passed
s =  NULL

# Problem 2 : Rotate an array with a given index as pivot

Complexity of above code is O(N) without using any extra space.

## Strings : Remove duplicates from a string

A string is a collection of characters terminated by a special character ”. String can have duplicate characters in it. Today’s problem requires purging of those duplicate characters from string and return the string.

## Problem statement

Given a string S, remove all duplicate characters from it.
For example,
S = “aaabbacbccd”, then output should be “abcd”. Note the output is not  “d”, that means we need to maintain once instance of the duplicate character.

## Analysis

There are two parts involved in this problem.
First, we need to keep track whether we have already processed a particular character.
Second we need to properly place the character in destination string.

For the first part hashing is the perfect technique. We can create a hash with key as character (total 256 characters) and set the value when we first encounter the character. Next time when we encounter the character, we would find the value set against that character and hence we can skip that character.

For second part we can have an auxiliary string to store the non-duplicate characters.
We can do it in-place too.
Keep two indexes, one which keeps track the character being processed in input string, other which points to the next place in the input string where the current character can be put if it is not processed already.

## Test cases

1. When input string contains duplicate characters
S  = “aaabaaabbbcccd”

2. When input string does not contain duplicate characters
S = “abcdefg”

3. Input string with no characters
S = “”

4 Input string pointer is NULL
S = NULL;

Execution of these test cases can be seen here.

## Complexity Analysis

Complexity of code is O(N) in time and constant memory for hash as it does not depend on the size of the input string.

## Strings : Toeknize a string separated by delimiter

Continuing the previous post, let’s look at the second problem.

## Problem statement

Find all tokens in a string which are separated by given delimiter.

## Analysis

We have to traverse the string and for each character, check if that character is present in delimiter string. If it is, then it is end of the token started after previous encounter of the delimiter. Here forward we should scan all characters till the time we first encounter a character which is not a part of given delimiter string, that would be start of the next token.

Let’s look at it step by step.

Step 1 : Assume we have string as s and delimiter string as delim.
Search in s the first character which is not present in delim.
If there is no such byte, we are done, return NULL.
If there is such byte, then that would be start of our first token.

Step 2: Scan through the found token.
From the character found in step 1, scan through s till we again encounter a character present in delim.
If there is no such character, we are done, we have only one token and return the start position of that token.

If we have such character, that would be end of the token.

Great we have found one token, How about next one?

For next token to be generated, we need to keep track where to start looking from in the string as we would have scanned a part of it in the previous tokens. So we keep track of the pointer where to start looking from.

Step 3 : Once we have found the end of token in step 2, again scan out all subsequent characters which are part of delim. Take the index of first non-delimiter character. This would be the starting point of our next token search.

Implementation note : To distinguish between whether it is first call to the function or subsequent call, we pass NULL pointer to the source string parameter indicating that it is subsequent call and not initial call.

Similar approach can be used to remove all characters which are present in a string from another string.

## Test cases

1. String and delimiter with one characters
S = “aaab_dddcc”
delim = “_”

2. String and delimiter with multiple characters
S = “aaab_dd?dcc”
delim = “_?”

3. Empty string
S = “”
delim = “_”

4. Empty delimiter
S = “aaab_dddcc”
delim = “”

5. String starting with delimiters
S = ___aaaaabbb
s = “_”

6. String ending with delimiters
S =”aaaa___”
s = “_”

7. String with no delimiter character
S = “aaaaaaaa”
s = “_”

8. String with all character as delimiter characters
S =”________”
s =”_”

## Complexity analysis

Complexity of above code is O(N * M), where N is length of string and M is length of delimiter string.

# Detecting loop in singly linked list

Given a singly linked list, which may or may not contain a loop in it, we need to find out whether there is a loop and it yes, start of the loop should be returned.

For example, in following linked list, function should return 4.

### Analysis

In singly linked list, we can traverse only in one direction and end of the list is detected when NULL pointer is encountered. What if linked list has a loop in it? Then we would never reach end of the list and circle around in the loop.
So let’s use this property. We move two pointers at different speed to traverse list, we would surely meet at a node which in the loop.
Take two pointers, first which moves one node at a time, call it slow; other which moves two nodes at a time, call it fast.
When slow meets fast, that node will be in the loop.
Traversal of two pointers in above list would be
Slow : 2 , Fast : 3
Slow : 3 , Fast : 5
Slow : 4 , Fast : 3
Slow : 5 , Fast : 5

First problem is solved. If before reaching NULL, if slow meets fast, then there is a loop in list.

Second problem requires some thinking.
What would be the condition when slow meets fast? Can we have the length of the loop? Yes.
Move the slow pointer till it again meets faster pointer and keep count of number of nodes. Let number of nodes in loop be K

Now the problem reduces to finding Kth node from the end because if there are K nodes in loop, start node would be Kth node from the end of the list. Move one pointer K nodes ahead of head and keep other at head. Move both of them simultaneously. When they meet, that node is starting point of the loop.

## Algorithm to detect cycle in linked list

Cycle detection
1. Take two pointers, slow and fast.
2. Move slow one step at a time, fast two steps at a time.
3. If the meet before slow reach NULL, then there is cycle in list.
4. Else return false.
Starting node detection
1. Find the length of the loop in the list, by moving slow again to meet fast and keep track of count.
2. Once we find no of nodes in loop, K,  find Kth node from the end. (Can be read here)
Termination condition will not be NULL but two pointers one starting from head and other K nodes ahead of head being equal.

### Test cases

1. Normal list with loop
L1  = 1->2->3->4->5->6->4 (6 pointing back to 4)

2. Normal list with no loop
L1  = 1->2->3->4->5->6->NULL

3. Empty List
L1 = NULL

4 List with complete loop
L1  = 1->2->3->4->5->6->1 (6 pointing back to 1)

5. List with duplicate elements
L1  = 1->1->2->3->4->4->5->6->4 (6 pointing back to 4)

Complexity of above algorithm to find cycle in a linked list will be O(N).