Consistent Hashing

To understand consistent hashing, we first have to understand traditional hashing and its limitations in large-scale distributed systems.

Hashing, in plain terms, is nothing but a key-value store where, given a key, the associated value can be found very efficiently. Example: let’s say we want to find the name of a street in a city given its zip code. The idea is to store this information in a hash as <Zip code, Street Name>.
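
A minimal sketch of this in Java (the zip codes and street names below are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class ZipLookup {
    public static void main(String[] args) {
        // Map from zip code to street name; the entries are illustrative only.
        Map<String, String> streetByZip = new HashMap<>();
        streetByZip.put("560001", "MG Road");
        streetByZip.put("110001", "Janpath");

        // Given a key, the associated value is found in expected O(1) time.
        System.out.println(streetByZip.get("560001")); // MG Road
    }
}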

The problem becomes more interesting when the data is too big to store on one node or machine, and multiple nodes or machines are required to store it; for example, a system which uses a number of web caches. First question: how do we decide which key goes on which node? The simplest solution is to use a modulo function. Given a key, take the hash of the key, take it modulo the number of nodes in the system, and put the key on that node. Similarly, when fetching a key, hash the key, take the modulo with the number of nodes, and then go to that node and fetch the value. The picture below depicts conventional hashing in a multi-node system.
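
A minimal sketch of this placement rule (the node count and hash function are illustrative assumptions):

public class TraditionalPlacement {
    // Traditional placement: the node index depends on the total number of nodes.
    static int nodeFor(String key, int numberOfNodes) {
        // Math.floorMod avoids negative indices when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numberOfNodes);
    }

    public static void main(String[] args) {
        // With 4 nodes, a key always maps to the same node index
        // until the number of nodes changes.
        System.out.println(nodeFor("user42", 4));
    }
}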

Traditional hashing in a multi-node system

Failures are common in commodity-hardware-driven, distributed multi-node systems. Any node can die without any prior notice, and the expectation is that the system runs unaffected, with only a slight cost in performance. What happens when a node goes down in the system described above? In traditional hashing, the total number of nodes has decreased, so the function determining which node a key goes on or is fetched from changes. New keys will work fine. What happens to existing keys? They are all on the wrong nodes as per the new function.

Traditional hashing after a node goes down: existing keys map to the wrong nodes

To fix this problem, we have to redistribute all existing keys across the remaining nodes, which can be a very costly operation and can have detrimental effects on a running system. And what happens when the node comes back? Well, repeat what was done when the node went down. This can result in a thrashing effect: if a node misbehaves regularly, the system does little useful work other than redistributing keys. The snippet below shows how badly existing keys are affected.
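
A small illustrative check of that claim, reusing the modulo placement from above (key names are arbitrary; exact numbers depend on the hash):

public class RehashDemo {
    static int nodeFor(String key, int numberOfNodes) {
        return Math.floorMod(key.hashCode(), numberOfNodes);
    }

    public static void main(String[] args) {
        int moved = 0, total = 1000;
        // Compare placement with 4 nodes against placement with 3 nodes.
        for (int i = 0; i < total; i++) {
            String key = "key-" + i;
            if (nodeFor(key, 4) != nodeFor(key, 3)) moved++;
        }
        // With modulo placement, most keys typically end up on a different node.
        System.out.println(moved + " of " + total + " keys moved");
    }
}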

How do we solve this challenge? This is where consistent hashing comes into the picture.

Consistent hashing in distributed multi-node systems

Consistent hashing starts from a very simple idea: put nodes and keys in the same id space, unlike traditional hashing where node ids and keys live in two different id spaces. A node id can be the hash of its IP address, and the same hash function is then applied to keys to determine which node a key goes on or is fetched from.

A critical requirement for a consistent hashing implementation is a hash function which is consistent irrespective of the system view and maps keys roughly uniformly onto all machines.

Choose any base hash function such that it maps the key space to integers in the range [0..M]. Dividing by M gives us a unit circle. Each key, once hashed, represents a point on this unit circle.

Nodes and keys hashed onto the same unit circle

How exactly does a key map to a node? The key is hashed, and then placed on the first node found while moving clockwise from that point on the circle. Simple enough, huh? To find a key, take its hash and go to the first node encountered while moving clockwise on the unit circle.

A key is assigned to the first node found moving clockwise

How does it solve the problem of scale? Let’s say my system starts receiving 5x the load; what happens to the nodes, and how can I balance or reduce the load? Simply add more nodes, uniformly distributed on the unit circle, and the problem is solved. Consistent hashing is built for scale.

What happens when a node goes down? All the keys which were on this node are reallocated to the next successor node on the circle. All other keys remain unchanged. This is far more optimal than having to redistribute all keys on the failure of one node.

As mentioned earlier, we assume that the hash function distributes keys uniformly over the nodes, which is not realistic. To reduce this non-uniformity, virtual nodes are introduced. In this case, each node is hashed with K different hash functions, which map it to K different points on the circle. Each node still gets roughly 1/N of the keys, but virtual nodes reduce the variance of key load significantly.

One challenge still remains: how to efficiently find the successor node for a given key, i.e. the node s such that h(s) ≥ h(k) for key k. An intuitive solution is to use a hash, but hashes do not maintain any ordering information. A binary search tree maintains ordering, but the successor lookup is proportional to the depth of the tree, which can be O(N). We can reduce that by using a balanced binary search tree, such as a red-black tree, which brings the lookup down to O(log N).
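
A minimal sketch of such a ring in Java, using TreeMap (a red-black tree) for the clockwise successor lookup and a configurable number of virtual nodes per physical node. The hashing scheme and node naming here are illustrative assumptions, not a reference implementation:

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Illustrative hash; a production ring would use a stronger hash such as MD5 or Murmur.
    private int hash(String s) {
        return s.hashCode() & 0x7fffffff;
    }

    public void addNode(String node) {
        // Each physical node is placed at several points on the circle.
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        // Only the keys owned by this node move to their successors; others are untouched.
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        // First node clockwise from the key's position; wrap around to the start if needed.
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}

With this structure, nodeFor is an O(log N) lookup, and adding or removing a node affects only the keys that map to its positions on the circle.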

Where is consistent hashing used?

Consistent hashing is used in Memcached, Cassandra, Amazon Dynamo, etc.

If you find this article useful, please share it. If there is something missing or wrong in the article, please let us know and we will correct it.

Reference:

http://theory.stanford.edu/~tim/s16/l/l1.pdf
http://www8.org/w8-papers/2a-webserver/caching/paper2.html

Most frequent words in file

In the last post, find K smallest elements in an array, we learned some concepts to find the top or bottom elements in a given set. Let’s apply those concepts again to a different problem: most frequent words in a file.

Given a text file which contains strings/words, find the n most frequently occurring words in it, i.e. the n words whose counts are the maximum.

For example, if the given file is: one, two, two, three, three, three, four, four, four, four, five, five, five, five, five, six, six, six, six, six, six, and we are looking for the three most frequent words, the output should be: six, five, four (in some order).

Line of thoughts

A brute force solution would be really easy: scan the whole file and get the count for each unique word in the file. Then sort the words by count in descending order and return the first n words. A sketch of that idea follows.
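
A minimal sketch of the brute force step, assuming the words have already been counted into a map (counting is shown in the full implementation below):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BruteForceTopWords {
    // Sort all entries by count, descending, and keep the first n words.
    public static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}

Sorting all m unique words costs O(m log m); the heap-based approach below does better when n is much smaller than m.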

This problem has three parts to it. First, read all words from the file; second, create a map which stores the frequency count of each word in the file; third, get the top n words from that map.

Reading a file is a pretty standard operation in any language (brush up on Java basics here); we will not focus on that, as it is not the intention of this question.
Assuming we can read from the file, how do we store frequency counts against words? Simple: use a hash map. We read a word from the file and store it in the hash with count 1 if it is not already there. If the word is already in the hash map, we update its count by 1.

After this step, we have a hash with the count of occurrences of each word. Now comes the challenging part: how to find the top n or most frequent words from this hash map. Mind you, a hash map does not store elements in sorted or any particular order.

A rule of thumb: to find the top or bottom elements from a collection of items, heaps are the best bet. To find the top N most frequently occurring words, let’s try a heap.
Which heap would be best? We maintain a limited set with free entry until it contains n words (as we need the n top words). Every further word has to pass a test before it enters the set.

If a new word’s frequency is less than that of the least frequent word in the set, what can be said about this new word? Well, it cannot be in the top n words, right?
If a new word’s frequency is more than that of the word with the least frequency in the set, the new word should enter the set and the word with the least frequency should be moved out.
What we just described is the typical use of a min heap, as it gives the least element at the root. The algorithm to find the top n most frequent words in a file is given below.

Most frequent words in file : algorithm

  1. Take the first N words from the map along with their counts and create a min heap with these N words.
  2. Read the remaining words from the hash one by one and check whether the frequency of the new word is more than that of the least frequent word on the heap, i.e. the word at the root of the heap.
  3. If yes, remove the root and add the new word to the min heap. If not, continue with the next word.
  4. When done with all words in the hash, the min heap will contain the N most frequently occurring words in the file.

Implementation

package com.company;

import javafx.util.Pair; // requires JavaFX (bundled with Oracle JDK 8); java.util.AbstractMap.SimpleEntry can be used instead

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;

/**
 * Created by sangar on 1.10.18.
 */
public class FrequentWords {

	public static HashMap<String, Integer> readFile(String fileName)
			throws IOException {
		HashMap<String, Integer> wordMap = new HashMap<>();

		Path path = Paths.get(fileName);
		try (Scanner scanner = new Scanner(path)) {
			// Assumes one word per line; increment the count every time a word is seen.
			while (scanner.hasNextLine()) {
				String line = scanner.nextLine();
				if (wordMap.containsKey(line)) {
					wordMap.put(line, wordMap.get(line) + 1);
				} else {
					wordMap.put(line, 1);
				}
			}
		}
		return wordMap;
	}

	public static ArrayList<String> mostFrequentWords(String fileName, int n){
		ArrayList<String> topWords = new ArrayList<>();

		try {
			HashMap<String, Integer> wordMap = readFile(fileName);
			PriorityQueue<Pair<String, Integer>> pq =
				new PriorityQueue<>(n, (x,y) -> x.getValue().compareTo(y.getValue()));

			int i = 0;
			Iterator it = wordMap.entrySet().iterator();
			/*
				Get first n words on heap
			*/
			while(it.hasNext()){
				if(i == n) break;
				HashMap.Entry<String, Integer> entry =
					(HashMap.Entry<String, Integer>)it.next();
				pq.add(new Pair<>(entry.getKey(), entry.getValue()));
				it.remove();
				i++;
			}

			/*
				Check all other words, if anyone more than least
				remove the least and add the new word.
			*/
			for (String key : wordMap.keySet()){
				if(pq.peek().getValue() < wordMap.get(key)){
					pq.poll();
					pq.add(new Pair<>(key, wordMap.get(key)));
				}
			}
			while(!pq.isEmpty()){
				topWords.add(pq.poll().getKey());
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
		return topWords;
	}
	
	public static void main(String[] args){  
		System.out.println(mostFrequentWords("/home/sangar/Documents/test.txt", 3));
	}
}

Reading m words from the file requires O(m) time, whereas creating the initial n-element heap takes O(n). Scanning through the remaining words and inserting them into the heap has complexity O((m-n) log n). The overall complexity to find the top n most frequent words in a file is therefore O(m log n).


First non repeated character in stream

In the last post, we discussed the first non-repeated character in a string, where all characters were available in memory beforehand. Let’s take another variation of the problem: given a stream of characters, find the first non-repeated character in the stream at a given point in time.

For example, if the stream so far is a,a,b,c,c,d and we ask for the first non-repeated character at this point, it is b. If the next character in the stream is, say, b, then the first non-repeated character in the stream at that point becomes d.

Non repeated character in stream : thoughts

If the set of characters is already present in memory, then it is very simple, as we discussed here. The idea is to create a hash map with the character as key and the number of times it occurs in the string as value, increasing the count whenever the character is seen during the scan of the string.
Once all characters are processed, go through the string again and return the first character which has count 1. However, this approach does not work with a continuous stream of characters.
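
A minimal sketch of that in-memory version, for contrast with the streaming solution developed below:

import java.util.HashMap;
import java.util.Map;

public class FirstUniqueInString {
    public static Character firstNonRepeated(String s) {
        Map<Character, Integer> count = new HashMap<>();
        // First pass: count occurrences of each character.
        for (char c : s.toCharArray()) {
            count.merge(c, 1, Integer::sum);
        }
        // Second pass: return the first character that occurs exactly once.
        for (char c : s.toCharArray()) {
            if (count.get(c) == 1) return c;
        }
        return null; // no unique character
    }
}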

Why a queue?

The problem is to return the first non-repeating character at a given point in time. A character which was unique at a certain point in time may not be unique at a later point in time, because that character may have been duplicated in the stream coming in during the intervening period. We need to store characters in order of their arrival, but remove them once they occur more than once. At any specific time, we want the character which arrived earliest among those seen only once so far. Can you see the pattern?

It is the first in, first out pattern, and queues are the best data structure to implement it.

One thing to note: even though the first in, first out pattern applies, removal of a character is ad hoc, happening between two queries as the stream flows in. Characters which happen to occur twice or more between queries will be removed from the queue. This matters because it rules out an array-based queue. Which data structure is best for removing a random node? A doubly linked list: given a node, removing it is an O(1) operation.

Why a hash map?

Another problem is: how can we map a character coming out of the stream to a node in the queue? We could scan the doubly-linked-list-based queue and delete the node which contains the character. Even though this is effectively O(1) (the queue can grow to at most 256 nodes; why?), it is computationally expensive to do for every character coming from the stream.

What can quickly map a character to a node? Obviously, a hash map. So a key-value pair is stored in the hash map with the character as key and the node address as value.
The hash map now serves another purpose too, which is to record whether a character has occurred before. If a character is present in the hash, it has a corresponding node in the queue, which implies the character has already been seen once. So we remove it from the queue if there is a value for that character in the map.
If there is no value for the character in the hash map, this is the first time we have seen this character, and it should be added to the queue.

First non repeated character in stream : Algorithm

  1. Process the next character from the stream.
  2. Check in the hash map whether there is a corresponding node in the queue.
  3. If there is already a node in the hash map
    1. Delete the corresponding node from the queue.
    2. Go to step 1.
  4. If there is no node in the hash map for this character
    1. Create a node and push it at the tail of the queue.
    2. Update the hash map with the character and node address as the key-value pair.

The algorithm looks fine, but there is an implementation problem here; can you see it? We are freeing the node from the queue but keeping its address in the hash to indicate that the character has already been seen. Ideally, we should not store references to freed memory, which could be accessed accidentally and cause a segmentation fault. The idea is to separate the node-pointer and seen functionalities into two hashes.

Create an additional hash map with the character as key and a boolean as value, which indicates whether the character has already been seen twice in the stream. With this hash map, we can remove the entry from the other hash map, which stores the node address of a particular character, so there are no dangling pointers.

Modified algorithm

  1. Process the next character from the stream.
  2. Check in the visited hash map whether the character has already been seen twice before.
  3. If true, go to step 1.
  4. If false, check the node address hash map; if there is a node address for the character
    1. Delete the corresponding node from the queue and remove the entry from the hash.
    2. Update the visited hash map for this character to true.
    3. Go to step 1.
  5. If there is no node in the hash map for this character
    1. Create a node and push it at the tail of the queue.
    2. Update the hash map with the character and node address as the key-value pair.

Implementation

package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 8.10.18.
 */
public class NonRepeatedCharInStream {

    private HashMap<Character, Boolean> visited =  new HashMap<>();
    private HashMap<Character, ListNode> address =  new HashMap<>();
    private Queue<Character> queue = new QueueImplementation<>();

    /* The head of the queue is the oldest character seen exactly once so far. */
    public char getNonRepeatedCharacter(){
        return queue.isEmpty() ? ' ' : queue.peek().getData();
    }

    public void process(char c){
        /* Already seen twice or more; nothing to do. */
        if(visited.containsKey(c)) return;

        if(address.containsKey(c)){
            /* Second occurrence: remove it from the queue and mark it as repeated. */
            queue.deleteNode(address.get(c));
            address.remove(c);
            visited.put(c, true);
        }
        else{
            /* First occurrence: enqueue it and remember its node address. */
            address.put(c,queue.enqueue(c));
        }
    }
}

Queue implementation using Doubly linked list

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public class QueueImplementation<E> implements Queue<E>{
    ListNode<E> head;
    ListNode<E> tail;

    public QueueImplementation(){
        head = null;
        tail = null;
    }

    @Override
    public ListNode<E> deleteNode(ListNode<E> node){
        if(this.isEmpty()) {
            return null;
        }

        if(this.head == node){
            if(this.head.getNext() != null) this.head.getNext().setPrev(null);
            this.head = this.head.getNext();
            // If the queue became empty, clear the tail as well.
            if(this.head == null) this.tail = null;
            return node;
        }

        if(this.tail == node){
            if(this.tail.getPrev() != null) this.tail.getPrev().setNext(null);
            this.tail = this.tail.getPrev();
            return node;
        }
        /*
            We are deleting node in between. So following things happen
            1. If node has prev, set node.prev.next = node.next.
            2. If node has next, set node.next.prev = node.prev
        */
        if(node.getPrev() != null) node.getPrev().setNext(node.getNext());
        if(node.getNext() != null) node.getNext().setPrev(node.getPrev());

        return node;
    }


    @Override
    public ListNode peek() {
        if(this.isEmpty()) {
            return null;
        }
        return this.head;
    }

    @Override
    public ListNode remove() {
        if(this.isEmpty()) {
            return null;
        }
        /*
            We are deleting node at head. So following things happen
            1. Set temporary node point to head.
            2. Move head to next of node.
            3. Set prev of new head to NULL.
            4. Free the temp node.
          */
        ListNode<E> tempNode = this.head;
        this.head = this.head.getNext();
        // Detach the new head from the removed node; if the queue is now empty, clear the tail too.
        if(this.head != null) this.head.setPrev(null);
        else this.tail = null;

        return tempNode;
    }

    @Override
    public ListNode enqueue(E data) {
        if(this.isEmpty()) {
            this.head = new ListNode<E>(data);
            this.tail = this.head;
            return this.head;
        }
        ListNode<E> newNode = new ListNode<E>(data,null, this.tail);
        this.tail.setNext(newNode);
        this.tail = newNode;

        return newNode;
    }

    @Override
    public boolean isEmpty() {
        return this.head == null;
    }
}

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public interface Queue<E> {
    public ListNode<E> peek();
    public ListNode<E> remove();
    public ListNode<E> enqueue(E data);
    public ListNode<E> deleteNode(ListNode<E> node);
    public boolean isEmpty();
}
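
The ListNode class used by the queue is not shown in the post; below is a minimal sketch, assuming the (data, next, prev) constructor order that enqueue above relies on:

package com.company;

/**
 * Doubly linked list node used by the queue; a minimal sketch matching
 * the accessors and constructors the queue code relies on.
 */
public class ListNode<E> {
    private E data;
    private ListNode<E> next;
    private ListNode<E> prev;

    public ListNode(E data) {
        this(data, null, null);
    }

    public ListNode(E data, ListNode<E> next, ListNode<E> prev) {
        this.data = data;
        this.next = next;
        this.prev = prev;
    }

    public E getData() { return data; }
    public ListNode<E> getNext() { return next; }
    public ListNode<E> getPrev() { return prev; }
    public void setNext(ListNode<E> next) { this.next = next; }
    public void setPrev(ListNode<E> prev) { this.prev = prev; }
}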

Test cases (add more and see if they work)

package test;

import com.company.NonRepeatedCharInStream;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class NonRepeatedCharInStreamTest {

    NonRepeatedCharInStream tester = new NonRepeatedCharInStream();

    @Test
    public void testNormalCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals('a', tester.getNonRepeatedCharacter());

        tester.process('a');
        assertEquals('b', tester.getNonRepeatedCharacter());

        tester.process('c');
        tester.process('d');
        tester.process('b');

        assertEquals('d', tester.getNonRepeatedCharacter());
    }

    @Test
    public void testNoUniqueCharCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');
        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals(' ', tester.getNonRepeatedCharacter());
    }
}

Let’s see how this algorithm works with an example. To start with, both hash maps (visited and address) and the queue are empty. The stream starts with the character c: a new node is added to the queue, and an entry is added to the address hash map to store the address of the node in the queue.

first character added to queue and hash map

Next, d comes from the stream. We check if it is in the visited map; it is not, so we check if it is in the address map. It is not there either, which means we have seen d for the first time. Add it to the queue and store a pointer to its node in the address map.

d is added to address map and queue

The next character is e; the same process is followed as for d, and it gets added to the queue and the address map.

Now imagine that the stream produces d again. We check the visited map; it is not there. However, it is present in the address map. This is how we know that d has occurred twice. We delete it from the queue and the address map, but add it to the visited map.

 first non repeated character at this moment is c 

Now comes the character f. It is not present in the visited map nor in the address map, so we add it to the queue and store a pointer to its node in the address map.

Next comes the character d again. The visited map has it marked true, so we do not add it anywhere and skip it.

What if the next character is c? It is present in the address map, so we remove it from the queue and the address map, and add it to the visited map.

At this point in time, the first non-repeated character in the stream is e

The complexity of finding the first non-repeated character in the stream is O(1): the answer is simply the character at the head of the queue, and processing each incoming character also takes O(1) time. Space complexity is O(1) as well, since the queue and both hash maps hold at most one entry per possible character (256 for extended ASCII), independent of the number of characters processed.

Please share if there is something wrong or missing. Please reach out to us at communications@algorithmsandme.com if you have doubts or need any help with your interview preparations.

LRU cache (Least Recently Used cache)

LRU cache implementation in Java

This is a commonly asked question in interviews, especially Microsoft and Amazon interviews. The problem statement is very simple:

Implement LRU cache or Least Recently Used cache

Before going further into the solution, let’s first understand what a cache is. In computer architecture terms, a cache is a small buffer of pages the OS maintains in order to avoid more expensive main memory accesses.

Cache accesses are usually faster than main memory accesses, improving overall performance. Whenever a process needs to access content at a specific memory location, it tries to find that page in the cache first. If it finds the page in the cache, it uses it and does not access memory. This is called a cache hit.

However, caches are very small in size compared to main memory, so there is a good chance that the page requested by the process is not present in the cache. In that case, the new page is moved into the cache and one of the existing pages is swapped out. When this happens, it is called a cache miss.

Caching also happens at the application layer: for example, caching visited pages in a browser, or caching frequently accessed data from a database in in-memory caches like Redis.

Based on how a cache miss is handled, there are different types of caches, such as first in, first out (FIFO) caches, least recently used (LRU) caches, and least frequently used (LFU) caches.

In a first in, first out cache, in the event of a cache miss, the entry which came into the cache first is evicted first; in a least recently used cache, the page which was used least recently gets evicted. In the same vein, in a least frequently used cache, the page which is least frequently used among all the pages in the cache is evicted.

LRU cache implementation

Consider a cache with space for one more page. If a cache miss happens, we bring the page from memory and put it in the cache for future access.

Now, if the cache is full and a cache miss happens, we have to bring in the new page and evict a page from the cache. In an LRU cache, this will be the page which was accessed the longest time ago.

What if a page was accessed at the start, and then accessed again just before the cache miss? Is it the least recently used page? On the contrary, it is the most recently used page and hence should be the last to evict. So which page should we evict? In this case, the page which came in right after the first page goes out.

If a page is brought into the cache first, it is the first candidate for eviction, provided it is not accessed again before a cache miss happens.

Why queue?

In principle, an LRU cache is a first in, first out cache with one special case: if a page is accessed again, it goes to the end of the eviction order. Which data structure is best to implement the FIFO pattern? Of course, it’s a queue. So our LRU cache will be a queue where each node stores a page. This queue has a specific size, as the cache has a limited size. Whenever a new page is brought in, it is added at the rear of the queue. When eviction happens, it happens at the front of the queue.

Why hash?

There is one requirement of an LRU cache which does not map directly to the queue data structure: moving the node corresponding to the most recently accessed page to the end of the queue. This poses two problems. First, how do we find the node in the queue corresponding to the page id being accessed? Second, how do we move it to the end as efficiently as possible? Both problems are very similar to what we solved in first non-repeated character in stream.

We will use a hash which stores, for each page id, the address of its node in the queue. This gives us immediate access to the node to be reshuffled.

Why doubly linked list based queue?

The remaining problem is to move nodes around without moving all the other elements of the queue. Which data structure removes an element in O(1), given the element? A doubly linked list. If we implement the queue as a doubly linked list, removing and adding pages to the queue are O(1) operations.

LRU cache algorithm

  • Cache miss happens :
    • If the cache has a free entry, enqueue the page in the queue.
    • If the cache is full, remove the page at the front of the queue and add the new page at the end of the queue.
  • Cache hit happens :
    • Delete the node from its current location in the queue.
    • Enqueue the page at the end of the queue.
  • If the page is present in the hash, it’s a cache hit; if it is not present in the hash map, it’s a cache miss.

LRU cache implementation

Queue interface

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public interface Queue<E> {
    public ListNode<E> peek();
    public ListNode<E> remove();
    public ListNode<E> enqueue(E data);
    public ListNode<E> deleteNode(ListNode<E> node);
    public boolean isEmpty();
    public int size();
}

Queue implementation

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public class QueueImplementation<E> implements Queue<E>{
    ListNode<E> head;
    ListNode<E> tail;
    int size;

    public QueueImplementation(){
        head = null;
        tail = null;
        this.size = 0;
    }

    @Override
    public ListNode<E> deleteNode(ListNode<E> node){
        if(this.isEmpty()) {
            return null;
        }

        if(this.head == node){
            if(this.head.getNext() != null) this.head.getNext().setPrev(null);
            this.head = this.head.getNext();
            // If the queue became empty, clear the tail as well.
            if(this.head == null) this.tail = null;
            this.size--;
            return node;
        }

        if(this.tail == node){
            if(this.tail.getPrev() != null) this.tail.getPrev().setNext(null);
            this.tail = this.tail.getPrev();
            this.size--;
            return node;
        }
        /*
            We are deleting node in between. So following things happen
            1. If node has prev, set node.prev.next = node.next.
            2. If node has next, set node.next.prev = node.prev
        */
        if(node.getPrev() != null) node.getPrev().setNext(node.getNext());
        if(node.getNext() != null) node.getNext().setPrev(node.getPrev());

        this.size--;
        return node;
    }


    @Override
    public ListNode peek() {
        if(this.isEmpty()) {
            return null;
        }
        return this.head;
    }

    @Override
    public ListNode remove() {
        if(this.isEmpty()) {
            return null;
        }
        /*
            We are deleting node at head. So following things happen
            1. Set temporary node point to head.
            2. Move head to next of node.
            3. Set prev of new head to NULL.
            4. Free the temp node.
          */
        ListNode<E> tempNode = this.head;
        this.head = this.head.getNext();
        // Detach the new head from the removed node; if the queue is now empty, clear the tail too.
        if(this.head != null) this.head.setPrev(null);
        else this.tail = null;

        this.size--;
        return tempNode;
    }

    @Override
    public ListNode enqueue(E data) {
        if(this.isEmpty()) {
            this.head = new ListNode<E>(data);
            this.tail = this.head;
            this.size++;
            return this.head;
        }
        ListNode<E> newNode = new ListNode<E>(data,null, this.tail);
        this.tail.setNext(newNode);
        this.tail = newNode;

        this.size++;
        return newNode;
    }

    @Override
    public boolean isEmpty() {
        return this.head == null;
    }

    @Override
    public int size() {
        return this.size;
    }
}

LRU cache implementation in Java

package com.company;

import java.util.ArrayList;
import java.util.HashMap;

/**
 * Created by sangar on 9.10.18.
 */
public class LRUCache {
    private Queue<Integer> queue;
    private HashMap<Integer, ListNode> pages;
    private int cacheSize;

    public LRUCache(int cacheSize){
        this.cacheSize = cacheSize;
        queue = new QueueImplementation<>();
        pages = new HashMap<>();
    }

    /* This function implements the LRU cache */
    public void readPage(int pageId) {

        /*Cache Miss and we can add the page in the cache */
        if (!pages.containsKey(pageId) && queue.size() < cacheSize) {
            pages.put(pageId, queue.enqueue(pageId));
            return;
        }

        /* Cache Miss and we cannot add new page to cache,
        remove the LRU page */
        if (!pages.containsKey(pageId) && queue.size() >= cacheSize) {
            //First remove it from the queue.
            ListNode evictedPage = queue.remove();
            //Remove node from hash table
            pages.remove(evictedPage.getData());
            //Enqueue the new page at the tail and remember its node address in the hash
            pages.put(pageId, queue.enqueue(pageId));
            return;
        }

        /* Cache is Hit */
        if (pages.containsKey(pageId)) {
            updateCache(pageId);
        }

        return;
    }

    /* This function modifies queue when there is cache hit */
    public void updateCache(int pageId){

        /* Case where queue may be empty - defensive programing*/
        if(queue.isEmpty() && queue.size() < cacheSize){
            pages.put(pageId,queue.enqueue(pageId));
        }
        /* Update the pointers of neighbor nodes and tail in dummy node */
        else{
            ListNode node = queue.deleteNode(pages.get(pageId));
            if(node != null){
                //Re-enqueue the page at the tail and refresh its node address in the hash
                pages.put(pageId, queue.enqueue((Integer)node.getData()));
            }
        }
    }

    public ArrayList<Integer> cacheState(){
        ListNode current = queue.peek();
        ArrayList<Integer> cacheState = new ArrayList<>();
        while(current != null){
            cacheState.add((Integer)current.getData());
            current = current.getNext();
        }
        return cacheState;
    }
}

Test case for LRU cache implementation

package test;

import com.company.LRUCache;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.Arrays;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class LRUCacheTest {

    LRUCache tester = new LRUCache(5);

    @Test
    public void testCacheInsertion() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(1,2,3,4,5));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);

        assertEquals(cacheState, tester.cacheState());
    }

    @Test
    public void testCacheHit() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(1,4,5,2,3));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);
        tester.readPage(2);
        tester.readPage(3);

        assertEquals(cacheState, tester.cacheState());
    }

    @Test
    public void testCacheMiss() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(2,3,4,5,6));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);
        tester.readPage(6);

        assertEquals(cacheState, tester.cacheState());
    }

    @Test
    public void testEvictionAndInsertion() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(3,4,5,6,1));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);
        tester.readPage(6);
        tester.readPage(1);

        assertEquals(cacheState, tester.cacheState());
    }


    @Test
    public void testEmptyCache() {
        ArrayList<Integer> cacheState = new ArrayList<>();

        assertEquals(cacheState, tester.cacheState());
    }
}

Let’s take an example and see how the whole thing works. Say we have a cache of size 5, which is empty to start with. The application accesses pages 1, 2, 3, 4 and 5. As they are all cache misses to start with and there is space in the cache, all these pages are brought into the cache.

Cache state after reading the first 5 pages

Now the application accesses page 6. We have a cache miss. What happens? As the cache is full, we evict the page at the front of the queue and add page 6 at the end.

Page 1 is evicted, and page 6 added to queue

The application accesses page 2 again, which is already present in the cache; it’s a cache hit. As page 2 is now the most recently used page, it has to go to the end of the queue.

Page 2 moves to the end

Let’s say the next page accessed is page 7; it is a cache miss. We evict a page from the cache, which is the first node in the queue (page 3).

New page 7 added to the end after evicting page 3 from the front

Insertion into and removal from the queue are O(1) operations, and the hash map gives O(1) lookup of a page’s node, so each page access is handled in O(1) time.
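
As an aside (not part of the implementation above), the JDK’s LinkedHashMap can also be used to build a small LRU cache by enabling access order and overriding removeEldestEntry; a minimal sketch:

import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapLRU<K, V> extends LinkedHashMap<K, V> {
    private final int cacheSize;

    public LinkedHashMapLRU(int cacheSize) {
        // accessOrder = true makes iteration order follow access recency.
        super(16, 0.75f, true);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cache grows past its capacity.
        return size() > cacheSize;
    }
}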

Please share if there is something wrong or missing. If you are interested in taking personal coaching from our expert teachers, please contact us at communications@algorithmsandme.com