# LRU Cache Problem

#### Design And Implement LRU Cache

Implement a Least Recently Used (LRU) cache and execute a given sequence of SET(key, value) and GET(key) operations. Return the results of all the GET operations from the sequence.

Initially the cache is empty. Cache capacity is a part of the input. Keys and values are all positive. GET(key) returns either the cached value or -1 (cache miss). SET(key, value) adds or updates the value for the key. If cache is at capacity and a new value needs to be added, remove the least recently used (LRU) value first.

Example

Input:

capacity = 2

query_type = [1,  1,  0,  1,   0,  1, 0]

key = [5, 10, 5, 15, 10, 5, 5]

value = [11, 22, 1, 33, 1, 55, 1]

Output: [11, -1, 55]

query_type of 0 means GET and query_type of 1 means SET.

Here are the operations from the input along with the return values and side effects of their execution:

Operation Cache after Ret. val. Side effect

SET(5, 11) [5->11] 5->11 added to cache

SET(10, 22) [10->22, 5->11] 5->11 became LRU

GET(5) [5->11, 10->22] 11 10->22 became LRU

SET(15, 33) [15->33, 5->11] 10->22 got evicted

GET(10) [15->33, 5->11] -1 Access order unchanged

SET(5, 55) [5->55, 15->33] Key 5 updated, became the most recently used (MRU)

GET(5) [5->55, 15->33] 55 No change; key 5 already was the MRU

##### Notes:

The function accepts four arguments:

1. The cache capacity,

2. query_type array with 0 for GET and 1 for SET operation,

3. key array with the keys for all the operations,

4. value array with the values for SET operations (value to be ignored for GETs).

The three input arrays all have the same length n; together they represent n operations.

Output: Return an array with return values of all the GET operations.

● 1

● 1

● 1

● 1

#### Solution

To solve this problem we need to come up with a data structure that allows for efficient implementations of 1) lookup value by key 2) add/update value for a key and 3) evict the LRU element when the cache is at capacity and an element has to be added.

To evict elements correctly we always need to know which element is the least recently used (LRU), for that we somehow need to store the complete order in which the elements were accessed (GET or SET). When an element is looked up, added or updated, it becomes the most recently used (MRU), it would move to the front end of the order. When we need to evict, we’d evict the one at the back end.

We either need one structure to efficiently do both “looking up” and “ordering” or one for looking up and another for ordering.

##### Some Data Structures to Consider:

1. Dynamic arrays (vector, arraylist, etc) take O(n) time to look up and O(n) to change order (because to move an element from the middle to the front end, O(n) elements have to be shifted by one position). While O(n) is inefficient asymptotically, real world performance can be good for small enough n due to compactness of the data structure in memory that results in CPU cache friendliness (covered in the Search class).

2. Linked list takes O(n) time to look up but changing order can be implemented efficiently if somehow we don’t have to look up an element first: if we have a pointer to a node in the middle of a doubly linked list (as well as pointers to its head and tail) we can move that node from the middle to the front end of the list in constant time by rearranging pointers.

3. Min heap, no matter how we build it, takes O(n) to look up an element. But if we use the “last use timestamp” (and not the cache key) as the key to build the heap then changing order takes O(log n) and the LRU element is always at the root of the heap, easily accessible for eviction when necessary. Space efficiency and cache friendliness (covered in the Search class) of the array-based heap implementation can make its real world performance comparable with that of linked list or other less cache friendly structures despite the asymptotic complexity being O(log n)  vs. O(1).

4. Hashmap takes O(1) to look up and doesn’t help in maintaining the order.

5. Tree based (ordered) map/dictionary may seem promising at first: one such map can do efficient O(log n) lookups while maintaining ordering. However in order to use such a map for cache value lookups, it must map the cache key->value -- but ordering according to that key is meaningless to us. We only care about the “last use timestamp” ordering. We could use one ordered map for mapping (key->value) and another one for ordering (timestamp->key); then each operation would take O(log n) time. But then—why not just use hashmap and linked list instead (each operation O(1)), right?

We provided three solutions. Two use a hashmap+linked list (with two different implementations of the linked list) and one uses a hashmap+heap.

These solutions use a hashmap for efficient cache lookups and a doubly linked list to keep track of the access order. One implements the doubly linked list from scratch and the other uses the C++ std::list implementation. In both implementations the linked list simply stores the {key, value} pairs ordered from the MRU to the LRU. The hashmap maps the cache keys to the pointers to linked list nodes.

SET(key, value) operation:

● Look up the key in the hashmap.

● If found, we have a pointer to the linked list node.

○ Update the value in that node.

○ Then move the node from its current position in the list to the front, so it becomes the MRU.

○ If (and only if) the cache is at capacity, access the tail of the linked list (the LRU element) and delete it both from the list and from the hashmap.

○ Then add the {key, value} to the front of the list and to the hashmap.

GET(key) operation:

● Look up the key in the hashmap.

● If found, move the linked list node to the front (the hashmap still has the pointer to that node object; nothing needs to change there). Return the value found in the node.

##### Time Complexity:

O(n).

Each operation, GET or SET, takes constant time, and there are n operations.

##### Auxiliary Space Used:

O(min(capacity, number of SET operations)) = O(min(capacity, n)).

Each SET operation increases the size of the hashmap and the list, until we are at capacity. On the other hand, if capacity > n, the capacity won’t ever be reached as we only allocate space for actually added cache entries.

##### Space Complexity:

O(n).

Input takes O(n) space.

Auxiliary space used is O(min(capacity, n)).

Output takes O(number of GET operations) = O(n) space.

Summing up the three we get the total space complexity:

O(n) + O(min(capacity, n)) + O(n) = O(n).

``````
class LRU_cache
{
int capacity;
class ListNode
{
public:
int key;
int value;
ListNode *prev;
ListNode *next;

ListNode(int _key = 0, int _value = 0)
{
key = _key;
value = _value;
prev = NULL;
next = NULL;
}
};
// Linked list whose nodes store {key, value}.
ListNode *head = NULL, *tail = NULL;
// key -> pointer-to-ListNode.
unordered_map key_to_actual_storage_mapping;

public:
LRU_cache(int _capacity)
{
capacity = _capacity;
}

/*
Also add the mapping (key -> pointer to where it is stored in linked list).
*/
{
ListNode *temp = new ListNode(key, value);
{
// The list was empty; now it should have one element.
tail = temp;
}
else
{
}
}

/*
Remove one {key, value} from the tail of the linked list.
Also remove the mapping (key -> pointer to where it is stored in the linked list).
*/
void remove_least_recently_used()
{
int key = tail->key;
key_to_actual_storage_mapping.erase(key);
{
// The list had one element; let's make it an empty list.
delete tail;
}
else
{
tail = tail->prev;
delete tail->next;
tail->next = NULL;
}
}

// Remove given node from the linked list.
void erase_node(ListNode *cur_node)
{
ListNode *prev_node = cur_node->prev;
ListNode *next_node = cur_node->next;
// Connect previous node with next node.
if (prev_node != NULL)
{
prev_node->next = next_node;
}
if (next_node != NULL)
{
next_node->prev = prev_node;
}

{
// One element list becomes empty.
}
{
}
else if (tail == cur_node)
{
tail = prev_node;
}

delete cur_node;
}

// If a value for the given key is cached, makes it the MRU and returns it; else returns -1.
int get(int key)
{
unordered_map::iterator it = key_to_actual_storage_mapping.find(key);

if (it == key_to_actual_storage_mapping.end())
{
// Key isn't found.
return -1;
}
// it->second points to node in the linked list; let's get its value.
int value = it->second->value;
// Remove from the original position.
erase_node(it->second);
// Add to the front of the list.
return value;
}

// Evicts the LRU cached value if necessary to avoid exceeding capacity of the cache.
// Makes the newly added/updated cache entry the MRU one.
void set(int key, int value)
{
unordered_map::iterator it = key_to_actual_storage_mapping.find(key);
if (it == key_to_actual_storage_mapping.end())
{
// The key isn't found in the cache; so it needs to be added.
if (key_to_actual_storage_mapping.size() == capacity)
{
// Cache is at capacity. Let's evict the LRU element.
remove_least_recently_used();
}
// Add the key->value to the cache.
}
else
{
// Key is found in the cache; so we need to update the value and make it the MRU.
// Remove it from the original position in the list.
erase_node(it->second);
// Add to the front of the list and to the hashmap.
}
}
}; // class LRU_cache

vector implement_LRU_cache(int capacity, vector query_type, vector key,
vector value)
{
int n = query_type.size();
LRU_cache* cache = new LRU_cache(capacity);
vector ans;
for (int i = 0; i < n; i++)
{
if (query_type[i] == 0)
{
ans.push_back(cache->get(key[i]));
}
else
{
cache->set(key[i], value[i]);
}
}
return ans;
}
``````

#### 2) hashmap_max_heap_solution.cpp

This solution also uses a hashmap for efficient cache lookups. For keeping track of the access order we use a simple integer counter as a timestamp; we build and maintain a heap with that timestamp as a key. The C++ STL heap implementation (std::push_heap, std::pop_heap) is a max heap by default and we need the LRU element to be at the root of the heap; to achieve that we decrement the timestamp for every GET or SET operation instead of incrementing it. That way the LRU element has the greatest timestamp of all, so it ends up at the root of the max heap.

The heap stores {timestamp, key} pairs and the hashmap stores (key -> {timestamp, value}).

##### Time Complexity:

O(n * log(capacity)).

Per one GET/SET operation, this solution executes a constant number of insert/delete operations on the max heap (O(log(capacity)) each) and a constant number of lookup/insert/update/delete operations on the hashmap (constant time each). Given that we have a total of n GET/SET operations, overall time complexity is n*O(log(capacity) = O(n*log(capacity).

Here is one example that may help understand this solution and its time complexity. It is the worst case for n=15:

capacity = 7

query_type = [1 1 1 1 1 1 1 0 0 0 0 0 0 0 1]

key = [1 2 3 4 5 6 7 1 2 3 4 5 6 7 8]

value = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]

First 7 SET operations add 7 key-value pairs to the cache. After that the cache is at capacity.

Next 7 GET operations access every cache entry, so the timestamps in the hashmap (but not in the heap) change.

Last operation, SET(8, 15), adds a new key-value; it needs to evict the LRU element from the cache. The while(true) loop will run 7 times, each run taking log(capacity)=log(7) time.

##### Auxiliary Space Used:

O(min(capacity, number of SET operations)) = O(min(capacity, n)).

Each SET operation increases the size of the hashmap and the heap, until we are at capacity. On the other hand, if capacity > n, the capacity won’t ever be reached as we only allocate space for actually added cache entries.

##### Space Complexity:

O(n).

Input takes O(n) space.

Auxiliary space used is O(min(capacity, n)).

Output takes O(number of GET operations) = O(n) space.

Summing up the three we get the total space complexity:

O(n) + O(min(capacity, n)) + O(n) = O(n).

##### Note on the real world performance:

As mentioned above, the heap based solution will have running time comparable to the linked list based solution on the modern CPUs despite having the asymptotic time complexity of O(n*log(capacity)) vs. O(n). That is especially true for small enough capacity and n, such as

``````

class LRU_cache
{
// Maximum size of the cache.
int capacity = 0;
int timestamp = 0;
// Heap storing {timestamp, key} pairs. timestamp being the key for building the heap.
vector> lru_heap;
// hashmap {key -> {timestamp, value}}.
unordered_map> cache;

public:

LRU_cache(int _capacity)
{
capacity = _capacity;
}

// If a value for the given key is cached, makes it the MRU and returns it; else returns -1.
int get(int key)
{
if(cache.find(key) != cache.end())
{
// Key is found in the cache; let's update its access timestamp and keep the value unchanged.
cache[key].first = timestamp--;
// We don't update the timestamp of this element in the heap, allowing it become outdated.
// We will take care of that when (and if) we need to evict the LRU element.
return cache[key].second; // Return the cached value.
}
else
{
return -1;
}
}

// Evicts the LRU cached value if necessary to avoid exceeding capacity of the cache.
// Makes the newly added/updated cache entry the MRU one.
void set(int key, int value)
{
if(cache.find(key) != cache.end())
{
// Key is found in the cache; let's update the hashmap with the new value and timestamp.
cache[key] = {timestamp--, value};
// We don't update the timestamp of this element in the heap, allowing it become outdated.
// We will take care of that when (and if) we need to evict the LRU element.
return;
}
// We need to add an element. If the cache is full, we need to evict the LRU element.
if(lru_heap.size() == capacity)
{
// Loop until we have removed _one_ element from cache.
while(true)
{
// Root of the heap is the candidate element for eviction because it
// is the LRU according to the timestamps of the heap.

// Remove the root element from the heap.
pop_heap(lru_heap.begin(), lru_heap.end());
// The removed element is at the back of vector now:
auto& candidate = lru_heap.back();
// Check if the timestamp of the element in the heap matches the timestamp
// of the same element in the hashmap.
if(cache[candidate.second].first == candidate.first)
{
// The timestamp in the heap is up-to-date -> this is really the LRU element.
// Remove it from the hashmap.
cache.erase(candidate.second);
// Complete removing it from the heap by removing it from the underlying vector.
lru_heap.pop_back();
break; // Exit the while(true) loop.
}
else
{
// The candidate element was used later than the heap "knows",
// so it might not be the LRU. Let us push it back to the heap
// now with the up-to-date timestamp (from the hashmap).
candidate.first = cache[candidate.second].first;
push_heap(lru_heap.begin(), lru_heap.end());
// Next iteration of the while(true) loop will try the new root of the heap.
}
}
}
// Add the new key-value pair into the hashmap and the heap with the appropriate timestamp.
cache[key] = {timestamp, value};
lru_heap.push_back({timestamp, key});
push_heap(lru_heap.begin(), lru_heap.end());
timestamp--;
}
}; // class LRU_cache

vector implement_LRU_cache(int capacity, vector query_type, vector key,
vector value)
{
int n = query_type.size();
LRU_cache* cache = new LRU_cache(capacity);
vector ans;
for (int i = 0; i < n; i++)
{
if (query_type[i] == 0)
{
ans.push_back(cache->get(key[i]));
}
else
{
cache->set(key[i], value[i]);
}
}
return ans;
}

``````

All Posts