Implement a Least Recently Used (LRU) cache and execute a given sequence of SET(key, value) and GET(key) operations. Return the results of all the GET operations from the sequence.
Initially the cache is empty. The cache capacity is part of the input. Keys and values are all positive. GET(key) returns either the cached value or -1 (cache miss). SET(key, value) adds or updates the value for the key. If the cache is at capacity and a new value needs to be added, remove the least recently used (LRU) value first.
Example
Input:
capacity = 2
query_type = [1, 1, 0, 1, 0, 1, 0]
key = [5, 10, 5, 15, 10, 5, 5]
value = [11, 22, 1, 33, 1, 55, 1]
Output: [11, -1, 55]
query_type of 0 means GET and query_type of 1 means SET.
Here are the operations from the input along with the return values and side effects of their execution:
Operation   | Cache after (MRU -> LRU) | Ret. val. | Side effect
SET(5, 11)  | [5->11]                  |           | 5->11 added to cache
SET(10, 22) | [10->22, 5->11]          |           | 5->11 became LRU
GET(5)      | [5->11, 10->22]          | 11        | 10->22 became LRU
SET(15, 33) | [15->33, 5->11]          |           | 10->22 got evicted
GET(10)     | [15->33, 5->11]          | -1        | Access order unchanged
SET(5, 55)  | [5->55, 15->33]          |           | Key 5 updated, became the most recently used (MRU)
GET(5)      | [5->55, 15->33]          | 55        | No change; key 5 already was the MRU
The function accepts four arguments:
1. The cache capacity,
2. query_type array with 0 for GET and 1 for SET operation,
3. key array with the keys for all the operations,
4. value array with the values for SET operations (value to be ignored for GETs).
The three input arrays all have the same length n; together they represent n operations.
Output: Return an array with return values of all the GET operations.
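In C++ (the language the provided solutions use), the interface might look like the declaration below; the function and parameter names are our assumption, as the exact signature is not specified above.

```cpp
#include <vector>

// Hypothetical signature matching the description above. Returns the
// results of all GET operations, in order of execution.
std::vector<int> execute_queries(int capacity,
                                 const std::vector<int>& query_type,
                                 const std::vector<int>& key,
                                 const std::vector<int>& value);
```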
To solve this problem we need to come up with a data structure that allows for efficient implementations of 1) looking up a value by key, 2) adding/updating the value for a key, and 3) evicting the LRU element when the cache is at capacity and an element has to be added.
To evict elements correctly, we always need to know which element is the least recently used (LRU); for that we need to store the complete order in which the elements were accessed (by GET or SET). When an element is looked up, added, or updated, it becomes the most recently used (MRU) and moves to the front end of the order. When we need to evict, we evict the one at the back end.
We either need one structure that efficiently does both "looking up" and "ordering," or one structure for looking up and another for ordering.
1. Dynamic arrays (vector, ArrayList, etc.) take O(n) time to look up and O(n) time to change the order (to move an element from the middle to the front end, O(n) elements have to be shifted by one position). While O(n) is inefficient asymptotically, real-world performance can be good for small enough n due to the compactness of the data structure in memory, which results in CPU cache friendliness (covered in the Search class).
2. A linked list takes O(n) time to look up, but changing the order can be implemented efficiently if we don't have to look up the element first: given a pointer to a node in the middle of a doubly linked list (as well as pointers to its head and tail), we can move that node to the front end of the list in constant time by rearranging pointers (see the sketch right after this list).
3. A min heap, no matter how we build it, takes O(n) to look up an element. But if we use the "last use timestamp" (and not the cache key) as the key to build the heap, then changing the order takes O(log n) and the LRU element is always at the root of the heap, easily accessible for eviction when necessary. The space efficiency and cache friendliness (covered in the Search class) of the array-based heap implementation can make its real-world performance comparable with that of a linked list or other less cache-friendly structures, despite the asymptotic complexity being O(log n) vs. O(1).
4. A hashmap takes O(1) to look up but doesn't help with maintaining the order.
5. A tree-based (ordered) map/dictionary may seem promising at first: one such map can do efficient O(log n) lookups while maintaining ordering. However, in order to use such a map for cache value lookups, it must map the cache key->value, and ordering according to that key is meaningless to us; we only care about the "last use timestamp" ordering. We could use one ordered map for mapping (key->value) and another one for ordering (timestamp->key); then each operation would take O(log n) time. But then why not just use a hashmap and a linked list instead, making each operation O(1)?
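To illustrate the constant-time reordering from point 2, here is a minimal sketch using C++ std::list (a doubly linked list); the function name is ours. std::list::splice relinks the node in place without copying the element or invalidating iterators to it.

```cpp
#include <list>
#include <utility>

// Move the node referenced by `node` to the front of `order` in O(1)
// by relinking pointers; no elements are copied or moved in memory.
void move_to_front(std::list<std::pair<int, int>>& order,
                   std::list<std::pair<int, int>>::iterator node) {
    order.splice(order.begin(), order, node);
}
```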
We provided three solutions. Two use a hashmap+linked list (with two different implementations of the linked list) and one uses a hashmap+heap.
The two linked list based solutions use a hashmap for efficient cache lookups and a doubly linked list to keep track of the access order. One implements the doubly linked list from scratch and the other uses the C++ std::list implementation. In both implementations the linked list simply stores the {key, value} pairs ordered from the MRU to the LRU, and the hashmap maps the cache keys to pointers to the linked list nodes.
SET(key, value) operation:
● Look up the key in the hashmap.
● If found, we have a pointer to the linked list node.
○ Update the value in that node.
○ Then move the node from its current position in the list to the front, so it becomes the MRU.
● If not found:
○ If (and only if) the cache is at capacity, access the tail of the linked list (the LRU element) and delete it both from the list and from the hashmap.
○ Then add the {key, value} to the front of the list and to the hashmap.
GET(key) operation:
● Look up the key in the hashmap.
● If not found, return -1.
● If found, move the linked list node to the front (the hashmap still has the pointer to that node object; nothing needs to change there). Return the value found in the node.
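Here is a minimal sketch of this design, using std::list and std::unordered_map; the class and member names are our own, and the driver implements the hypothetical execute_queries signature from above. The provided solutions may differ in details.

```cpp
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

class LruCache {
public:
    explicit LruCache(int capacity) : capacity_(capacity) {}

    int get(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return -1;                  // cache miss
        order_.splice(order_.begin(), order_, it->second);  // node becomes MRU, O(1)
        return it->second->second;
    }

    void set(int key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {                           // key already cached
            it->second->second = value;                     // update the value
            order_.splice(order_.begin(), order_, it->second);  // node becomes MRU
            return;
        }
        if ((int)index_.size() == capacity_) {              // at capacity: evict LRU
            index_.erase(order_.back().first);              // tail of the list is LRU
            order_.pop_back();
        }
        order_.emplace_front(key, value);                   // new entry becomes MRU
        index_[key] = order_.begin();
    }

private:
    int capacity_;
    std::list<std::pair<int, int>> order_;  // {key, value}, MRU at the front
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> index_;
};

std::vector<int> execute_queries(int capacity,
                                 const std::vector<int>& query_type,
                                 const std::vector<int>& key,
                                 const std::vector<int>& value) {
    LruCache cache(capacity);
    std::vector<int> results;
    for (size_t i = 0; i < query_type.size(); ++i) {
        if (query_type[i] == 0) results.push_back(cache.get(key[i]));
        else cache.set(key[i], value[i]);
    }
    return results;
}
```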
Time complexity: O(n).
Each operation, GET or SET, takes constant time, and there are n operations.
Auxiliary space used: O(min(capacity, number of SET operations)) = O(min(capacity, n)).
Each SET operation increases the size of the hashmap and the list, until we are at capacity. On the other hand, if capacity > n, the capacity won't ever be reached, as we only allocate space for actually added cache entries.
Total space complexity: O(n).
Input takes O(n) space.
Auxiliary space used is O(min(capacity, n)).
Output takes O(number of GET operations) = O(n) space.
Summing up the three we get the total space complexity:
O(n) + O(min(capacity, n)) + O(n) = O(n).
This solution also uses a hashmap for efficient cache lookups. For keeping track of the access order we use a simple integer counter as a timestamp; we build and maintain a heap with that timestamp as the key. The C++ STL heap implementation (std::push_heap, std::pop_heap) is a max heap by default, and we need the LRU element to be at the root of the heap; to achieve that, we decrement the timestamp on every GET or SET operation instead of incrementing it. That way the LRU element has the greatest timestamp of all, so it ends up at the root of the max heap.
The heap stores {timestamp, key} pairs and the hashmap stores (key -> {timestamp, value}).
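Here is a minimal sketch of this design, again with names of our own choosing. A GET or an update of an existing key only refreshes the timestamp in the hashmap, leaving the heap entry stale; the eviction loop detects stale entries by comparing the popped timestamp with the one in the hashmap and re-pushes them with the fresh timestamp until the true LRU surfaces at the root.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

class LruCacheHeap {
public:
    explicit LruCacheHeap(int capacity) : capacity_(capacity) {}

    int get(int key) {
        auto it = map_.find(key);
        if (it == map_.end()) return -1;  // cache miss
        it->second.first = --timestamp_;  // refresh in the map; heap entry goes stale
        return it->second.second;
    }

    void set(int key, int value) {
        auto it = map_.find(key);
        if (it != map_.end()) {           // update in place; heap entry goes stale
            it->second = {--timestamp_, value};
            return;
        }
        if ((int)map_.size() == capacity_) evict_lru();
        map_[key] = {--timestamp_, value};
        heap_.emplace_back(timestamp_, key);
        std::push_heap(heap_.begin(), heap_.end());
    }

private:
    void evict_lru() {
        while (true) {
            std::pop_heap(heap_.begin(), heap_.end());  // move the root to the back
            auto [ts, key] = heap_.back();
            heap_.pop_back();
            if (map_[key].first == ts) {  // timestamps match: the true LRU
                map_.erase(key);
                return;
            }
            // Stale entry: re-push it with the fresh timestamp from the map.
            heap_.emplace_back(map_[key].first, key);
            std::push_heap(heap_.begin(), heap_.end());
        }
    }

    int capacity_;
    long long timestamp_ = 0;                      // decremented on every operation
    std::vector<std::pair<long long, int>> heap_;  // {timestamp, key}, max heap
    std::unordered_map<int, std::pair<long long, int>> map_;  // key -> {ts, value}
};
```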
Time complexity: O(n * log(capacity)).
Per GET/SET operation, this solution executes a constant amortized number of insert/delete operations on the max heap (O(log(capacity)) each; the eviction loop's extra iterations are paid for by the earlier operations that made those heap entries stale) and a constant number of lookup/insert/update/delete operations on the hashmap (constant time each). Given that we have a total of n GET/SET operations, the overall time complexity is n * O(log(capacity)) = O(n * log(capacity)).
Here is one example that may help in understanding this solution and its time complexity. It is the worst case for n = 15:
capacity = 7
query_type = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
key = [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8]
value = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
First 7 SET operations add 7 key-value pairs to the cache. After that the cache is at capacity.
Next 7 GET operations access every cache entry, so the timestamps in the hashmap (but not in the heap) change.
The last operation, SET(8, 15), adds a new key-value pair and needs to evict the LRU element from the cache. The while(true) loop (see the sketch above) will run 7 times, each run taking O(log(capacity)) = O(log 7) time.
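For reference, this worst case can be traced through the sketch above with a small driver like the following (hypothetical, for illustration only):

```cpp
#include <iostream>

int main() {
    LruCacheHeap cache(7);
    for (int k = 1; k <= 7; ++k) cache.set(k, k);  // fill the cache to capacity
    for (int k = 1; k <= 7; ++k)
        std::cout << cache.get(k) << ' ';          // prints 1..7; heap goes stale
    cache.set(8, 15);  // evicts key 1, cycling through 7 stale heap entries
    return 0;
}
```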
Auxiliary space used: O(min(capacity, number of SET operations)) = O(min(capacity, n)).
Each SET operation increases the size of the hashmap and the heap, until we are at capacity. On the other hand, if capacity > n, the capacity won't ever be reached, as we only allocate space for actually added cache entries.
Total space complexity: O(n).
Input takes O(n) space.
Auxiliary space used is O(min(capacity, n)).
Output takes O(number of GET operations) = O(n) space.
Summing up the three we get the total space complexity:
O(n) + O(min(capacity, n)) + O(n) = O(n).
As mentioned above, the heap based solution will have a running time comparable to that of the linked list based solution on modern CPUs, despite having the asymptotic time complexity of O(n * log(capacity)) vs. O(n). That is especially true for small enough capacity and n, such as when the whole data structure fits in the CPU caches.