Lecture 36 - Caches III

Published

May 11, 2025

Goals

  • learn about replacement policies
  • learn how we can keep the cache in sync with memory

More Cache mechanics

Replacement policy

For both fully associative and set-associative caches, we need a policy for picking which line gets evicted when we need to load a new block. There is a collection of different approaches.

least recently used (LRU)
This is the approach that you had to implement for project 1. We could implement this with age bits or a little stack of pointers.

This is a pretty sensible approach: get rid of the thing that we haven’t used in a while. Of course, there is a worst-case scenario. In a loop that touches more blocks than the cache can hold, it is possible that we evict a block just before we get back to needing it.
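As a rough illustration of that worst case, here is a minimal sketch of a single LRU set. The names `LRUSet` and `count_hits` are made up for this example, and an `OrderedDict` stands in for the age bits or pointer stack that real hardware would use.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with a fixed number of ways (hypothetical helper)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is loaded on a miss)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # now the most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[tag] = None
        return False


def count_hits(cache_set, trace):
    """Run a sequence of tags through the set and count the hits."""
    return sum(cache_set.access(tag) for tag in trace)


# A loop over 5 blocks in a 4-way set: each block is evicted just before reuse.
loop_trace = [0, 1, 2, 3, 4] * 3
lru_loop_hits = count_hits(LRUSet(4), loop_trace)
```

With one block too many in the loop, every access misses. The same loop over only four blocks would hit on every pass after the first.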

pseudo-LRU (PLRU)
When we get more lines, like 8 or 16, the hardware to maintain true LRU gets more expensive. Instead, we maintain a tree of bits, where each bit indicates which half of the lines beneath it is older. When we access a line, we traverse the tree and flip the bits along the path so they point away from it. To evict, we follow the bits down to a victim. This can be wrong, but it is fast and close enough most of the time.
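The tree of bits can be sketched as follows. This is one common way to arrange it, not necessarily the one any particular processor uses, and `TreePLRU`, `touch`, and `victim` are names invented for the example. The bit convention here (0 means the left half is older) is also an assumption.

```python
class TreePLRU:
    """Tree pseudo-LRU state for one set (hypothetical helper).

    bits[node] == 0 means the left half under that node is older,
    1 means the right half is older. Nodes are stored heap-style.
    """

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: point every bit on the path away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1  # used the left half, so right is older
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0  # used the right half, so left is older
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits toward the older half at each level."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(4)
for way in [0, 1, 2, 3, 0]:
    plru.touch(way)
plru_victim = plru.victim()
```

After touching ways 0, 1, 2, 3, 0, true LRU would evict way 1, but this tree picks way 2. That is the sense in which PLRU "can be wrong": only 3 bits are tracked for 4 ways instead of a full ordering.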

first-in first-out (FIFO)
In terms of implementation, this one is the easiest. We just need a round-robin pointer that points to the next line to replace, and we can cycle through the lines in order.

This policy also makes sense: get rid of the old thing. Of course, we can always come up with a worst-case scenario where that is a bad idea.
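A minimal sketch of the round-robin pointer, using a made-up `FIFOSet` class. The trace at the end shows one such bad case: a block that was just used is still evicted because it is the oldest.

```python
class FIFOSet:
    """One cache set with first-in first-out replacement (hypothetical helper)."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.next_victim = 0  # round-robin pointer to the next line to replace

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is loaded on a miss)."""
        if tag in self.tags:
            return True  # a hit does not change the replacement order
        self.tags[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False


fifo = FIFOSet(2)
# "A" is reused right before "C" arrives, yet "A" is still the one replaced.
fifo_results = [fifo.access(tag) for tag in ["A", "B", "A", "C", "A"]]
```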

least frequently used (LFU)
We could count references to a block and get rid of the one with the fewest references.

From a logical standpoint, this makes sense. Being in the cache a long time doesn’t mean we aren’t using it a lot. We would rather get rid of the thing we aren’t using much.

There are two downsides. First, we need to maintain a counter for references and it has to be big enough to not overflow too quickly. Second, a block may have very few references because it just got added to the cache recently.
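Both downsides show up in a small sketch. `LFUSet` and the `counter_bits` parameter are assumptions for illustration. The counter saturates rather than overflowing, which is one possible answer to the first problem.

```python
class LFUSet:
    """One cache set with least-frequently-used replacement (hypothetical helper)."""

    def __init__(self, ways, counter_bits=4):
        self.ways = ways
        self.max_count = (1 << counter_bits) - 1  # counters saturate here
        self.counts = {}  # tag -> reference count

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is loaded on a miss)."""
        if tag in self.counts:
            self.counts[tag] = min(self.counts[tag] + 1, self.max_count)
            return True
        if len(self.counts) >= self.ways:
            # Evict the tag with the fewest references (ties: oldest entry).
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[tag] = 1
        return False


lfu = LFUSet(2)
for tag in "AAABC":
    lfu.access(tag)
lfu_resident = set(lfu.counts)
```

Here "B" is evicted as soon as "C" arrives, simply because "B" was new and had only one reference.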

Re-reference interval prediction (RRIP)
Every line gets a (typically) two-bit counter, which gives us a short range from near (0, we will use this soon) to far (3, a good candidate to be evicted).

New data enters as a 2. If we reuse something, it is promoted to a 0.

When we need to evict something, look for a 3. If there are no 3s, increment all counters until we have a 3 and then evict one of those.
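The three rules above can be sketched directly. `RRIPSet` is a made-up name, and starting empty lines at 3 so they are filled first is an assumption of this sketch, not something the rules specify.

```python
class RRIPSet:
    """One cache set with two-bit re-reference counters (hypothetical helper)."""

    FAR = 3  # largest two-bit value: good candidate for eviction

    def __init__(self, ways):
        self.tags = [None] * ways
        # Assumption: empty lines start at FAR so they are chosen first.
        self.rrpv = [self.FAR] * ways

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is loaded on a miss)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0  # reused: promote to near
            return True
        # No line at FAR yet: age every line until one gets there.
        while self.FAR not in self.rrpv:
            self.rrpv = [value + 1 for value in self.rrpv]
        victim = self.rrpv.index(self.FAR)
        self.tags[victim] = tag
        self.rrpv[victim] = 2  # new data enters as a 2
        return False


rrip = RRIPSet(2)
for tag in ["A", "B", "A", "C"]:
    rrip.access(tag)
```

Because "A" was reused and promoted to 0, the aging step reaches "B" first, so "C" replaces "B" and "A" survives.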

random
We could just pick a line at random. Surprisingly, in simulation this has pretty similar performance to the other ones; it is only slightly worse. Of course, it can’t be truly random, but there are several solutions, including a global counter with n bits (for a \(2^n\) line cache). We increment it every clock cycle, and when we have a cache miss, we evict the line matching the counter. We could also create a small pseudorandom number generator.
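A sketch of the global-counter version, with invented names (`CounterRandomCache`, `tick`). It is simplified in that it replaces whatever line the counter points at, even if another line is still empty.

```python
class CounterRandomCache:
    """Fully associative cache with 2**n lines and a free-running n-bit counter."""

    def __init__(self, n_bits):
        self.n_lines = 1 << n_bits
        self.tags = [None] * self.n_lines
        self.counter = 0

    def tick(self, cycles=1):
        """Advance the clock; the counter wraps around after 2**n cycles."""
        self.counter = (self.counter + cycles) % self.n_lines

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is loaded on a miss)."""
        if tag in self.tags:
            return True
        self.tags[self.counter] = tag  # victim is wherever the counter is now
        return False


rand_cache = CounterRandomCache(2)
rand_cache.access("A")  # counter is 0
rand_cache.tick(3)
rand_cache.access("B")  # counter is 3
rand_cache.tick(1)      # counter wraps back to 0
rand_cache.access("C")  # replaces "A"
```

The choice looks random to the program because the number of cycles between misses is hard to predict, not because the counter itself is random.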

Modern processors will typically use PLRU or RRIP.

Keeping in sync

So far, we have just been thinking about reading data out of memory. What about writing values back into memory?

If we just write back into the cache, the cache and memory will get out of sync. This is a problem because the processor may not be the only thing using the memory.

We have a variety of techniques:

write through
When we need to write a value, we update the cache and the memory at the same time.

Very easy conceptually, but it has rubbish performance because writing back to memory takes so long. It also isn’t very bright. What if we have a collection of writes to the same address in quick succession? We waste a lot of time writing values back only to change them again immediately.
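To make the wasted work concrete, here is a toy model that just counts memory writes. `WriteThroughCache` is a hypothetical class that ignores capacity and timing entirely.

```python
class WriteThroughCache:
    """Toy write-through cache: every store updates cache and memory."""

    def __init__(self):
        self.cache = {}   # address -> value (capacity ignored in this sketch)
        self.memory = {}  # address -> value
        self.memory_writes = 0

    def write(self, address, value):
        self.cache[address] = value
        self.memory[address] = value  # the slow part happens on every store
        self.memory_writes += 1


wt = WriteThroughCache()
for i in range(100):
    wt.write(0x40, i)  # repeated writes to one address, e.g. a loop counter
```

All 100 stores reach memory even though only the last value matters. The upside is that the cache and memory never disagree.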

write buffer
We add a buffer in front of memory so we can kick off the write and then go back to work on the processor. The buffer can queue up a collection of writes, depending on its size, while memory works through them.

The downside is that if the buffer is full, we have to stall the processor. But if we are going to produce writes more quickly than the memory can take them, there is not much we can do about that.
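A small cycle-by-cycle sketch shows when the stalls appear. The function `simulate_write_buffer` and its parameters are invented for this example. It assumes the processor tries to issue one write every cycle and that memory completes one write every `mem_latency` cycles.

```python
from collections import deque


def simulate_write_buffer(n_writes, capacity, mem_latency):
    """Return how many cycles the processor stalls while issuing n_writes."""
    buffer = deque()  # writes waiting for memory
    busy = 0          # cycles left on the write memory is working on
    issued = 0
    stalls = 0
    while issued < n_writes:
        # Memory side: when idle, take the oldest pending write.
        if busy == 0 and buffer:
            buffer.popleft()
            busy = mem_latency
        if busy > 0:
            busy -= 1
        # Processor side: issue a write if there is room, otherwise stall.
        if len(buffer) < capacity:
            buffer.append(issued)
            issued += 1
        else:
            stalls += 1
    return stalls


burst_stalls = simulate_write_buffer(n_writes=4, capacity=4, mem_latency=10)
small_buffer_stalls = simulate_write_buffer(n_writes=10, capacity=2, mem_latency=3)
large_buffer_stalls = simulate_write_buffer(n_writes=10, capacity=8, mem_latency=3)
```

A short burst that fits in the buffer costs no stalls at all. A sustained stream of writes fills a small buffer quickly, and from then on the processor is limited by the memory speed.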

write-back
This is the opposite of write through. We let memory and the cache get out of sync, and then when the cache line needs to be evicted, we write any changes back to memory. The downside of this is that memory does get a little out of sync, and it makes cache misses more expensive, because a modified line has to be written back to memory before it can be replaced.
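A toy sketch of the idea, tracking modified lines with a dirty flag. `WriteBackCache` is a hypothetical class. It stores one value per line and evicts in FIFO order purely to keep the example short.

```python
class WriteBackCache:
    """Toy write-back cache: memory is only updated when a dirty line is evicted."""

    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.lines = {}   # address -> [value, dirty]; dict order is eviction order
        self.memory = {}  # address -> value
        self.memory_writes = 0

    def _evict(self):
        address = next(iter(self.lines))  # oldest line (FIFO, for simplicity)
        value, dirty = self.lines.pop(address)
        if dirty:
            self.memory[address] = value  # the extra cost paid on this miss
            self.memory_writes += 1

    def write(self, address, value):
        if address not in self.lines and len(self.lines) >= self.n_lines:
            self._evict()
        self.lines[address] = [value, True]  # mark the line as modified


wb = WriteBackCache(n_lines=1)
for i in range(100):
    wb.write(0x40, i)
writes_before_eviction = wb.memory_writes
stale_in_memory = 0x40 not in wb.memory  # memory has not seen any of the writes
wb.write(0x80, 7)  # a miss: forces the dirty line for 0x40 out to memory
```

The 100 writes to one address cost a single memory write, but that write lands on the miss for a different address, and until then memory holds no up-to-date copy.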

Mechanical level

vocabulary

Skills