Lecture 35 - Caches II

Goals

  • learn about associative and set-associative mapping for caches
  • learn about replacement policies
  • learn how we can keep the cache in sync with memory

Cache mechanics

Last time we started to go into the details of cache mechanics, trying to answer these questions:

  • if we have a memory address, how can we find it and tell if it is resident in the cache?
  • if we have a cache miss and need to load a new block in, where does it go?
  • if the cache is full, how do we choose which block has to leave?
  • how do we keep the cache and memory in sync?

Direct mapped cache

We looked at the direct mapped cache, which maps every address to a single line in the cache.

| tag | block address | block offset |
  • block offset - this tells us where in the block we will find the byte referred to by this address
  • block address - these bits will tell us which line of the cache we will find this block in
  • tag - the rest of the bits provide a unique label for this block. We will save this in the cache with the data so we can tell which block is resident

Direct mapping is easy to think about and to implement, but it is subject to thrashing. We get thrashing when two memory locations in our working set both map to the same cache line. The rest of the cache could be empty, but we have to keep switching out the same cache line. This is very bad because the performance would be far worse than if we didn't have the cache at all.
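
To make this concrete, here is a rough sketch in C of the address split and of why two addresses collide, assuming the toy parameters from last lecture (4-bit addresses, 2-byte blocks, a four-line cache); the specific addresses are just made-up examples.

```c
#include <stdio.h>

/* Assumed toy parameters: 4-bit addresses, 2-byte blocks, four cache lines. */
#define OFFSET_BITS 1                  /* 2-byte blocks -> 1 offset bit            */
#define INDEX_BITS  2                  /* 4 cache lines -> 2 "block address" bits  */

static unsigned field(unsigned addr, unsigned shift, unsigned bits) {
    return (addr >> shift) & ((1u << bits) - 1u);
}

int main(void) {
    unsigned addrs[] = { 0x2 /* 0010 */, 0xA /* 1010 */ };
    for (int i = 0; i < 2; i++) {
        unsigned a      = addrs[i];
        unsigned offset = field(a, 0, OFFSET_BITS);
        unsigned line   = field(a, OFFSET_BITS, INDEX_BITS); /* the "block address" bits */
        unsigned tag    = a >> (OFFSET_BITS + INDEX_BITS);
        printf("addr %X -> tag=%u line=%u offset=%u\n", a, tag, line, offset);
    }
    /* Both addresses map to line 1 with different tags, so in a direct
       mapped cache they keep evicting each other: thrashing. */
    return 0;
}
```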

Associative mapping

Associative mapping takes the opposite approach to direct mapping. A block can be put into any line in the cache.

Breaking up the address is now a little easier

| tag | block offset |

Of course this creates some new issues

  • to find the address, we need to do a comparison against the stored tags across all of the lines of the cache. This isn't a big deal for our toy four-line cache, but as caches get up to more realistic sizes, this is a large chunk of logic (a sketch of this lookup follows the list)
  • we will also need to come up with placement and replacement policies to figure out where to add a new block, and which one to evict
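
A minimal sketch of what that lookup amounts to, written as a loop over our toy four-line cache; in real hardware each line gets its own comparator, so all of the tag checks happen at once rather than one after another.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 4                      /* toy fully associative cache */

struct line {
    bool     valid;
    unsigned tag;
    uint8_t  data[2];                    /* 2-byte block */
};

static struct line cache[NUM_LINES];

/* Returns the matching line, or -1 on a miss. In hardware every line has
   its own comparator, so all NUM_LINES tag checks happen in parallel. */
static int lookup(unsigned tag) {
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;
    return -1;
}

int main(void) {
    cache[2].valid = true;               /* pretend a block with tag 5 ... */
    cache[2].tag   = 0x5;                /* ... already lives in line 2    */
    return lookup(0x5) == 2 ? 0 : 1;     /* exit 0 on the expected hit     */
}
```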

Set associative mapping

Set associative mapping is a compromise between the two approaches

The cache is divided up into sets, each of which uses associative mapping.

We will divide up the address in a way very similar to the direct mapped approach

| tag | set | block offset |

The difference is that the set doesn't tell us the exact line, it just gets us close (into the set). From there, we will look at the tags to try to figure out if our line is present.

Example: return to our 4-bit address and four-line cache. This time, we divide the cache up into two sets

  • we only need 1 bit to identify the set
  • we will still use 1 bit for the block offset
  • now the tag is two bits
  • to find the address, we go to the right set, and then we do a comparison across all of the lines in the set to see if our block is in residence (see the sketch after this list)
  • if it isn't, we can load it into either of the lines (we will need a replacement policy to figure out which one if they are both filled)
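
A sketch of that lookup for this example (two sets of two lines, and a 4-bit address split as | tag (2) | set (1) | offset (1) |); the particular tags and addresses are just for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_SETS 2                       /* four lines, two per set */
#define WAYS     2

struct line { bool valid; unsigned tag; };
static struct line cache[NUM_SETS][WAYS];

/* 4-bit address: | tag (2 bits) | set (1 bit) | offset (1 bit) | */
static int lookup(unsigned addr) {
    unsigned set = (addr >> 1) & 0x1;    /* the set bit picks the set        */
    unsigned tag = addr >> 2;            /* remaining 2 bits are the tag     */
    for (int way = 0; way < WAYS; way++) /* compare tags within the set only */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return way;                  /* hit: this way holds the block    */
    return -1;                           /* miss: pick a line to replace     */
}

int main(void) {
    cache[1][0].valid = true;            /* pretend the block with tag 2 ... */
    cache[1][0].tag   = 0x2;             /* ... is resident in set 1, way 0  */
    printf("addr 0xA -> %s\n", lookup(0xA) >= 0 ? "hit" : "miss");
    return 0;
}
```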

The set associative cache is actually the most general form

  • if the number of lines per set is 1, then we have direct mapping
  • if the number of sets is 1, then we have associative mapping

The typical approaches are 2-, 4-, 8-, and sometimes 16-way mapping. This refers to the number of lines per set. 8 is generally the most common size these days.
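
As a quick sanity check on how the pieces relate, here is a sketch that derives the number of sets and the address field widths from the cache size, block size, and associativity. The 64 KiB / 64-byte-block / 8-way numbers are just assumed for the example, not anything from the lecture.

```c
#include <stdio.h>

/* Assumed example parameters. */
#define CACHE_BYTES (64 * 1024)          /* total cache capacity   */
#define BLOCK_BYTES 64                   /* bytes per block        */
#define WAYS        8                    /* 8-way set associative  */

static unsigned log2u(unsigned x) {      /* x is a power of two here */
    unsigned bits = 0;
    while (x > 1) { x >>= 1; bits++; }
    return bits;
}

int main(void) {
    unsigned lines = CACHE_BYTES / BLOCK_BYTES;   /* 1024 lines in total */
    unsigned sets  = lines / WAYS;                /* 128 sets of 8 lines */
    printf("lines=%u sets=%u\n", lines, sets);
    printf("offset bits=%u, set bits=%u, the rest of the address is the tag\n",
           log2u(BLOCK_BYTES), log2u(sets));
    return 0;
}
```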

Replacement policy

For both associative and set associative caches, we need to have a policy for picking which line gets cleared when we need to load a new block in. We have four approaches

least recently used
This is the approach that you had to implement for project 1. However, the cache hardware will not use the shift technique; we can just use a counter. For a 2-way set associative cache, this could be a single bit per line: when we access a line in the set, we assert that line's "use" bit and clear the other line's bit. Things get a little more complex when we have a larger number of lines.

This is a pretty sensible approach -- get rid of the thing that we haven't used in a while. Of course, in a worst-case scenario, like a longer loop, it is possible that we evict a block just before we get back to needing it...
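
A sketch of the single-bit scheme for a 2-way set described above; the struct is stripped down to just the bit we care about.

```c
#include <stdbool.h>

#define WAYS 2                           /* 2-way set: one "use" bit per line */

struct line { bool use; /* tag, data, ... omitted */ };

/* On an access to `way`, assert its use bit and clear the other line's. */
static void touch(struct line set[WAYS], int way) {
    set[way].use     = true;
    set[1 - way].use = false;
}

/* The victim is whichever line's use bit is clear: the least recently used. */
static int victim(const struct line set[WAYS]) {
    return set[0].use ? 1 : 0;
}

int main(void) {
    struct line set[WAYS] = { {false}, {false} };
    touch(set, 0);                       /* line 0 was just used ...    */
    return victim(set) == 1 ? 0 : 1;     /* ... so line 1 is the victim */
}
```

With more than two ways, a single bit per line is no longer enough to order the lines, which is where the extra complexity mentioned above comes from.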

first-in first-out
In terms of implementation this one is the easiest. We just need a round robin scheme that points to the next line to replace, and we can cycle through in order.

This policy also makes sense -- get rid of the old thing. Of course, we can always come up with a worst-case scenario about why that is a bad idea.
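
A sketch of the round robin pointer; the 4-way and 128-set sizes are just assumptions for the example.

```c
#define NUM_SETS 128                     /* assumed number of sets */
#define WAYS     4                       /* assumed 4-way sets     */

/* One pointer per set: the next line to replace, advancing round robin. */
static unsigned next_victim[NUM_SETS];

static unsigned fifo_pick(unsigned set) {
    unsigned way = next_victim[set];
    next_victim[set] = (way + 1) % WAYS; /* cycle through the ways in order */
    return way;
}

int main(void) {
    /* Successive replacements in set 0 walk through ways 0, 1, 2, 3, 0, ... */
    return (fifo_pick(0) == 0 && fifo_pick(0) == 1) ? 0 : 1;
}
```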

least frequently used
We could count references to a block and get rid of the one with the fewest references.

From a logical standpoint, this makes sense. Being in the cache a long time doesn't mean a block isn't being used a lot. We would rather get rid of the thing we aren't using much.

There are two downsides. First, we need to maintain a counter for references and it has to be big enough to not overflow too quickly. Second, a block may have very few references because it just got added to the cache recently.
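
A sketch of the counter-based approach, using a small saturating counter so it cannot wrap around; the counter width and number of ways are just assumptions.

```c
#include <stdint.h>

#define WAYS 4

struct line { uint8_t refs; /* small saturating reference counter */ };

static void touch(struct line *l) {
    if (l->refs < UINT8_MAX) l->refs++;  /* saturate instead of overflowing */
}

/* Evict the line with the fewest recorded references. */
static int lfu_pick(const struct line set[WAYS]) {
    int best = 0;
    for (int way = 1; way < WAYS; way++)
        if (set[way].refs < set[best].refs)
            best = way;
    return best;
}

int main(void) {
    struct line set[WAYS] = { {3}, {7}, {1}, {5} };
    return lfu_pick(set) == 2 ? 0 : 1;   /* line 2 has the fewest references */
}
```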

random
We could just pick a line at random. Surprisingly, in simulation this has pretty similar performance to the other policies -- it is only slightly worse. Also surprisingly, this is the most difficult to implement because we would need to come up with a hardware-level random number generator.
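
One common way to get cheap pseudo-random bits in hardware is a linear feedback shift register. The lecture doesn't specify a scheme, so the 4-bit register below is just an illustrative sketch.

```c
#include <stdint.h>
#include <stdio.h>

#define WAYS 4

/* 4-bit LFSR: shift left, feeding back the XOR of the top two bits.
   From any nonzero seed it cycles through all 15 nonzero 4-bit states. */
static uint8_t lfsr = 0x9;               /* any nonzero seed */

static unsigned random_pick(void) {
    uint8_t bit = ((lfsr >> 3) ^ (lfsr >> 2)) & 1u;
    lfsr = (uint8_t)(((lfsr << 1) | bit) & 0xF);
    return lfsr % WAYS;                  /* map the state to a way in the set */
}

int main(void) {
    for (int i = 0; i < 8; i++)
        printf("%u ", random_pick());    /* a scattered-looking way sequence */
    printf("\n");
    return 0;
}
```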

Keeping in sync

So far, we have just been thinking about reading data out of memory. What about writing values back into memory?

If we just write back into the cache, the cache and memory will get out of sync. This is a problem because the processor may not be the only thing using the memory.

We have a variety of techniques

write through
When we need to write a value, we update the cache and the memory at the same time.

Very easy conceptually, but it has rubbish performance because writing back to memory takes so long. It also isn't very bright. What if we have a collection of writes to the same address in quick succession? We waste a lot of time writing values back only to change them again immediately.
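
A sketch of the write through behavior, boiled down to a single cached block; the toy memory size and addresses are just for illustration.

```c
#include <stdint.h>

static uint8_t memory[16];               /* tiny toy memory              */
static uint8_t cache_block[2];           /* the cached copy of one block */

/* Write through: a store updates the cached copy and memory every time. */
static void store(unsigned addr, uint8_t value) {
    cache_block[addr & 1] = value;       /* low bit is the block offset   */
    memory[addr] = value;                /* the slow part: memory as well */
}

int main(void) {
    store(0x6, 42);
    return (memory[0x6] == 42 && cache_block[0] == 42) ? 0 : 1;
}
```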

write buffer
We add a buffer to memory so we can kick off the write and then get back to work on the processor. This lets us keep making changes and queue up a collection of writes, depending on the buffer size.

The downside is that if the buffer is full, we have to stall the processor. But if we are going to produce writes more quickly than the memory can take them, there is not much we can do about that.
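
A sketch of a small write buffer as a circular queue; the four-entry depth is just an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 4                       /* assumed buffer depth */

struct pending { unsigned addr; uint8_t value; };

static struct pending buffer[BUF_SIZE];
static unsigned head, tail, count;

/* Processor side: queue a write. Returns false when the buffer is full,
   which is when the processor would have to stall. */
static bool buffer_write(unsigned addr, uint8_t value) {
    if (count == BUF_SIZE) return false;
    buffer[tail] = (struct pending){ addr, value };
    tail = (tail + 1) % BUF_SIZE;
    count++;
    return true;
}

/* Memory side: drain one queued write whenever the memory is free. */
static bool buffer_drain(struct pending *out) {
    if (count == 0) return false;
    *out = buffer[head];
    head = (head + 1) % BUF_SIZE;
    count--;
    return true;
}

int main(void) {
    struct pending w;
    buffer_write(0x6, 42);
    return (buffer_drain(&w) && w.value == 42) ? 0 : 1;
}
```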

write-back
This is the opposite of write through. We let memory and the cache get out of sync, and then when the cache line needs to be evicted, we write any changes back to memory. The downside of this is that memory does get a little out of sync, and it makes cache misses more expensive.
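
The usual way to track which lines actually need writing back is a dirty bit per line (not named above, but it is the standard mechanism); here is a sketch under that assumption.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t memory[16];               /* tiny toy memory */

struct line {
    bool     valid, dirty;               /* dirty: modified since it was loaded */
    unsigned tag;
    uint8_t  data[2];                    /* 2-byte block */
};

/* Write back: a store only touches the cache and marks the line dirty. */
static void store(struct line *l, unsigned offset, uint8_t value) {
    l->data[offset] = value;
    l->dirty = true;                     /* memory is now out of date */
}

/* On eviction, a dirty line must be written back first: this is the
   extra cost a miss can pay under this policy. */
static void evict(struct line *l, unsigned block_addr) {
    if (l->valid && l->dirty)
        for (unsigned i = 0; i < 2; i++)
            memory[block_addr + i] = l->data[i];
    l->valid = false;
    l->dirty = false;
}

int main(void) {
    struct line l = { .valid = true, .dirty = false, .tag = 0 };
    store(&l, 1, 42);                    /* dirty the cached copy          */
    evict(&l, 0x6);                      /* block covers addresses 6 and 7 */
    return memory[0x7] == 42 ? 0 : 1;
}
```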

Mechanical level

Vocabulary

Skills


Last updated 05/12/2023