Lecture 35 - Caches II
Goals
- think about the mechanics needed for caches to work
- learn about direct mapped, associative and set-associative mapping for caches
Cache mechanics
You will have noticed that our assembly code just uses memory addresses; it never acknowledges the existence of caches. The cache mechanics are left to the hardware, invisible to the program. So the hardware needs to take a memory address, look in the cache to see if it holds the associated value, and if not, go get it from memory. That leaves us with some questions:
- if we have a memory address, how can we find it and tell if it is resident in the cache?
- if we have a cache miss and need to load a new block in, where does it go?
- if the cache is full, how do we choose which block has to leave?
- how do we keep the cache and memory in sync?
Direct mapped cache
The direct mapped cache is one of the simplest schemes we can come up with. Every possible address is mapped to exactly one location in the cache. So we can tell just by looking at the address which line would hold the block if it were there.
In addition to making it easy to find the right line, this also provides us with a simple placement policy. When we need a value at some address, the address tells us exactly which line we will be filling, and whatever is already there is ejected.
To do the mapping, we are going to divide the address into three pieces:
| tag | block address | block offset |
- block offset - this tells us where in the block we will find the byte referred to by this address
- block address - these bits tell us in which line of the cache we will find this block
- tag - the rest of the bits provide a unique label for this block. We will save this in the cache with the data so we can tell which block is resident
example: let’s say that we have four bit addresses and we have a cache with four cache lines
- with four cache lines, we need two bits for the block address
- we want to store more than a single byte in our block, so we split the remaining two bits: one bit for the tag and one bit for the offset (the implication: two byte blocks)
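To make the split concrete, here is a minimal sketch of the four-bit example above. The helper name `split_address` and its parameters are made up for illustration; in real hardware this is just wiring, not code.

```python
# Hypothetical helper (not from the lecture): split an address into
# (tag, block address, block offset) for a direct mapped cache.
def split_address(addr, index_bits=2, offset_bits=1):
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # middle bits
    tag = addr >> (offset_bits + index_bits)                 # remaining high bits
    return tag, index, offset

# Walk all sixteen 4-bit addresses of the example.
for addr in range(16):
    tag, index, offset = split_address(addr)
    print(f"{addr:04b} -> tag={tag:01b} line={index:02b} offset={offset:01b}")
```

Address 1011, for instance, comes out as tag 1, line 01, offset 1.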
How much memory do we need for our cache?
- two bytes (16 bits) per line for storage
- one bit per line for the tag
- one bit per line to indicate if the line is valid
So 18 bits per line x 4 lines = 72 bits
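The same bookkeeping can be sketched as a small function. This assumes the line count and block size are powers of two and that each line stores only data, a tag, and one valid bit; the name `cache_storage_bits` is illustrative.

```python
# Hypothetical helper: total storage for a direct mapped cache,
# assuming power-of-two sizes and one valid bit per line.
def cache_storage_bits(address_bits, num_lines, block_bytes):
    offset_bits = (block_bytes - 1).bit_length()  # log2(block_bytes)
    index_bits = (num_lines - 1).bit_length()     # log2(num_lines)
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_line = 8 * block_bytes + tag_bits + 1  # data + tag + valid
    return bits_per_line * num_lines

# The toy cache: 4-bit addresses, 4 lines, 2-byte blocks.
print(cache_storage_bits(address_bits=4, num_lines=4, block_bytes=2))  # 72
```

Each line holds 16 data bits, 1 tag bit, and 1 valid bit, so the four lines together need 72 bits.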
Why do we break the address up in that order?
- the block offset has to come in the low bits because we are pulling blocks of memory that are contiguous
- what about the block address and the tag?
If we put the block address in the high (leftmost) bits, then blocks that are next to each other in memory would map to the same cache line and keep evicting each other. Spatial locality tells us that neighboring blocks tend to be used together, so this is not a good idea.
Direct mapping is easy to think about and to implement, but it is subject to thrashing. We get thrashing when we have two memory locations within our working set that both map to the same cache line. The rest of the cache could be empty, but we have to keep switching out the same cache line. This is very bad because performance can be worse than if we didn't have the cache at all: every access misses, and each miss pays for loading a whole block.
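Thrashing is easy to see in a small simulation. The sketch below models only the tags of the toy four line cache (no data is stored), and the class and function names are invented for illustration.

```python
# Hypothetical model of the toy direct mapped cache: tracks tags only.
class DirectMappedCache:
    def __init__(self, index_bits=2, offset_bits=1):
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        # One entry per line: the stored tag, or None if the line is invalid.
        self.lines = [None] * (1 << index_bits)

    def access(self, addr):
        """Return True on a hit; on a miss, replace whatever is in the line."""
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        hit = self.lines[index] == tag
        if not hit:
            self.lines[index] = tag  # the old block is evicted
        return hit

def count_hits(cache, trace):
    return sum(cache.access(addr) for addr in trace)

# 0010 and 1010 share line 01 but have different tags: every access misses.
print(count_hits(DirectMappedCache(), [0b0010, 0b1010] * 4))  # 0
# 0010 and 0100 use different lines: only the first two accesses miss.
print(count_hits(DirectMappedCache(), [0b0010, 0b0100] * 4))  # 6
```

In the first trace three of the four lines stay empty the whole time, yet the cache never gets a single hit.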
Associative mapping
Associative mapping takes the opposite approach to direct mapping. A block can be put into any line in the cache.
Breaking up the address is now a little easier
| tag | block offset |
Of course this creates some new issues
- to find a block, we need to compare its tag against the stored tags across all of the lines of the cache. This isn't a big deal for our toy four line cache, but as caches get up to more reasonable sizes, this is a large chunk of logic
- we also will need to come up with placement and replacement policies to figure out where to add a new block, and which one to evict
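As a rough sketch of the idea, here is a four line associative cache for the same four-bit addresses. The notes do not pick a replacement policy, so this assumes least-recently-used (LRU) purely for illustration; the class name is also made up. The dictionary lookup stands in for the hardware comparing every stored tag at once.

```python
from collections import OrderedDict

# Hypothetical model of an associative cache; LRU replacement is an
# assumption made for this sketch, not something the notes specify.
class FullyAssociativeCache:
    def __init__(self, num_lines=4, offset_bits=1):
        self.num_lines = num_lines
        self.offset_bits = offset_bits
        # Resident tags, ordered from least to most recently used.
        self.tags = OrderedDict()

    def access(self, addr):
        """Return True on a hit, False on a miss (loading the block)."""
        tag = addr >> self.offset_bits  # everything above the offset is tag
        if tag in self.tags:            # stands in for comparing all lines
            self.tags.move_to_end(tag)
            return True
        if len(self.tags) == self.num_lines:
            self.tags.popitem(last=False)  # evict the least recently used
        self.tags[tag] = True
        return False

cache = FullyAssociativeCache()
trace = [0b0010, 0b1010] * 4
print(sum(cache.access(addr) for addr in trace))  # 6
```

The pair of addresses that thrashed in the direct mapped cache now miss only once each, because the two blocks can sit in different lines.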
Set associative mapping
Set associative mapping is a compromise between the two approaches
The cache is divided up into sets, each of which uses associative mapping.
We will divide up the address in a way very similar to the direct mapped approach
| tag | set | block offset |
The difference is that the set doesn’t tell us the exact line, it just gets us close (into the set). From there, we will look at the tags to try to figure out if our line is present.
example: return to our 4 bit address and four line cache. This time, we divide the cache up into two sets
- we only need 1 bit to identify the set
- we will still use 1 bit for the block offset
- now the tag is two bits
- to find the address, we go to the right set, and then we do a comparison across all of the lines in the set to see if our block is in residence
- if it isn’t, we can load it into either of the lines (we will need a replacement policy to figure out which one if they are both filled)
The set associative cache is actually the most general form
- if the number of lines per set is 1, then we have direct mapping
- if the number of sets is 1, then we have associative mapping
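One way to see that the other two schemes are special cases is a single model with two knobs: the number of sets and the number of lines per set. The sketch below tracks tags only, assumes LRU replacement within each set (an assumption, since the notes leave the policy open), and uses invented names throughout.

```python
from collections import OrderedDict

# Hypothetical general model: num_sets sets, each holding `ways` lines.
# LRU within a set is an assumption made for this sketch.
class SetAssociativeCache:
    def __init__(self, num_sets, ways, offset_bits=1):
        assert num_sets & (num_sets - 1) == 0, "num_sets must be a power of two"
        self.num_sets = num_sets
        self.ways = ways
        self.offset_bits = offset_bits
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (loading the block)."""
        block = addr >> self.offset_bits    # drop the block offset
        set_index = block % self.num_sets   # low bits choose the set
        tag = block // self.num_sets        # the remaining bits are the tag
        lines = self.sets[set_index]
        if tag in lines:                    # compare within one set only
            lines.move_to_end(tag)
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)       # evict least recently used line
        lines[tag] = True
        return False

def count_hits(cache, trace):
    return sum(cache.access(addr) for addr in trace)

trace = [0b0010, 0b1010] * 4  # the pair that thrashes under direct mapping
print(count_hits(SetAssociativeCache(num_sets=4, ways=1), trace))  # 0
print(count_hits(SetAssociativeCache(num_sets=2, ways=2), trace))  # 6
print(count_hits(SetAssociativeCache(num_sets=1, ways=4), trace))  # 6
```

All three configurations hold four lines in total. With one line per set the model behaves like the direct mapped cache and thrashes; with two or four lines per set both blocks fit and only the first two accesses miss.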
The typical approaches are 2-, 4-, 8-, and sometimes 16-way mapping, where the number refers to the number of lines per set. 8 is generally the most common size these days.
Mechanical level
vocabulary
Skills