CS 202 - Notes 2018-12-03

Memory Hierarchy

We have a collection of memory technologies with different characteristics with respect to cost (both in dollars and in physical real estate) and performance. There is no perfect memory type, so we use a combination of them to build a memory hierarchy. For memory elements near the CPU, we prioritize speed; as we move away from the CPU, memory gets slower, larger, and less expensive.

Most of the hierarchy is invisible to its members. Each memory unit receives requests from the element above it in the hierarchy; if it can't fulfill a request directly, it asks the memory unit below it. When the unit below returns the requested data, it is passed up to the requester above, which never knows whether the request was satisfied from what the unit had stored locally or had to be passed down the chain. This provides the CPU with the illusion of an unlimited memory (though access time will vary).
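
We can picture this fall-through lookup in code. This is only a sketch with invented names and sizes (a 16-slot direct-mapped store per level), not how any real cache is implemented:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One level of the hierarchy: a tiny direct-mapped store of words.
     * "next" points to the slower level below; NULL stands in for main
     * memory, which in this sketch always has the data. */
    struct level {
        uint32_t      tags[16];   /* which address each slot currently holds */
        bool          valid[16];
        uint32_t      data[16];
        struct level *next;
    };

    /* The requester above only sees a value come back; it never learns
     * whether the value was stored locally or fetched from further down. */
    uint32_t lookup(struct level *lvl, uint32_t addr)
    {
        if (lvl == NULL)              /* bottom of the hierarchy */
            return addr * 2u;         /* fake "main memory" contents */

        unsigned slot = addr % 16;
        if (lvl->valid[slot] && lvl->tags[slot] == addr)
            return lvl->data[slot];   /* hit: satisfied locally */

        /* Miss: pass the request down, then keep a copy for next time. */
        uint32_t value = lookup(lvl->next, addr);
        lvl->tags[slot]  = addr;
        lvl->valid[slot] = true;
        lvl->data[slot]  = value;
        return value;
    }

    int main(void)
    {
        struct level l2 = { .next = NULL };  /* empty "L2" */
        struct level l1 = { .next = &l2 };   /* empty "L1" in front of it */
        /* The first access misses in both levels and falls through to
         * "main memory"; a repeat of the same address would hit in L1. */
        printf("access to 42 returns %u\n", (unsigned)lookup(&l1, 42));
        return 0;
    }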

Working set

This works because of a concept called locality of reference -- memory references tend to cluster. We have two types of locality:

  • temporal locality - When we reference something, we will likely reference it again soon. For data, this is pretty obvious: we tend to use the same few variables repeatedly for a while. For instructions, loops and recursive functions are an example of this. Depending on the time scale, non-recursive functions are candidates for reuse as well.

  • spatial locality - When we reference a location in memory, we will likely reference something near it soon. Instructions are an obvious example of this: any instruction that isn't a branch or function call will be followed by the instruction immediately after it in memory. For data, arrays are a clear example, since we typically traverse the whole array. However, even local variables are candidates, since we know the compiler places them in neighboring locations in memory (see the sketch after this list).
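
A small C example (not from the lecture, just for illustration) that exhibits both kinds of locality in one loop:

    #include <stdio.h>

    #define N 1024

    int main(void)
    {
        int a[N];
        for (int i = 0; i < N; i++)
            a[i] = i;

        /* Temporal locality: sum and i are touched on every iteration.
         * Spatial locality: a[0], a[1], a[2], ... are neighbors in memory,
         * so once a[i] has been fetched, a[i+1] is likely already nearby
         * in fast memory. */
        long sum = 0;
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %ld\n", sum);
        return 0;
    }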

These patterns of related memory accesses can be considered working sets. If we can get the working set into fast memory, computations will be faster because we don't have to wait for slow memory accesses. When the program moves to something new, we can move in a whole new working set from the base of the hierarchy, occasionally moving stale working sets back down into slower memory when they aren't being used.

Caches

We call the intermediary memory units caches. On many modern machines (we will consider our lab machines to be sufficiently modern), we have three levels of cache between the CPU and main memory: L1, L2, and L3. L1 and L2 are associated with individual cores, while L3 is typically shared by all cores, but still on the chip. Caches can be either unified or split between data and instruction. In the lab, the L1 caches are split, while the others are unified.
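
On the lab machines (or any Linux/glibc system) you can ask the C library for these cache sizes directly. The _SC_LEVEL*_CACHE_SIZE names below are glibc extensions, so treat this as a sketch that assumes that environment:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* glibc-specific sysconf names; they return 0 or -1 if the
         * value isn't known on this machine. */
        printf("L1 data cache:        %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
        printf("L1 instruction cache: %ld bytes\n", sysconf(_SC_LEVEL1_ICACHE_SIZE));
        printf("L2 cache:             %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
        printf("L3 cache:             %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
        return 0;
    }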

The hit rate on a cache is the percentage of the time the cache can satisfy a request directly. This is dependent on both the characteristics of the cache and the pattern of memory accesses. If requested data is not found, we call this a miss.

Example:

L1 cache: 1,000 words and access time of .01µs

L2 cache: 1,000,000 words and access time of .1µs

If we have a 90% hit rate on L1, then the average access time is

average access time = .9 * .01 + .1 * (.01 + .1) = .009 + .1 * .11 = .009 + .011 = .02µs

So, this is twice as slow as a 100% hit rate on L1 would be, but five times faster than if we just used L2.

Note that when we miss, the miss penalty is the time to access L1 AND the time to access L2 because we have to query L1 first and find that it doesn't have what we are looking for.
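
A few lines of C to check this arithmetic (the function name is made up for illustration):

    #include <stdio.h>

    /* Effective access time for a two-level hierarchy, in the same units
     * as t1 and t2. On a miss we pay for the L1 probe AND the L2 access. */
    double effective_access_time(double hit_rate, double t1, double t2)
    {
        return hit_rate * t1 + (1.0 - hit_rate) * (t1 + t2);
    }

    int main(void)
    {
        /* The numbers from the example above: 90% hit rate on L1,
         * .01 µs for L1, .1 µs for L2. Prints 0.020. */
        printf("%.3f us\n", effective_access_time(0.9, 0.01, 0.1));
        return 0;
    }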
