Lecture 33 - Memory Hierarchy

Published

May 4, 2026

Goals

learn the basics of the memory hierarchy

The last stop on our tour of the innards of the computer is memory. We have already had some experience with a couple of different kinds of memory, notably the registers and main memory. There are actually many kinds of memory in the computer, and we are going to try to get a sense of what they are and how they work together.

Memory architecture

We have been working with a fairly amorphous understanding of what we mean when we say “memory”. Things we know:
- it is a large place where we can store instructions and data
- it has something to do with RAM
- both the heap and the stack live in memory

A Tale of Two Architectures

When I have drawn pictures of the heap and the stack, I did not include the instructions. When we were thinking about instructions in memory, we did not talk about where the data lived. And yet we consider both of them to be “in memory”.

Before we clear that up, I want to take a moment to step back in time. The first computers were programmed by, in essence, reconfiguring the hardware with plug boards.

This was followed by the introduction of punch cards (which had already been in use for over a century to drive looms). The problem with executing punch card instructions as they are read is that we can't jump back or skip ahead in the program, so we couldn't do things like conditional execution; all we could make were calculators.

In the late 1940s, we moved to something called the stored program model: instructions would live in memory, and the processor could have random access to them. Punch cards then became a way to load the program into memory.

There are two competing approaches to handling instructions in the memory: the von Neumann architecture and the Harvard architecture.

The von Neumann architecture has a single large pool of memory in which the data and the instructions share space. This leads to something called the “von Neumann bottleneck”: the CPU is much faster than the memory, and because instructions and data travel over the same bus, the CPU ends up waiting a lot.

The Harvard architecture splits the memory into instruction memory and data memory, with separate buses for the two. This means that while one instruction is reading or writing data, we can be fetching the next instruction simultaneously. The downside is that the split is fixed: if we have a small program that works on a lot of data, we can't use the leftover instruction memory to hold data.

Current systems use a hybrid approach: a single unified main memory, but with separate pathways through the memory hierarchy (for example, separate instruction and data caches next to the processor) so that instructions and data can travel simultaneously.
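To make the bottleneck concrete, here is a deliberately simplified Python sketch. It assumes every memory transfer takes exactly one bus cycle and that each instruction needs one fetch plus at most one data access (a load or store). The function names are hypothetical, and real processors (with pipelines, caches, and multi-cycle memory) are far more complicated.

```python
def shared_bus_cycles(uses_data):
    """von Neumann style: one bus, so each fetch and each data access take turns.

    uses_data[i] is True if instruction i also reads or writes data.
    """
    return sum(1 + (1 if needs_data else 0) for needs_data in uses_data)


def split_bus_cycles(uses_data):
    """Harvard style: separate instruction and data buses. While instruction i
    does its data access, instruction i + 1 is fetched in the same cycle."""
    if not uses_data:
        return 0
    cycles = len(uses_data)  # one cycle per fetch; data accesses hide behind them
    if uses_data[-1]:
        cycles += 1          # the last data access has no later fetch to overlap
    return cycles


# Hypothetical five-instruction program: which instructions touch data?
program = [True, False, True, True, False]
print(shared_bus_cycles(program))  # 8
print(split_bus_cycles(program))   # 5
```

In this model, every data access costs an extra cycle on the shared bus, while the split buses hide almost all of them behind instruction fetches.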

Memory hierarchy

We have been thinking about just two types of memory – the registers and main memory. We have somewhat idealized main memory. We think about it as being unlimited and fast (and cheap would be good as well) – it is none of these things.

In general, memory can be fast, small, and expensive, or large, slow, and cheap.

To create a system with a balance of these things, we build a hierarchy of different kinds of memory, with the fast, small and expensive memory near the processor and the large, slow, cheap memory several layers away.
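One standard way to put a number on this balance is the average memory access time: each level's hit time, plus the chance of missing at that level times the cost of going further out. The Python sketch below assumes a simple model (every miss goes to the next level out) and uses made-up timings. The function name is hypothetical.

```python
def average_access_time(levels, memory_time):
    """Average memory access time (AMAT) for a simple cache hierarchy.

    levels: list of (hit_time, miss_rate) pairs, nearest the CPU first.
            miss_rate is the fraction of accesses *reaching that level* that miss.
    memory_time: time to get the data from main memory.
    """
    # Start at the slowest level and work toward the CPU: the miss penalty
    # of each cache level is the average access time of everything below it.
    time = memory_time
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical numbers in nanoseconds, chosen only for illustration.
caches = [(1.0, 0.10),   # small, fast level: 1 ns, misses 10% of the time
          (5.0, 0.20)]   # larger, slower level: 5 ns, misses 20% of what reaches it
print(average_access_time([], 100.0))      # no caches at all: 100.0
print(average_access_time(caches, 100.0))  # with the hierarchy: 3.5
```

With these invented numbers, the average lands much closer to the fast level than to main memory. That only holds because most accesses are assumed to hit in the small, fast levels.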

Memory characteristics

There are a number of characteristics we can talk about with respect to memory:

location - where is the memory
- inside the computer
  - inside the CPU
    - registers and some cache
  - main memory and some cache
- outside the computer
  - drives, tapes, cloud
capacity - how much can be stored
unit of transfer - how much can be moved at once
access method
- sequential access - everything must be read in order
  - access time will vary based on what we are looking for and where it is
- direct access - we can jump to regions, but then we need to do a small sequential search
- random access - we can jump immediately to what we want (constant-time lookups)
- associative - random access based on a key
performance
- access time - elapsed time from start of operation to read or write completion
- memory cycle time - access time plus any time the memory needs to recover before the next operation can start
- transfer rate - once found, how quickly can we move the data in or out
physical type
- semiconductor, magnetic, optical, holographic, etc…
persistence
- volatile - needs power to retain values
- erasable - can be erased (opposite is read only)
- decaying - data needs to be refreshed or it will become corrupted
- long term - non-volatile, non-decaying
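A rough way to compare the access methods is to count the “steps” each one needs to reach a single item. In this toy Python sketch, one step means one cell read or one jump; that cost model, the block size, and the function names are all assumptions for illustration, not real timings. The associative version uses a Python dict only as a software stand-in: hardware associative memory compares the key against every entry in parallel.

```python
def sequential_read(cells, address):
    """Like tape: pass over every earlier cell before reaching the target."""
    steps = 0
    for position, value in enumerate(cells):
        steps += 1                      # read (or skip over) one cell
        if position == address:
            return value, steps
    raise IndexError(address)


def direct_read(cells, address, block_size):
    """Like a disk: jump straight to the block, then scan within it."""
    block_start = (address // block_size) * block_size
    steps = 1 + (address - block_start + 1)  # one jump + short scan inside the block
    return cells[address], steps


def random_read(cells, address):
    """Like RAM: any address is reached in the same, constant time."""
    return cells[address], 1


def associative_read(table, key):
    """Find the entry by key rather than by position."""
    return table[key], 1


cells = [f"v{i}" for i in range(100)]
print(sequential_read(cells, 57))              # ('v57', 58)
print(direct_read(cells, 57, block_size=10))   # ('v57', 9)
print(random_read(cells, 57))                  # ('v57', 1)
print(associative_read({"x": 1}, "x"))         # (1, 1)
```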

Basic technologies

SRAM (static random access memory)
- used for caches
- this is, in essence, the flip-flop circuits I showed you
- low density: each bit takes several transistors, so SRAM uses a proportionally large amount of chip space per bit, which makes it expensive
DRAM (dynamic random access memory)
- main memory (what ad copy is referring to when it says “8 Gigs of RAM”)
- called dynamic because each bit is stored as charge in a tiny capacitor that leaks, so it needs to be refreshed every 10-100 ms
Flash
- in thumb drives and SSDs
- slower than DRAM, but non-volatile
magnetic disk
- hard drives
- very large, very cheap, very slow
magnetic tape
- tape drives
- also cheap, large and slow
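The DRAM refresh requirement can be illustrated with a toy model of a single cell. The stored charge leaks away over time, and a refresh reads the bit and writes it back at full strength before it fades past the point where it can still be read. This Python sketch assumes exponential leakage, and both the leak rate and the read threshold are invented constants, not real DRAM parameters.

```python
import math
from typing import Optional

LEAK_TIME_CONSTANT_MS = 200.0   # assumed: how fast the capacitor leaks
READ_THRESHOLD = 0.5            # assumed: charge above this fraction reads as 1


def charge_after(ms: float) -> float:
    """Fraction of full charge left `ms` milliseconds after the cell was last written."""
    return math.exp(-ms / LEAK_TIME_CONSTANT_MS)


def one_survives(total_ms: float, refresh_every_ms: Optional[float] = None) -> bool:
    """Does a stored 1 still read as 1 after total_ms?

    Each refresh reads the cell and rewrites it at full charge, so what matters
    is the longest stretch the cell ever goes without a refresh.
    """
    if refresh_every_ms is None:
        longest_gap = total_ms
    else:
        longest_gap = min(refresh_every_ms, total_ms)
    return charge_after(longest_gap) > READ_THRESHOLD


print(one_survives(1000))                       # never refreshed: False
print(one_survives(1000, refresh_every_ms=50))  # refreshed every 50 ms: True
```

In this model, refreshing only helps if it happens before the charge drops below the threshold: refreshing every 500 ms would already be too late.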

Mechanical level

Vocabulary

Skills