Lecture 14 - Data Movement - Load and Store

Published

March 11, 2026

Goals

Learn some more about assembly operands
Learn the mechanics of immediate arguments
Learn how to load values from and store values in memory
Learn about four common addressing modes

Single operand instructions: MOV

The “move” instruction takes a value from one register and puts it in another. Of course, this is less of a “move” and more of a “copy” (the source register keeps its value), but someone decided it was “move” decades ago, and we just have to live with it (contrast this with the mv command in unix, which really does move a file rather than copying it).

MOV: Move (register)


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0|0|1 1 0 1|S|       |  Rd   |               |  Rm   |
+---------------------------------------------------------------+

assembly:   MOV Rd, Rm

effect:     Rd <- Rm

example:    e1a02003    MOV r2, r3

Two things to notice here:

this is just another data processing instruction with a different opcode
we lose Rn and use Rm as the single operand
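To make the field layout concrete, here is a small Python sketch that slices the example word e1a02003 into the fields from the diagram. Python is only a convenient calculator here, and the helper names `bits` and `decode_data_processing` are made up for illustration.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned int."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_data_processing(word):
    """Pull out the fields labeled in the MOV (register) diagram."""
    return {
        "category": bits(word, 27, 26),  # 00 for data processing
        "I":        bits(word, 25, 25),  # 0 = register operand
        "opcode":   bits(word, 24, 21),  # 1101 for MOV
        "S":        bits(word, 20, 20),
        "Rn":       bits(word, 19, 16),  # ignored by MOV
        "Rd":       bits(word, 15, 12),
        "Rm":       bits(word, 3, 0),
    }

fields = decode_data_processing(0xE1A02003)  # MOV r2, r3
print(fields)
```

Rd comes out as 2 and Rm as 3, while Rn is 0 because MOV never looks at it.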

Immediate operands

All of the instructions we have looked at up until now operate on values that are already in the registers. We are going to need some instructions that allow us to actually get some values in there…

One way is to use instructions that have “immediate” operands. These are instructions that include the actual value instead of just the number of the register where we can find the value.

MOV: Move (immediate)


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0|1|1 1 0 1|S|       |  Rd   |        imm12          |
+---------------------------------------------------------------+

assembly:   MOV Rd, #imm12

effect:     Rd <- extend(#imm12)

example:    e3a0200a    MOV r2, #10

This instruction does exactly the same thing that the other MOV did, except that the value being “moved” is encoded right in the instruction.

If you look at bits 21-27, we have made only a single change:

the opcode has remained the same
bit 25 has flipped; this bit indicates whether the operand is an immediate (1) or a register (0)
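A quick way to check the single-bit claim is to XOR bits 21-27 of the two example encodings, since XOR leaves a 1 exactly where the inputs disagree. A minimal Python sketch (the helper name `bits` is hypothetical):

```python
MOV_REG = 0xE1A02003  # MOV r2, r3
MOV_IMM = 0xE3A0200A  # MOV r2, #10

def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of word as an unsigned int."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

# XOR leaves a 1 wherever bits 27..21 of the two encodings disagree.
diff = bits(MOV_REG, 27, 21) ^ bits(MOV_IMM, 27, 21)
print(f"{diff:07b}")  # 0010000: only the third bit from the left, i.e. bit 25
print(bits(MOV_REG, 25, 25), bits(MOV_IMM, 25, 25))  # 0 1
```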

immediate value representation

point of order – our registers can hold 32-bit numbers, but our immediate value only has 12 bits!

what happens to the rest of the bits? can we represent larger numbers?

One approach we could take would be to sign extend the value: take the top bit and fill in the remaining 20 bits with it. As we talked about earlier, this works for our two’s complement numbers; both positive and negative numbers retain their value when we sign extend them.
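For comparison, here is what that sign-extension approach would look like. Keep in mind this is the scheme ARM32 does *not* use for data processing immediates; it is sketched in Python with made-up helper names.

```python
def sign_extend_12(imm12):
    """Copy bit 11 of a 12-bit field into bits 12..31 (hypothetical scheme)."""
    imm12 &= 0xFFF
    if imm12 & 0x800:                 # top bit set: fill the upper 20 bits with 1s
        return imm12 | 0xFFFFF000
    return imm12                      # top bit clear: upper 20 bits stay 0

def as_signed32(word):
    """Read a 32-bit pattern as a two's complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

for imm in (0x005, 0x7FF, 0xFFF, 0x800):
    ext = sign_extend_12(imm)
    print(f"{imm:03x} -> {ext:08x} ({as_signed32(ext)})")
# 005 -> 00000005 (5)
# 7ff -> 000007ff (2047)
# fff -> ffffffff (-1)
# 800 -> fffff800 (-2048)
```

This would only ever reach values from -2048 to 2047, which is part of why the rotation scheme below is interesting.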

ARM32, however, doesn’t do that… it does something far weirder and more complicated (because of course it does)

the 12-bit immediate value is actually broken down into two parts: a rotation and a value:


 1   0
 1 0 9 8 7 6 5 4 3 2 1 0
+-----------------------+
|  rot  |     value     |
+-----------------------+

The value holds the actual bits that get stored in the register, and the rot tells us how to manipulate that value:

zero-extend the 8-bit value to 32-bits
interpret the 4-bit rot as an unsigned integer
multiply rot by 2
rotate value by shifting the bits to the right by rot * 2; as bits are shifted off of the right, tack them back on to fill the hole on the left
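The four steps above can be sketched in Python as follows. The function names `ror32` and `expand_imm12` are hypothetical, not taken from any ARM tool.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right; bits falling off the right re-enter on the left."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def expand_imm12(imm12):
    """Turn a 12-bit data processing immediate into its 32-bit value."""
    rot = (imm12 >> 8) & 0xF      # top 4 bits, read as an unsigned integer
    value = imm12 & 0xFF          # bottom 8 bits, zero-extended to 32 bits
    return ror32(value, rot * 2)  # rotate right by twice the rot field

print(hex(expand_imm12(0x00A)))  # 0xa        (MOV r2, #10: rot = 0, value = 10)
print(hex(expand_imm12(0x1FF)))  # 0xc000003f (rot = 1, value = 0xff)
print(hex(expand_imm12(0x4FF)))  # 0xff000000 (rot = 4, value = 0xff)
```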

example the first (in which “abcdefgh” represent arbitrary bits, a trick I stole directly from the official ARM32 documentation):


0001 abcdefgh

we first zero-extend the 8-bit value to 32 bits:


0000 0000 0000 0000 0000 0000 abcd efgh

Then we interpret the rotation as an unsigned integer (1), multiply it by 2 (giving 2), and rotate right by that many places:


gh00 0000 0000 0000 0000 0000 00ab cdef
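Running the same steps backwards answers the earlier question about larger numbers: a 32-bit constant fits in an immediate only if rotating it *left* by some even amount (0, 2, ..., 30) leaves all of its set bits in the bottom 8. Here is a Python sketch of that search with hypothetical helper names; it is roughly the search an assembler has to do when you write a constant, but it is not any particular assembler's code.

```python
MASK32 = 0xFFFFFFFF

def rol32(value, amount):
    """Rotate a 32-bit value left (undoes a right rotation by the same amount)."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def find_imm12(constant):
    """Return an imm12 (rot in bits 11-8, value in bits 7-0) for constant, or None."""
    constant &= MASK32
    for rot in range(16):
        candidate = rol32(constant, rot * 2)  # undo a right rotation by rot * 2
        if candidate <= 0xFF:                 # fits in the 8-bit value field
            return (rot << 8) | candidate
    return None

for c in (10, 0xC000003F, 0xFF000000, 0x101, 0x12345678):
    enc = find_imm12(c)
    print(hex(c), "->", "not encodable" if enc is None else f"imm12 = {enc:#05x}")
```

The first three constants come back as 0x00a, 0x1ff and 0x4ff; 0x101 (set bits too far apart) and 0x12345678 have no encoding, so they cannot be loaded with a single MOV immediate.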

Other instructions

As we saw, switching from register mode to immediate mode just involves flipping bit 25. We can do that for all of our data processing operations.

There is one snag.

sub rd, rn, #imm12 will allow us to perform rd ← rn - extend(#imm12). But subtraction is not commutative: what if we wanted to subtract a register value from an immediate?

We have another instruction that performs “reverse” subtraction (rsb): rsb rd, rn, #imm12 performs rd ← extend(#imm12) - rn.
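A tiny Python sketch of the difference, treating registers as 32-bit unsigned values that wrap around. `operand2` stands for the already-expanded immediate, and the function names are just labels for this illustration.

```python
MASK32 = 0xFFFFFFFF

def sub(rn, operand2):
    """SUB: rd <- rn - operand2, wrapped to 32 bits."""
    return (rn - operand2) & MASK32

def rsb(rn, operand2):
    """RSB: rd <- operand2 - rn, wrapped to 32 bits (operands swapped)."""
    return (operand2 - rn) & MASK32

r1 = 3
print(hex(sub(r1, 10)))  # 0xfffffff9: 3 - 10 = -7 as a 32-bit pattern
print(rsb(r1, 10))       # 7: 10 - 3
```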

When I described the fetch-execute cycle, I introduced you to the idea of main memory: the place where (conceptually) all of the instructions and data for our currently running programs are stored.

I described the process of pulling instructions out of memory; now we need to talk about how to move the data our program works with into and out of memory.

Fetch-execute cycle

A key architectural design choice was made many years ago with the development of the stored-program computer. The big idea here is that instructions look like data, so why not store them in the same place?

So, we add a few components to our picture of the processor:

memory - a large, addressable place to store all of the instructions and data for the currently running programs
program counter (PC) - a special register that holds the address of the next instruction to be executed
instruction register (IR) - a register that holds the instruction currently being executed

Different ISAs may call these by slightly different names.

Running a program is then a process we call the fetch-execute cycle:

the address stored in the PC is used to read out the next instruction into the IR
the data in the IR is used to perform the next operation (this can be as simple as a splitter to drive the select lines of the datapath elements)
while the instruction is being performed, the address stored in the PC is incremented
repeat
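Here is a toy fetch-execute loop in Python, under some loudly stated assumptions: memory is a dictionary from byte addresses to 32-bit words, the PC advances by 4 because each instruction is 4 bytes, and the only instruction it understands is the MOV from earlier (it ignores bits 28-31 and the S bit entirely). It is a sketch of the cycle, not a model of a real ARM pipeline.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def run(memory, regs):
    """Toy fetch-execute loop that only understands MOV (register or immediate)."""
    pc = 0
    while pc in memory:                 # stop when we run off the end of the program
        ir = memory[pc]                 # fetch: the PC supplies the address, the IR gets the word
        pc += 4                         # increment: the next instruction is 4 bytes on
        if ((ir >> 26) & 0b11) != 0b00 or ((ir >> 21) & 0xF) != 0b1101:
            raise ValueError(f"toy CPU cannot execute {ir:#010x}")
        rd = (ir >> 12) & 0xF
        if (ir >> 25) & 1:              # bit 25 = 1: immediate operand
            regs[rd] = ror32(ir & 0xFF, 2 * ((ir >> 8) & 0xF))
        else:                           # bit 25 = 0: register operand Rm
            regs[rd] = regs[ir & 0xF]
    return regs

program = {0: 0xE3A0300A,  # MOV r3, #10
           4: 0xE1A02003}  # MOV r2, r3
regs = run(program, [0] * 16)
print(regs[2], regs[3])    # 10 10
```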

There are some variations on this cycle.

Most architectures take multiple clock cycles for this process, since it takes time for data to be read from and written to memory.

Also, if we want non-linear programs (loops, conditionals, functions), then we need a way to load the PC with something other than the output of the incrementer.

But, this is a reasonable rough approximation of the process

Addresses

At this point, I’ve tossed around “32-bit machine” and “64-bit machine” a number of times; let’s get to what that typically means.

ARM32 is a 32-bit machine, which means that the word size is 32 bits

There are a number of different definitions for “word”, but it generally means the native size of values in the machine. For our purposes, we will get more specific

word size - the number of bits the ISA uses to store a memory address

So, for a 32-bit machine, the maximum number of addresses is \(2^{32}\)

The second facet of the ISA is the addressability of the memory – how much data can be stored at each address

Both machines we will look at are byte addressable, so each byte in memory has its own address (though we will still typically read and write 32-bit values).
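The arithmetic behind those two facts, as a short Python sketch (the constant names are just labels):

```python
ADDRESS_BITS = 32                   # word size, per the definition above
num_addresses = 2 ** ADDRESS_BITS   # byte addressable: one address per byte
print(num_addresses)                # 4294967296
print(num_addresses // 2**30)       # 4, i.e. 4 GiB of addressable bytes

WORD_BYTES = 32 // 8                # a 32-bit value occupies 4 byte addresses
word_starts = [i * WORD_BYTES for i in range(4)]
print(word_starts)                  # [0, 4, 8, 12]
```

Because each 32-bit value spans four byte addresses, consecutive words start 4 addresses apart.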

Load

Here is where things get interesting for us

addresses are 32 bits long
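instructions in ARM32 are fixed at 32 bits

So… if we want an instruction that will access a value in memory, it is a bit of a head scratcher: a full 32-bit address would take up the entire instruction, leaving no room for the opcode or the register numbers.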

We are going to go over four different techniques used in ARM32, which are collectively called “addressing modes”

Base Register


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 1|0|1 1|0 0 1|  Rn   |  Rt   |                       |
+---------------------------------------------------------------+


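assembly:   ldr Rt, [Rn]

effect:     Rt <- mem[Rn]

example:    ldr r7, [r9]

This is the simplest solution. Registers hold 32 bits, the same size as an address, so we put the address we want in a register and use that. (In ARM assembly, the square brackets around the register mark it as holding the address to read from.)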
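A minimal Python sketch of base register mode, assuming a hypothetical toy memory (a dictionary from byte addresses to 32-bit words) and a list of 16 registers:

```python
# Hypothetical toy memory: byte address -> 32-bit word (only multiples of 4 used)
memory = {0x1000: 111, 0x1004: 222, 0x1008: 333}
regs = [0] * 16

def ldr_base(regs, rt, rn):
    """Base register mode: Rt <- mem[Rn]."""
    regs[rt] = memory[regs[rn]]

regs[9] = 0x1004       # step 1: put the address we want in a register
ldr_base(regs, 7, 9)   # step 2: r7 <- mem[r9]
print(regs[7])         # 222
```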

Base + immediate


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 1|0|1 1|0 0 1|  Rn   |  Rt   |         imm12         |
+---------------------------------------------------------------+


assembly:   ldr Rt, [Rn, #imm12]

effect:     Rt <- mem[Rn + zero-extend(imm12)]

example:    ldr r5, [r6, #12]

This instruction, as you might imagine, uses a combination of techniques. We have an address stored in a register, but then we add an offset in the form of an immediate value stored right in the instruction.

Note that the immediate value is zero-extended before it is added to the register value (it has to have the same number of bits for the addition), so the offset can range from 0 to 4095. It is not sign extended, nor does it use the rotation scheme we saw for data processing immediates.
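To see how differently the same 12 bits are read by the two kinds of instruction, here is a Python sketch with hypothetical function names; the base address 0x1000 is made up.

```python
MASK32 = 0xFFFFFFFF

def ldr_offset(imm12):
    """LDR base + immediate: the 12 bits are simply zero-extended."""
    return imm12 & 0xFFF

def data_processing_immediate(imm12):
    """MOV/ADD/... immediate: 8-bit value rotated right by 2 * rot."""
    rot, value = (imm12 >> 8) & 0xF, imm12 & 0xFF
    amount = 2 * rot
    return ((value >> amount) | (value << (32 - amount))) & MASK32

same_bits = 0x1FF
print(ldr_offset(same_bits))                      # 511
print(hex(data_processing_immediate(same_bits)))  # 0xc000003f

rn = 0x1000                                       # hypothetical base address
print(hex((rn + ldr_offset(12)) & MASK32))        # 0x100c, i.e. Rn + 12
```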

Base + register offset


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 1|1|1 1|0 0 1|  Rn   |  Rt   |         |   |0|  Rm   |
+---------------------------------------------------------------+


assembly:   ldr Rt, [Rn, Rm]

effect:     Rt <- mem[Rn + Rm]

example:    ldr r9, [r1, r0]

This is the same idea; we just use a second register for the offset instead of an immediate value.

Base + scaled register


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 1|1|1 1|0 0 1|  Rn   |  Rt   |  imm5   |   |0|  Rm   |
+---------------------------------------------------------------+


assembly:   ldr Rt, [Rn, Rm, lsl #imm5]

effect:     Rt <- mem[Rn + (Rm << imm5)]

example:    ldr r3, [r2, r5, lsl #2]

This is the same as the above, where we add together a base address and an offset, both of which are stored in registers. The key difference here is that the offset is shifted by an immediate value included in the instruction.

Why would we want to be able to create addresses that had a base + some offset * a scaling factor?

What if, instead of offset, I called it index? This starts to sound like an array:

base is the address of the array itself
index is the array index
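scale is the size of each element stored in the array; because the scaling is a left shift by imm5, it always multiplies by a power of two (a shift of 2 multiplies the index by 4, the size of a 32-bit word)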
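In C terms this is the address of `a[i]`. A Python sketch of the address computation, assuming a made-up array base of 0x2000:

```python
MASK32 = 0xFFFFFFFF

def element_address(base, index, shift):
    """Base + scaled register: base + (index << shift), kept to 32 bits."""
    return (base + (index << shift)) & MASK32

base = 0x2000  # hypothetical address of element 0
# 32-bit (4-byte) elements: shift by 2, since 2**2 == 4
print([hex(element_address(base, i, 2)) for i in range(4)])
# ['0x2000', '0x2004', '0x2008', '0x200c']
# 1-byte elements: shift by 0, so the index is the byte offset
print([hex(element_address(base, i, 0)) for i in range(4)])
# ['0x2000', '0x2001', '0x2002', '0x2003']
```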

let’s talk about how this instruction breaks down

bits 26-27 are 01; recall how these were used earlier to determine the category (data processing instructions have 00 there)
bits 20-24 will be 11001. There are variations in here depending on, for example, whether we want to access an entire word or just a byte (determined by bit 22), but for our purposes this is our expected pattern

Notice that I didn’t say which instruction I was talking about. They are basically the same! The only difference is bit 25:

if bit 25 is 0, the offset is an immediate value
if it is 1, the offset comes from a register (yes, annoyingly, this is the opposite of how it works for data processing instructions)

But I gave you four addressing modes; how can we have only two instructions?

base mode and base+immediate are the same if the immediate value is 0
base + register and base + scaled register are the same if the shift is 0
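Putting the pieces together, here is a Python sketch that reads an LDR word, names which of the four modes it is using, and computes the address. The four hex words are my hand assembly of the lecture's example loads, following the diagrams above (worth double-checking against a real assembler). The register values are made up, and the function only handles the patterns covered here: bits 24-20 = 11001 and, for register offsets, a plain left shift (bits 6-4 = 000).

```python
MASK32 = 0xFFFFFFFF

def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of word as an unsigned int."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def ldr_mode_and_address(word, regs):
    """Name the addressing mode of an LDR word and compute its address."""
    if bits(word, 27, 26) != 0b01 or bits(word, 24, 20) != 0b11001:
        raise ValueError(f"not a load this sketch understands: {word:#010x}")
    base = regs[bits(word, 19, 16)]                # Rn
    if bits(word, 25, 25) == 0:                    # bit 25 = 0: immediate offset
        offset = bits(word, 11, 0)                 # imm12, zero-extended
        mode = "base" if offset == 0 else "base + immediate"
    else:                                          # bit 25 = 1: register offset
        if bits(word, 6, 4) != 0:
            raise ValueError("only the plain left-shift form is handled here")
        shift = bits(word, 11, 7)                  # imm5
        offset = regs[bits(word, 3, 0)] << shift   # Rm << imm5
        mode = "base + register" if shift == 0 else "base + scaled register"
    return mode, (base + offset) & MASK32

regs = [0] * 16
regs[9], regs[6], regs[1], regs[0] = 0x1000, 0x2000, 0x3000, 8
regs[2], regs[5] = 0x4000, 3

for word in (0xE5997000,   # r7 <- mem[r9]
             0xE596500C,   # r5 <- mem[r6 + 12]
             0xE7919000,   # r9 <- mem[r1 + r0]
             0xE7923105):  # r3 <- mem[r2 + (r5 << 2)]
    mode, addr = ldr_mode_and_address(word, regs)
    print(f"{word:#010x}  {mode:<24} {addr:#x}")
```

The first pair differs only in whether imm12 is zero, and the second pair only in whether imm5 is zero, which is exactly why two encodings cover all four modes.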

Store

Now we need a way to write data back into memory.

The instruction is STR

I would break it down like we did LDR, but all we need to do is flip bit 20 to a 0. Everything works exactly the same, except that data flows in the opposite direction: in base register mode, for example, STR performs mem[Rn] <- Rt.
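A Python sketch of both halves of that claim. The LDR word 0xE5997000 is my hand assembly of the base register example (r7 <- mem[r9]), and the dictionary memory is a hypothetical toy, not a model of real hardware.

```python
LDR_R7_R9 = 0xE5997000              # r7 <- mem[r9]
STR_R7_R9 = LDR_R7_R9 & ~(1 << 20)  # clear bit 20 to turn the load into a store
print(hex(STR_R7_R9))               # 0xe5897000, i.e. mem[r9] <- r7

# Toy data flow with a hypothetical dict-based memory
memory = {}
regs = [0] * 16
regs[9], regs[7] = 0x1000, 42
memory[regs[9]] = regs[7]           # store: register -> memory
regs[8] = memory[regs[9]]           # load into r8: memory -> register
print(regs[8])                      # 42
```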

Dirty secret

I showed you four addressing modes; there are at least nine. These are controlled by bits 21-24, and we aren’t going to get into any of the other modes.

Mechanical level

vocabulary

word size

Skills