

Lecture 13: the ADD instruction

Goals


in lecture 11, we came up with the idea of a register file and ALU to Do Computation

we can select two input registers by putting the right sequences of bits on the pair of "source register" inputs to the register file; we can select the operation to perform by putting the right sequence of bits on the "op" input of the ALU; we can select where to store the result of the operation by putting the right sequence of bits on the "destination register" input of the register file; then we toggle the clock and the result is stored in that register

this was all controlled by those various sequences of bits

in lecture 12, we gathered together those separate sequences of bits, each of which specifies part of the functionality, to form an instruction (often called machine code because it’s a line of code that is run by the machine)

but a consequence of the instruction having meaning to the machine is that it can be difficult for humans to derive meaning from that sequence of bits

therefore, we discussed the idea of assembly language, which is a human-readable translation of machine code in which the various parts of the instruction are translated separately; this gives us something like "ADD r2, r4, r5", which is the assembly language version of the machine code instruction that adds together the contents of registers 4 and 5 and stores the result in register 2

over the two lectures, we also discussed the idea that there are many different decisions that need to be made when designing a computer processor; the result of those decisions defines the processor’s functionality and is called an instruction set architecture (ISA)

as I said, there are two different philosophical camps when it comes to ISAs: RISC and CISC

in modern computing, RISC is most popularly embodied by the ARM ISA, whose chips are frequently used in power-conscious applications like mobile computing (phones, watches, Apple laptops)

whereas CISC is most popularly embodied by the x86 ISA, whose chips are most frequently found in desktop machines, servers, and non-Apple laptops

I gave you a list of traditional ways that RISC and CISC processors differ, but little in the way of specifics

today we change that by looking at our first real instructions


we are going to look at both ARM and x86 in this course, and it’s going to take some work to keep them straight

to help you there, I want to be very very clear about what I expect out of you regarding each instruction set

for ARM, I expect you to be able…
- to convert assembly language to machine code ("assemble")
- to convert machine code to assembly language ("disassemble")
- to explain the effect of assembly instructions on the processor state

for x86, I expect you to be able…
- to explain the effect of assembly instructions on the processor state

we will see x86 machine code, if only to convince you that it really is ugly and unwieldy, but I am not going to require you to work with it

we’re going to start with ARM32 because it’s way easier to work with

for every instruction, I will give you a similarly-formatted reference that describes its form and function

you will be allowed to use these references when taking quizzes and exams


our first instruction is the ADD instruction

and here is the reference:

 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0 0 0 1|     |  Rn   |  Rd   |               |  Rm   |
+---------------------------------------------------------------+

assembly:   ADD Rd, Rn, Rm

effect:     Rd <- Rn + Rm

example:    e0823003    ADD r3, r2, r3

the diagram at the top shows the meaning of each bit of the machine-code instructions

it is a 32-bit instruction and the numerals in the top two rows identify the place of each bit

the left-most (most significant) bit is bit 31, the right-most (least significant) bit is bit 0

for this instruction, bits 0-3 are interpreted as a single value with the name Rm

bits 12-15 are interpreted as a single value with the name Rd

bits 16-19 are interpreted as a single value with the name Rn

and bits 23-27 must have the value 0 0 0 0 1
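
as a rough sketch of what "interpreted as a single value" means, here is how those fields could be pulled out of a 32-bit value with a shift and a mask (this is Python, which is not otherwise part of these notes, and the helper name `field` is made up for illustration):

```python
def field(word, lo, hi):
    """Return bits lo..hi (inclusive) of word as an unsigned integer."""
    width = hi - lo + 1
    # shift the field down to bit 0, then mask off everything above it
    return (word >> lo) & ((1 << width) - 1)

word = 0xE0823003          # the example value from the reference above

rm = field(word, 0, 3)     # bits 0-3
rd = field(word, 12, 15)   # bits 12-15
rn = field(word, 16, 19)   # bits 16-19
op = field(word, 23, 27)   # bits 23-27, must be 0b00001 for ADD

print(rd, rn, rm, format(op, "05b"))
```

the shift moves the lowest bit of the field down to position 0, and the mask throws away every bit above the field's width.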


Rn, Rm, and Rd are register specifiers: that is, they are sequences of bits that identify a particular register

since they are each 4-bit values, we could conclude that this processor has 16 registers (and we would be correct!)

this exemplifies one of the characteristics of RISC architectures: lots of general purpose registers

because it sure seems like any of the 16 registers could be used as source or destination for this instruction (and, again, we would be correct!)


that 0 0 0 0 1 sequence in bits 23-27 is how we know this sequence of bits is an ADD instruction

other instructions that perform other operations will have a different value in those five bits

this is sometimes called the opcode

(it’s a bit more complicated than that, but this understanding will do for now)


before we try to understand what this instruction does, let us first walk through how to convert between the machine-code representation and the assembly-language representation

let us imagine I give you this value:

e0823003

I tell you it is a machine-code instruction (because otherwise there is no way to know what kind of data it represents) and ask you to disassemble it—that is, convert it to assembly language

the first step is to convert it to bits:

1110 0000 1000 0010 0011 0000 0000 0011
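
if you want to check a hex-to-bits conversion like this one, a minimal Python sketch (the variable names are only illustrative) is:

```python
word = int("e0823003", 16)            # parse the hex digits as a number
bits = format(word, "032b")           # 32 binary digits, zero-padded, bit 31 first

# group into nibbles: one hex digit corresponds to exactly four bits
groups = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
print(groups)  # 1110 0000 1000 0010 0011 0000 0000 0011
```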

then we look at bits 23-27, which have the value 0 0 0 0 1, and so we know it’s an ADD instruction because the ADD instruction needs that exact sequence in those places

we then refer to the reference for the ADD instruction to decipher the rest

it tells us that bits 0-3 are Rm, so we take those four bits and interpret them as an unsigned binary number: in this case, 0 0 1 1 means that Rm is 3

likewise, Rd, in bits 12-15, is 0 0 1 1, so it has the value 3

and Rn, in bits 16-19, is 0 0 1 0, so it has the value 2

when writing register specifiers in ARM32 assembly, we prefix the number with the letter 'r' to make it clear we’re talking about a register (so register 3 is written r3)

thus, that machine code instruction translates to this assembly instruction:

ADD r3, r2, r3

(to perform assembly, we reverse the process)
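
both directions can be sketched in a few lines of Python. this is only an illustration under stated assumptions: the function names are made up, and because the reference leaves some bits blank, the assembler below simply copies those bits from the example e0823003 (bits 28-31 set to 1110, every other unnamed bit 0)

```python
ADD_OPCODE = 0b00001       # required value of bits 23-27

# Assumption: bits the reference leaves blank are copied from the example
# e0823003. This constant also already has bit 23 set (the opcode).
ADD_TEMPLATE = 0xE0800000

def disassemble_add(word):
    """Machine code -> assembly, for the ADD form in the reference."""
    if (word >> 23) & 0b11111 != ADD_OPCODE:
        raise ValueError("bits 23-27 are not 00001, so this is not ADD")
    rn = (word >> 16) & 0xF    # bits 16-19
    rd = (word >> 12) & 0xF    # bits 12-15
    rm = word & 0xF            # bits 0-3
    return f"ADD r{rd}, r{rn}, r{rm}"

def assemble_add(rd, rn, rm):
    """Assembly operands -> machine code: the same steps in reverse."""
    for r in (rd, rn, rm):
        if not 0 <= r <= 15:
            raise ValueError("a register specifier must fit in 4 bits")
    return ADD_TEMPLATE | (rn << 16) | (rd << 12) | rm

print(disassemble_add(0xE0823003))            # ADD r3, r2, r3
print(format(assemble_add(3, 2, 3), "08x"))   # e0823003
```

note the order: in the assembly text Rd comes first, but in the machine code Rn sits in the higher bits, which is an easy place to slip up when doing this by hand.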


now, its effect: Rd <- Rn + Rm

when we see registers referred to in this context, we mean "the contents of"

so we read this as "take the contents of register Rn and the contents of register Rm, add them together, and store the result in register Rd"

for the instruction we just disassembled, Rd is 3, Rn is 2, and Rm is 3

so its effect is "add together the contents of registers 2 and 3 and put the result in register 3"

(this may seem like a nearly nonsensical operation, but it pops up surprisingly often in real code)


so if my register file has these contents:

R0:   0
R1:  23
R2:  97
R3:   8
...

and I run that instruction….

this is what the register file will look like afterwards:

R0:   0
R1:  23
R2:  97
R3: 105
...

(registers 4-15 elided for brevity)
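
the same before/after picture as a small Python sketch. assumptions, since the notes do not spell them out here: registers are modelled as 32-bit values (so the sum wraps around), and r4-r15 are filled with zeros as placeholders because their contents are not given

```python
def add(regs, rd, rn, rm):
    """Effect of ADD Rd, Rn, Rm on a register file stored as a list."""
    # Both sources are read before the destination is written, so it is
    # fine for Rd to be the same register as Rn or Rm.
    # Assumption: 32-bit registers, so the result wraps modulo 2**32.
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF

regs = [0, 23, 97, 8] + [0] * 12   # r0-r3 from above; r4-r15 are placeholders

add(regs, 3, 2, 3)                 # ADD r3, r2, r3
print(regs[:4])                    # [0, 23, 97, 105]
```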


before we look at the equivalent instruction in x86, there is some bad news

there are two different dialects of x86 assembly: Intel and AT&T

the good news is that pretty much all programs that operate on x86 assembly can work with either/both

in this class, we will ONLY work with the Intel format (because it’s more consistent with patterns in ARM32 assembly) but you should be aware that there is another format in case you run across stuff that claims to be x86 assembly but looks very strange to you


now, how we do addition in x86

01 d0       add    eax,edx

effect:     eax <- eax + edx

I’ve written it in a different format: the machine code is on the left, the corresponding assembly language is on the right

(recall that I do not expect you to be able to translate between the two)

there are a few things to note here

first, the instruction is super short! only 16 bits: 2 bytes

this is evidence in support of the claim that CISC ISAs have instructions of variable length

(over the next few lectures, we will see more evidence in support of this, as well as evidence in support of the claim that RISC ISAs like ARM32 have instructions of uniform length)

second, the instruction only specifies two… things, which must be the… operands?

and yes, those things (eax and edx) are registers, which is the third thing (register names in ARM32 are much nicer)


we interpret the effect in a similar way as with ARM32: add together the contents of registers eax and edx, store the result in eax

in this case, the destination register is also the first operand: so since eax is the first operand, that’s also where the result will be stored

this is evidence of the claim that CISC ISAs like x86 often use special-purpose registers

(this is a consequence of design decisions made literally 45 years ago, which we are still stuck with today—FUN!)


recall that I do expect you to be able to describe the result of an x86 instruction

so if my register file looks like this:

eax:   7
ebx:  82
ecx:  16
edx:  73

and I execute this instruction…

here is what the register file will look like afterwards:

eax:  80
ebx:  82
ecx:  16
edx:  73

(we will see more of these registers soon)
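
here is the x86 version as a hedged Python sketch. it models only the four registers listed above and nothing else about the processor, the function name `x86_add` is made up, and the 32-bit wraparound is an assumption

```python
def x86_add(regs, dst, src):
    """Effect of 'add dst, src': dst <- dst + src."""
    # The first operand is both a source and the destination.
    # Assumption: 32-bit registers, so the result wraps modulo 2**32.
    regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF

regs = {"eax": 7, "ebx": 82, "ecx": 16, "edx": 73}

x86_add(regs, "eax", "edx")        # add eax,edx
print(regs)  # {'eax': 80, 'ebx': 82, 'ecx': 16, 'edx': 73}
```

compare the signatures: the ARM32 form names three registers, while this one names only two because the destination doubles as a source.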


to bring this idea back around to finite state machines and the traffic light…

imagine that the state is the entire contents of the register file

and the instruction is the transition that takes us from one state to another
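
one way to picture that, as a sketch only: write the transition as a function that takes a state and an instruction and returns the next state, leaving the old state untouched. the name `step` and the (rd, rn, rm) tuple are illustrative, and only r0-r3 are modelled to keep it short

```python
def step(state, instr):
    """Transition function: (state, instruction) -> next state."""
    rd, rn, rm = instr                 # an ADD Rd, Rn, Rm instruction
    nxt = list(state)
    # Assumption: 32-bit registers, so the sum wraps modulo 2**32.
    nxt[rd] = (state[rn] + state[rm]) & 0xFFFFFFFF
    return tuple(nxt)                  # states are immutable tuples

s0 = (0, 23, 97, 8)                    # starting state: contents of r0-r3
s1 = step(s0, (3, 2, 3))               # ADD r3, r2, r3
s2 = step(s1, (0, 1, 3))               # ADD r0, r1, r3

print(s0)  # (0, 23, 97, 8)
print(s1)  # (0, 23, 97, 105)
print(s2)  # (128, 23, 97, 105)
```

a program is then just a path through these states, one instruction per transition.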


this is great!

we can translate between machine code and assembly

and we can simulate the behavior of the ADD instruction in both ARM32 and x86


but how do values get into registers in the first place?

when our computer turns on, all the registers could have the value zero

and adding zero and zero doesn’t let us do much of anything interesting

we’ll solve that problem soon


Mechanical Skills

The following mechanical skills introduced in this lecture are fair game for future quizzes. You may access practice questions (which will exactly resemble the questions on the quizzes) on weathertop.

t1p2m01    assemble/disassemble data-processing instructions (register)
