pete > courses > CS 202 Spring 24 > Lecture 18: control structures
Lecture 18: control structures
Goals
- define control structure
- describe the purpose and behavior of the branch instruction
- describe the purpose and behavior of the compare instruction
in the previous lecture, we resolved two important issues
firstly, instructions are stored in memory
in a von Neumann-style computer (of which most computers we directly interact with are examples), instructions and data both reside in the same memory
the other—Harvard-style architecture—has separate memories for data and instructions
we discussed trade-offs and I said that real computer processors these days often incorporate elements of both
then we talked about how to achieve sequential execution of a program written in assembly language
since instructions are stored in memory, I presented the idea of a register that contains the address of the currently-executing instruction
I called this the program counter (PC) or instruction pointer (IP)
we concluded that, to cause sequential execution, we just needed to increment the PC by 4 after an instruction finishes executing
(four because each instruction is 4 bytes long and we’re working with byte-addressable memory)
let’s revisit the example sequence of instructions from Friday, but I’m going to give them memory addresses (the left-most column)
     96: mov r0, #180
    100: mov r1, #42
    104: add r2, r1, r0
    108: xor r0, r0, r0
    112: str r2, [r0]
additionally, beyond the three registers explicitly mentioned in the instructions above, we’ll start keeping track of the program counter (PC)
it’ll have an initial value of 96, indicating that the instruction at that address (the first mov) is the instruction about to be executed
therefore, the sequence of events from Friday plays out like so:
- read the program counter (96) and fetch the instruction at that address (mov r0, #180)
- execute that instruction: r0 gets the value 180
- increment the program counter by 4: PC now has the value 100
- read the program counter (100) and fetch the instruction at that address (mov r1, #42)
- execute that instruction: r1 gets the value 42
- increment the program counter by 4: PC now has the value 104
- read the program counter (104) and fetch the instruction at that address (add r2, r1, r0)
- execute that instruction: r2 gets 222
- increment the program counter by 4: PC now has the value 108
- read the program counter (108) and fetch the instruction at that address (xor r0, r0, r0)
- execute that instruction: r0 gets 0
- increment the program counter by 4: PC now has the value 112
- read the program counter (112) and fetch the instruction at that address (str r2, [r0])
- execute that instruction: store the value 222 in main memory at address 0
- increment the program counter by 4: PC now has the value 116
and now there are no more instructions, so we can imagine the program is done (this is an oversimplification, but will do for the purposes of this course)
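to make the fetch / execute / increment cycle concrete, here is a minimal C sketch of it. it assumes a hypothetical decoded `Instr` struct instead of the real 32-bit instruction words, and it models data memory as an array of 32-bit words (byte address `a` lives in `mem[a / 4]`); `Machine`, `run_example`, and the `OP_*` names are made up for illustration

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical decoded form of an instruction (NOT the real 32-bit encoding) */
typedef enum { OP_MOV_IMM, OP_ADD, OP_XOR, OP_STR } Op;

typedef struct {
    Op op;
    int rd, rn, rm;   /* register numbers; unused fields are ignored */
    int32_t imm;      /* immediate value, used only by mov */
} Instr;

typedef struct {
    int32_t r[16];    /* general-purpose registers r0..r15 */
    uint32_t pc;      /* program counter: address of the instruction to run next */
    int32_t mem[64];  /* data memory as 32-bit words: byte address a lives in mem[a / 4] */
} Machine;

/* the example program; instruction i lives at address 96 + 4 * i */
static const Instr example[] = {
    {.op = OP_MOV_IMM, .rd = 0, .imm = 180},   /*  96: mov r0, #180   */
    {.op = OP_MOV_IMM, .rd = 1, .imm = 42},    /* 100: mov r1, #42    */
    {.op = OP_ADD, .rd = 2, .rn = 1, .rm = 0}, /* 104: add r2, r1, r0 */
    {.op = OP_XOR, .rd = 0, .rn = 0, .rm = 0}, /* 108: xor r0, r0, r0 */
    {.op = OP_STR, .rd = 2, .rn = 0},          /* 112: str r2, [r0]   */
};

void run_example(Machine *m) {
    const uint32_t base = 96;
    const uint32_t end = base + 4u * (uint32_t)(sizeof example / sizeof example[0]);
    memset(m, 0, sizeof *m);
    m->pc = base;
    while (m->pc < end) {                                /* stop when we run out of instructions */
        const Instr *in = &example[(m->pc - base) / 4];  /* fetch the instruction at PC */
        switch (in->op) {                                /* execute it */
        case OP_MOV_IMM: m->r[in->rd] = in->imm; break;
        case OP_ADD: m->r[in->rd] = m->r[in->rn] + m->r[in->rm]; break;
        case OP_XOR: m->r[in->rd] = m->r[in->rn] ^ m->r[in->rm]; break;
        case OP_STR: {
            int32_t addr = m->r[in->rn];                 /* the address comes from rn */
            assert(addr >= 0 && addr / 4 < 64);
            m->mem[addr / 4] = m->r[in->rd];             /* store the value held in rd */
            break;
        }
        }
        m->pc += 4;                                      /* PC <- PC + 4 */
    }
}
```

after `run_example` finishes, r2 and `mem[0]` both hold 222 and the PC holds 116, matching the trace above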
this is great, but with purely sequential execution there are a huge number of programs we can't write (anything that needs to make a decision or repeat some work)
we need some way to implement the equivalent of conditionals, loops, and functions—collectively referred to as "control structures"
because they alter the "flow of control"
by which I mean "which instruction executes next"
by default, the flow of control is such that instructions execute sequentially, by incrementing the program counter by 4
but conditionals, loops, and functions all require that the program counter behave differently
concretely, what we need is some way to counteract this inexorable march of PC <- PC + 4
that is, sometimes we want the PC to advance to the next instruction and sometimes we want it to take on a completely different value
let’s consider the simplest control structure you’ve seen: the "if" statement
here’s an example written in Java (though it also happens to be valid C):
    if(x > y) {
        x = x - y;
    }
    y = x;
in general, it says: if some condition is true, execute some block of code
in assembly language terms, that "block of code" is just a bunch of instructions
the condition is a bit less straightforward
let’s make it stupid-simple and imagine the condition is a register: if this register contains zero, the condition is false; if the register contains non-zero, the condition is true
so we want something to go right before the block of assembly instructions
if the condition is true, what do we want? we want the PC to be incremented by 4 as normal
but if the condition is false, we want to set the PC such that the instruction following the block is executed next
this is potentially counter-intuitive
when the condition is false, we want something special to happen
only when the condition is true do we want the normal thing (ie, PC <- PC + 4) to happen
unsurprisingly, there is an instruction to do just this
that instruction is called branch and it is given the mnemonic "b"
here is what it looks like
       3                   2                   1                   0
     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
    +---------------------------------------------------------------+
    | cond  |1 0 1 0|                    offset                     |
    +---------------------------------------------------------------+
first thing to note: 4-bit opcode!
second thing to note: we're actually using the first four bits of the instruction (bits 31 to 28)!
third thing to note: that's a biiiiiig offset (24 bits)!
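pulling those fields back out of a 32-bit word is just shifting and masking. here is a small C sketch of that for the layout in the diagram; the helper names (`branch_cond`, `branch_opcode`, `branch_offset`, `encode_branch`) are hypothetical, and `encode_branch` only checks that the offset fits in 24 bits

```c
#include <assert.h>
#include <stdint.h>

/* field extraction for the branch layout:
   bits 31-28 cond, bits 27-24 opcode (1010 for b), bits 23-0 offset */
uint32_t branch_cond(uint32_t word)   { return (word >> 28) & 0xFu; }
uint32_t branch_opcode(uint32_t word) { return (word >> 24) & 0xFu; }
uint32_t branch_offset(uint32_t word) { return word & 0xFFFFFFu; } /* raw 24-bit field, not sign-extended */

/* hypothetical helper: pack a cond value and a signed offset into a branch word */
uint32_t encode_branch(uint32_t cond, int32_t offset) {
    assert(cond <= 0xFu);
    assert(offset >= -(1 << 23) && offset < (1 << 23));  /* must fit in 24 bits (two's complement) */
    return (cond << 28) | (0xAu << 24) | ((uint32_t)offset & 0xFFFFFFu);
}
```

for example, `encode_branch(0xE, 1)` (cond 1110 is "always" in the table of cond values below) packs to 0xEA000001, and an offset of -1 fills the whole 24-bit field with ones: 0xEAFFFFFF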
the way it works is that the "cond" field specifies a particular condition
if that condition evaluates to false, PC <- PC + 4 as normal
if it evaluates to true, PC <- PC + 4 + sign-extend(offset) * 4
note that, because the offset is sign extended, we can both add to the PC and subtract from it
adding jumps to larger addresses (later instructions) and subtracting jumps to smaller addresses (earlier instructions—this is useful for loops!)
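here is the PC update rule written out as a C sketch, including the sign extension of the 24-bit offset field; `sign_extend24` and `next_pc` are hypothetical names, and `taken` stands for "the cond field evaluated to true"

```c
#include <assert.h>
#include <stdint.h>

/* sign-extend a 24-bit two's-complement field to a 32-bit signed value */
int32_t sign_extend24(uint32_t field) {
    assert(field <= 0xFFFFFFu);                 /* the field is only 24 bits wide */
    if (field & 0x800000u)                      /* bit 23 set: the value is negative */
        return (int32_t)field - 0x1000000;      /* subtract 2^24 */
    return (int32_t)field;
}

/* next PC under the rule from lecture:
   not taken: PC + 4;  taken: PC + 4 + sign-extend(offset) * 4 */
uint32_t next_pc(uint32_t pc, int taken, uint32_t offset_field) {
    if (!taken)
        return pc + 4;
    int64_t target = (int64_t)pc + 4 + (int64_t)sign_extend24(offset_field) * 4;
    assert(target >= 0 && target <= UINT32_MAX);  /* stay inside the address space */
    return (uint32_t)target;
}
```

with the offset field 0xFFFFFD (which sign-extends to -3), a taken branch at address 104 goes to 104 + 4 - 12 = 96, an earlier instruction, which is exactly the kind of backward jump a loop needs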
here are the values for the "cond" field
    cond      condition
    0 0 0 0   equal
    0 0 0 1   not-equal
    1 0 1 0   greater-than-or-equal
    1 0 1 1   less-than
    1 1 0 0   greater-than
    1 1 0 1   less-than-or-equal
    1 1 1 0   always
but how does this even work?
"equal" is all well and good, but we need two things to compare to see if they’re equal!
there’s no room in the branch instruction to specify these things!
enter the CMP instruction
unsurprisingly, there’s both an immediate version and a register version
here’s the immediate version
    +-------+-----------+-------+------+-------+-------------+
    |       | 0 0 1 1 0 |       |  Rn  |       |    imm12    |
    +-------+-----------+-------+------+-------+-------------+
    (fields in order from bit 31 on the left to bit 0 on the right; widths not drawn to scale)
so "cmp r2, #42" will compare the contents of register r2 with the sign-extended immediate value imm12 and set the condition code registers accordingly
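here is a C sketch of what "cmp r2, #42" might do. the sign extension of imm12 follows directly from the text; how the condition codes get set is an ASSUMPTION (the lecture only says "accordingly"): this sketch sets exactly one of N, Z, P from the sign of Rn minus the immediate, and it subtracts in 64 bits so it can never overflow, which real 32-bit hardware does not get for free. `CondCodes`, `sign_extend12`, and `cmp_imm` are hypothetical names

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sign-extend a 12-bit immediate (imm12) to a 32-bit signed value */
int32_t sign_extend12(uint32_t imm12) {
    assert(imm12 <= 0xFFFu);                          /* the field is only 12 bits wide */
    return (imm12 & 0x800u) ? (int32_t)imm12 - 0x1000 /* bit 11 set: subtract 2^12 */
                            : (int32_t)imm12;
}

/* ASSUMPTION: cmp sets exactly one of N, Z, P from the sign of (Rn - operand) */
typedef struct { bool n, z, p; } CondCodes;

CondCodes cmp_imm(int32_t rn_value, uint32_t imm12) {
    /* subtract in 64 bits so this sketch never overflows */
    int64_t diff = (int64_t)rn_value - sign_extend12(imm12);
    CondCodes cc = { diff < 0, diff == 0, diff > 0 };
    return cc;
}
```

so if r2 holds 42, `cmp_imm(42, 42)` sets only Z; an imm12 field of 0xFFF means -1 after sign extension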
the register version looks predictably similar
    +-------+-----------+-------+------+-------+-------------+
    |       | 0 0 1 1 0 |       |  Rn  |       |     Rm      |
    +-------+-----------+-------+------+-------+-------------+
so "cmp r2, r1" will compare the contents of register r2 with the contents of register r1 and set the condition code registers accordingly
(there are versions with the shift as we saw in the load/store instructions, but those are an unnecessary complication right now)
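putting cmp and the cond table together: here is a C sketch of what each cond value means, phrased directly in terms of the two values the last cmp compared. it deliberately skips how the hardware gets these answers out of the condition codes; the `COND_*` and `cond_holds` names are hypothetical, and comparisons are assumed to be signed

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* cond field values from the table of cond values */
enum {
    COND_EQ = 0x0, /* 0000 equal */
    COND_NE = 0x1, /* 0001 not-equal */
    COND_GE = 0xA, /* 1010 greater-than-or-equal */
    COND_LT = 0xB, /* 1011 less-than */
    COND_GT = 0xC, /* 1100 greater-than */
    COND_LE = 0xD, /* 1101 less-than-or-equal */
    COND_AL = 0xE  /* 1110 always */
};

/* does condition `cond` hold if the last cmp compared a against b? */
bool cond_holds(unsigned cond, int32_t a, int32_t b) {
    switch (cond) {
    case COND_EQ: return a == b;
    case COND_NE: return a != b;
    case COND_GE: return a >= b;
    case COND_LT: return a < b;
    case COND_GT: return a > b;
    case COND_LE: return a <= b;
    case COND_AL: return true;   /* ignores the comparison entirely */
    default:
        assert(0 && "cond value not in the table");
        return false;
    }
}
```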
revisiting the earlier Java/C example, we could translate it into the following sequence of assembly instructions:
(assume that r0 holds the value of x and r1 holds the value of y)
    if(x > y) {        cmp r0, r1
                       ble #1
        x = x - y;     sub r0, r0, r1
    }
    y = x;             mov r1, r0
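note that the if statement is actually represented by two instructions (cmp and ble), whereas the assignments are each represented by a single instruction
also note that ble tests the opposite of the "if" condition: we branch past the body exactly when x > y is false, i.e., when x <= y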
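one way to see the shape of this translation in C itself is to write one statement per assembly instruction, with `goto` playing the role of the branch. this is a sketch for intuition only (`if_example` is a hypothetical name), not how you would normally write C

```c
#include <assert.h>
#include <stddef.h>

/* one C statement per assembly instruction; *x plays r0 and *y plays r1 */
void if_example(int *x, int *y) {
    assert(x != NULL && y != NULL);
    if (*x <= *y)           /* cmp r0, r1 + ble #1: skip the body when x <= y */
        goto after_body;
    *x = *x - *y;           /* sub r0, r0, r1 */
after_body:
    *y = *x;                /* mov r1, r0 */
}
```

with x = 10 and y = 3 the body runs and both end up 7; with x = 3 and y = 10 the branch skips the body and both end up 3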
why branch "always"?
consider this modification to the previous Java/C code:
    if(x > y) {
        x = x - y;
    } else {
        x = x + y;
    }
    y = x;
suppose the "if" condition evaluates to true, in which case the body is executed ("x = x - y")
what, then, do we want to happen when we finish executing the assembly instructions that implement "x = x - y"?
we need to jump over the instructions that implement the "else" body ("x = x + y")
thus, if we reach that point, we must always branch
once again, the Java/C code from above could be written using assembly like the following:
    if(x > y) {        cmp r0, r1
                       ble #2
        x = x - y;     sub r0, r0, r1
                       b #1
    } else {
        x = x + y;     add r0, r0, r1
    }
    y = x;             mov r1, r0
note that the end of the "if" block is marked by an unconditional jump over the "else" block
also note that the offset in the ble instruction is now 2 because we need to jump over both the sub and the b instructions to reach the "else" body
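to check those offsets, here is a tiny C sketch that runs the if/else translation one instruction at a time using the rule PC + 4 + offset * 4. it places the first instruction at address 0 (any start address would do, since offsets are relative), and as a SIMPLIFICATION the cmp just remembers the two values it compared instead of setting N, Z, P; `Op`, `Instr`, `prog`, and `run_if_else` are all hypothetical names

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical decoded forms for just the instructions this example needs */
typedef enum { OP_CMP, OP_BLE, OP_B, OP_SUB, OP_ADD, OP_MOV } Op;
typedef struct { Op op; int rd, rn, rm; int32_t offset; } Instr;

/* the if/else translation; instruction i sits at address 4 * i */
static const Instr prog[] = {
    {.op = OP_CMP, .rn = 0, .rm = 1},          /*  0: cmp r0, r1     */
    {.op = OP_BLE, .offset = 2},               /*  4: ble #2         */
    {.op = OP_SUB, .rd = 0, .rn = 0, .rm = 1}, /*  8: sub r0, r0, r1 */
    {.op = OP_B, .offset = 1},                 /* 12: b #1           */
    {.op = OP_ADD, .rd = 0, .rn = 0, .rm = 1}, /* 16: add r0, r0, r1 */
    {.op = OP_MOV, .rd = 1, .rm = 0},          /* 20: mov r1, r0     */
};

/* runs the program with r0 = *x and r1 = *y, then copies r0 and r1 back */
void run_if_else(int32_t *x, int32_t *y) {
    int32_t r[2] = { *x, *y };
    int32_t cmp_a = 0, cmp_b = 0;  /* SIMPLIFICATION: remember the compared values instead of N/Z/P */
    const uint32_t end = 4u * (uint32_t)(sizeof prog / sizeof prog[0]);
    uint32_t pc = 0;
    while (pc < end) {
        assert(pc % 4 == 0);                       /* instructions are 4 bytes apart */
        const Instr *in = &prog[pc / 4];
        uint32_t next = pc + 4;                    /* default: PC <- PC + 4 */
        uint32_t branch_target = pc + 4 + (uint32_t)(in->offset * 4);  /* PC + 4 + offset * 4 */
        switch (in->op) {
        case OP_CMP: cmp_a = r[in->rn]; cmp_b = r[in->rm]; break;
        case OP_BLE: if (cmp_a <= cmp_b) next = branch_target; break;
        case OP_B:   next = branch_target; break;  /* "always" */
        case OP_SUB: r[in->rd] = r[in->rn] - r[in->rm]; break;
        case OP_ADD: r[in->rd] = r[in->rn] + r[in->rm]; break;
        case OP_MOV: r[in->rd] = r[in->rm]; break;
        }
        pc = next;
    }
    *x = r[0];
    *y = r[1];
}
```

running it with x = 3 and y = 10 takes the ble to address 16 (4 + 4 + 2 * 4) and ends with x = y = 13; with x = 10 and y = 3 the b at address 12 jumps to 20 (12 + 4 + 1 * 4) and both end up 7, matching the Java/C version either way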
now the question is: our condition code registers are N, Z, and P
but the branch conditions are things like "equal" and "greater-than" and so on
how do we implement the latter with the former?
that is indeed a good question, which you will be figuring out in a future assignment
Definitions
The following definitions introduced in this lecture are fair-game for future quizzes. You will be expected to give the exact definition as provided in these lecture notes.
- control structure