

Lecture 17: simple assembly program

Goals


loads and stores in x86

c7 45 cc 00 00 00 00    mov    DWORD PTR [rbp-0x34],0x0

Meaning:    Mem[rbp-0x34] <- 0
8b 45 cc                mov    eax,DWORD PTR [rbp-0x34]

Meaning:    eax <- Mem[rbp-0x34]
89 54 85 d0             mov    DWORD PTR [rbp+rax*4-0x30],edx

Meaning:    Mem[rbp+rax*4-0x30] <- edx
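these loads and stores can be sketched as a toy Python model—registers and memory are just dictionaries here, and the rbp value is an arbitrary assumption, not anything real hardware would guarantee:

```python
# toy model: registers and memory as dicts; addresses are plain ints
regs = {"rbp": 0x1000, "rax": 0, "edx": 42}  # rbp/edx values chosen arbitrarily
mem = {}

# c7 45 cc ...  mov DWORD PTR [rbp-0x34], 0x0     Mem[rbp-0x34] <- 0
mem[regs["rbp"] - 0x34] = 0

# 8b 45 cc      mov eax, DWORD PTR [rbp-0x34]     eax <- Mem[rbp-0x34]
# (eax is the low 32 bits of rax; we model it as all of rax for simplicity)
regs["rax"] = mem[regs["rbp"] - 0x34]

# 89 54 85 d0   mov DWORD PTR [rbp+rax*4-0x30], edx
mem[regs["rbp"] + regs["rax"] * 4 - 0x30] = regs["edx"]

print(mem)  # {4044: 0, 4048: 42}
```

note how the third instruction computes its address from two registers plus a constant—that's the same address arithmetic the real hardware does.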

so we’ve got all these instructions now

we can perform operations like addition and subtraction on values in registers

and when the registers don’t give us enough working space, we can resort to main memory by using the load and store instructions

keeping in mind that we also know all the abstractions and hardware that actually make these things work

now we can start to write programs!

that is, sequences of instructions that solve larger problems


so what does a program written in assembly language look like?

mov r0, #180
mov r1, #42
add r2, r0, r1

eor r0, r0, r0
str r2, [r0]

here we’re adding two numbers together and storing their result at memory location 0

(no, we haven’t officially seen the eor instruction yet, but it has predictable effects—it’s also a common method to set a register to zero)


one thing you need to know how to do is interpret the behavior of an assembly program like the one above

by which I mean understand the individual steps it performs and the higher purpose it fulfills (the latter is abstraction!)


to the first question, this program is similar to Python or Java programs you’ve written in that it proceeds from one instruction to the next in sequence, performing the operation indicated at each step

one way to deduce a program’s behavior is to keep track of how its state changes throughout execution

therefore, I start with a diagram of a register file whose contents we do not yet know and then evaluate the instructions in turn

(I simulate this lack of knowledge by leaving entries in the register file blank—registers cannot be floating, so they will have a value, we just can’t know it, and therefore shouldn’t assume anything about it)


the first instruction is "mov r0, #180"

this is the "immediate" variant of the MOV instruction

meaning that it takes the value 180 and puts it into register r0

therefore after the first instruction executes, r0 contains 180 and the rest of the registers are still unknown

by the same token, after the second instruction executes, r1 contains 42

and after the third instruction executes, r2 contains 222, which is the sum of the contents of r0 and r1


a brief digression back to finite state machines

the contents of the registers at any given instant is the state

an instruction causes the transition from one state (ie, particular contents of registers) to another state (ie, a different set of contents in registers)
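this state-machine view can be sketched in Python (a toy model only—each state is a snapshot of register contents, and each instruction is a transition from one snapshot to the next):

```python
# each state is a snapshot of register contents; each instruction maps
# one state to the next (a finite-state-machine view, much simplified)
state = {"r0": None, "r1": None, "r2": None}  # None models "unknown"
history = [dict(state)]

for name, effect in [
    ("mov r0, #180",   lambda s: s.update(r0=180)),
    ("mov r1, #42",    lambda s: s.update(r1=42)),
    ("add r2, r0, r1", lambda s: s.update(r2=s["r0"] + s["r1"])),
]:
    effect(state)              # the transition
    history.append(dict(state))  # record the new state

print(history[-1])  # {'r0': 180, 'r1': 42, 'r2': 222}
```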


the fourth instruction is new and different in a couple ways: eor r0, r0, r0

we haven’t seen the eor instruction before, nor have we seen an instruction whose source and destination registers are all the same

furthermore, the value in r0 is 32 bits, and we’ve only seen what xor does in the context of single-bit values, so how does it extend to bigger inputs and outputs?

the answer is that this is bitwise xor, meaning that the zeroth bit of the output is the result of xor’ing the zeroth bit of the first operand with the zeroth bit of the second operand

likewise for the other 31 bits

it just xor’s each corresponding pair of bits

so if I was performing bitwise xor on 10101 and 11011, I would get:

    10101
xor 11011
---------
    01110

because I apply the single-bit xor operation by columns
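you can check the column-by-column result directly—Python's ^ operator is exactly this bitwise xor:

```python
a = 0b10101
b = 0b11011
result = a ^ b  # bitwise xor: each corresponding pair of bits xor'ed

print(format(result, "05b"))  # 01110
```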


what, therefore, is the effect of "eor r0, r0, r0" ?

since the two operands are the same, every pair of bits xor’ed together will be the same

and the result of xor’ing a pair of identical bits is always zero

therefore the result of this operation will always be zero

this is a popular way of getting the value zero into a register

(there are other, perhaps more intuitive ways, but this one is sufficiently popular that you need to be able to recognize it)
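you can convince yourself of the zeroing idiom with a quick check (a trivial Python sketch; the sample values are arbitrary):

```python
# xor'ing any value with itself gives zero, because every bit
# pairs with an identical bit, and identical bits xor to 0
for x in (0, 1, 180, 42, 0xFFFFFFFF):
    assert x ^ x == 0

print("eor r, r, r always yields 0")
```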

so the upshot is that r0 now contains zero, r1 contains 42, and r2 contains 222


finally: "str r2, [r0]"

this says "take the contents of register r0, use those contents as a memory address, and store the contents of r2 at that address"

given the current state of the register file, this means "store 222 at address 0"

because r2 contains 222 and r0 contains 0

thus the ultimate effect of the program is to store 222 at address 0
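the whole five-instruction program can be traced in one small Python sketch (again a toy model—"memory" here is just a dict keyed by address):

```python
regs = {"r0": None, "r1": None, "r2": None}  # unknown initial contents
mem = {}

regs["r0"] = 180                      # mov r0, #180
regs["r1"] = 42                       # mov r1, #42
regs["r2"] = regs["r0"] + regs["r1"]  # add r2, r0, r1
regs["r0"] = regs["r0"] ^ regs["r0"]  # eor r0, r0, r0  (zeroes r0)
mem[regs["r0"]] = regs["r2"]          # str r2, [r0]    (store 222 at address 0)

print(mem)  # {0: 222}
```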


but even this simple program raises some questions: where are the instructions themselves stored and how do we achieve the "execute them one by one" effect?


first: where are these instructions stored?

(because this will lead to answering the other question)

they don’t just appear out of the ether, to be executed by the hardware you’re building for this week’s assignment

so where do they live?

where might they live?

well we’ve already got all this memory stuff to store bits, what if instructions live there, too?

good idea, let’s do that


now another question: we’re already going to use memory to store data (ie, values we’re operating on), do we want to intermingle this data with instructions?

that is, do we use the same 32-bit address space to simultaneously store instructions and data?

there are two possibilities: "yes" and "no"

over the decades, different computers have picked different answers

and the respective approaches, unsurprisingly, have been given names

a "Harvard architecture" is one in which data and instructions are segregated, often physically

in contrast, a "Von Neumann architecture" has data and instructions inhabit the same address space


this means that in a Von Neumann architecture, you perform loads and stores against a single pool of memory

the bits you read might be an instruction or they might be data

(and recall that bits have no type! there is no way to look at bits in memory and know it is or is not an instruction, just like you can’t know whether bits represent an integer or a floating point number or a string of ASCII)
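a quick illustration of "bits have no type," using Python's struct module to reinterpret one 32-bit pattern two ways (the specific pattern is arbitrary):

```python
import struct

bits = 0x42DE0000  # some 32-bit pattern: instruction? integer? float? who knows

as_int = bits  # read the bits as an unsigned integer
# repack the same bits and read them as an IEEE 754 single-precision float
as_float = struct.unpack("<f", struct.pack("<I", bits))[0]

print(as_int)    # 1121845248
print(as_float)  # 111.0
```

same bits, two completely different meanings—nothing in memory tells you which interpretation is "right."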

in a Harvard architecture, you have two separate pools: one exclusively for instructions and the other exclusively for data


there are tradeoffs

in a Harvard architecture, you need twice the hardware to perform memory operations

but in the Von Neumann architecture, one could imagine writing a program that overwrites its own instructions

(perhaps surprisingly, this may be simultaneously a bug and a feature)

there are other considerations that we’ll discover as we explore computer systems over the next few weeks

for now, though, know that most general-purpose computers are Von Neumann-ish

and many special-purpose computers are Harvard-ish

I say "ish" because, like most things in this course, there are exceptions

and, in fact, the organization of modern Intel processors (which I use as the standard) is inspired by both—we’ll see this near the end of the semester


okay, on to the second question: how do we achieve the "execute one instruction after another" effect?

but first, given that instructions are stored in memory, how do we get the "execute just one instruction" effect?

first we have to grab the instruction from memory

when we get around to discussing the phases of instruction execution, we’ll call this step "instruction fetch"

it means we need to have the address of the instruction to execute

and in fact we save this address in a register


called the "program counter"

a register that stores the address of the instruction about to execute

sometimes also called the "instruction pointer" because it points to the instruction we want to fetch and execute (ie, indicates its location)

abbreviated PC or IP

interestingly, in ARM, the PC is just another register

you can actually use it as an operand to many instructions: it’s r15


given we’ve got this register that stores the address of the instruction to execute, what does the process of executing an instruction look like?

well, you’re building it (though you’re not dealing with the PC for this assignment: that’s for hw6)

going back to the code I showed at the beginning of class today, what should happen to the PC after every instruction executes?

that is, what needs to happen so that the PC contains the correct address?

increment it by the size of an instruction (4 bytes)

side-note: here the uniform instruction length of RISC architectures makes life easier: we always increment the PC by the same amount—this is not the case for CISC machines
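a hedged sketch of that fetch/advance loop, with the PC incremented by a fixed 4 bytes each time (RISC-style; the instruction memory contents here are just illustrative strings, and "execute" is stubbed out):

```python
INSTR_SIZE = 4  # uniform RISC instruction length in bytes

# toy instruction memory: address -> instruction
imem = {0: "mov r0, #180", 4: "mov r1, #42", 8: "add r2, r0, r1"}

pc = 0
executed = []
while pc in imem:
    instr = imem[pc]        # instruction fetch: read Mem[pc]
    executed.append(instr)  # "execute" (stubbed out in this sketch)
    pc += INSTR_SIZE        # advance the PC to the next instruction

print(pc)  # 12
```

on a CISC machine the `pc += INSTR_SIZE` line wouldn't work—the increment would depend on how long the instruction just fetched turned out to be.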


we’ve now answered the questions we posed ourselves: where are instructions stored and how do we get the sequential execution of instructions to happen?

instructions are stored in memory, sometimes in the same memory pool as data (Von Neumann) and sometimes not (Harvard)

and we keep the address of the next instruction in a register, which we call the Program Counter, so that we can fetch it and execute it

we’re that much closer to being able to write real programs


but real programs feature some operations that the instructions we’ve looked at just can’t do

loops, conditionals, function calls

collectively referred to as "control structures"

because they control which instructions get executed
