pete > courses > CS 202 Spring 24 > Lecture 19: assembly programs and toolchain
Lecture 19: assembly programs and toolchain
Goals
- perform disassembly
- interpret higher-level behavior of an assembly program
- identify components of the toolchain and briefly describe what they do:
  - compiler, assembler, disassembler, decompiler, debugger, linker
given the ARM reference I wrote, do you think you could, given time, write an assembler?
do you think you could write a disassembler?
(I won’t ask you to do either of these)
sure, it’s a straightforward, totally mechanizable process
and you’re practicing it on the assignments, so I’m not going to bother you with it anymore
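to make "totally mechanizable" concrete, here is a minimal sketch of a disassembler for just the data-processing instructions that show up in these notes. it assumes the standard ARM32 data-processing layout (condition in bits 31-28, immediate flag in bit 25, opcode in bits 24-21, Rn in bits 19-16, Rd in bits 15-12, operand in bits 11-0), which may be laid out differently from the course reference. the names OPCODES, rotate_right, and disassemble_dp are made up for this example, and it handles nothing beyond what is listed in the comments.

```python
# opcode field (bits 24-21) for the data-processing instructions used in these notes
OPCODES = {0b0010: "sub", 0b0011: "rsb", 0b0100: "add", 0b1010: "cmp", 0b1101: "mov"}


def rotate_right(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def disassemble_dp(word):
    """Turn one 32-bit data-processing word into assembly text.

    Only handles the "always" condition and operands that are either an
    immediate or an unshifted register; anything else raises ValueError.
    """
    if (word >> 28) & 0xF != 0b1110 or (word >> 26) & 0b11 != 0b00:
        raise ValueError("not an unconditional data-processing instruction")
    opcode = (word >> 21) & 0xF
    if opcode not in OPCODES:
        raise ValueError("opcode not in this sketch's table")
    name = OPCODES[opcode]
    rn = (word >> 16) & 0xF          # first source register
    rd = (word >> 12) & 0xF          # destination register
    if (word >> 25) & 1:
        # immediate operand: 8-bit value rotated right by twice the 4-bit rotate field
        operand = "#%d" % rotate_right(word & 0xFF, 2 * ((word >> 8) & 0xF))
    else:
        if (word >> 4) & 0xFF != 0:
            raise ValueError("shifted register operands not handled")
        operand = "r%d" % (word & 0xF)
    if name == "mov":                # mov has no first source register
        return "mov r%d, %s" % (rd, operand)
    if name == "cmp":                # cmp has no destination register
        return "cmp r%d, %s" % (rn, operand)
    return "%s r%d, r%d, %s" % (name, rd, rn, operand)
```

feeding it 0xe3a0200c gives back "mov r2, #12". an assembler is the same table lookup run in the other direction.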
let’s look at some very small programs written in ARM32 assembly
Program The First
e3a0200c    mov r2, #12
e3530000    cmp r3, #0
aa000001    bge #1
e2633000    rsb r3, r3, #0
e0822003    add r2, r2, r3
let’s take it instruction by instruction
and keep a record of what the registers contain
after "mov r2, #12", register r2 contains 12 (decimal)
after "cmp r3, #0", the CC regs reflect the comparison between the contents of r3 and 0
now the tricky part: the branch, which does different things based on what’s in the CC regs
if the CC regs say "greater than or equal to" (ie, the contents of r3 >= 0), we skip one instruction
(one way to think of positive offsets to branch instructions is the number of instructions to skip)
so if r3 >= 0, we skip the rsb instruction
(yes, that’s a new instruction, I’ll get to it in a second)
therefore, only when r3 < 0 do we execute the rsb instruction
rsb is pretty simple: it’s a sub with reversed operands
so "rsb r3, r3, #0" means "r3 <- 0 - r3"
and finally "add r2, r2, r3" adds the contents of r2 and r3 and puts the sum in r2
we’ve gotten our hands really dirty here
and sometimes when you do that, it’s tough to see the bigger picture
so let’s step back and rewrite this at a slightly higher level of abstraction (ie, worrying more about effects than method)
put #12 into r2
compare r3 with 0
if the comparison shows that r3 < 0, we negate r3
then we add r2 and r3
in a nutshell, we’ve performed this operation: r2 <- 12 + absolute-value(r3)
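one way to check a summary like this is to mirror the instructions line by line in a language you already know and compare the result against the claim. the sketch below does that in Python. the function name program_the_first is made up, the boolean ge is a stand-in for the CC regs, and Python integers don't wrap at 32 bits, so it ignores overflow.

```python
def program_the_first(r3):
    """Follow the five instructions in order and return the final (r2, r3)."""
    r2 = 12                 # mov r2, #12
    ge = r3 >= 0            # cmp r3, #0     (stand-in for the CC regs)
    if not ge:              # bge #1         (skips the rsb when r3 >= 0)
        r3 = 0 - r3         # rsb r3, r3, #0
    r2 = r2 + r3            # add r2, r2, r3
    return r2, r3


# compare the instruction-level version against the one-line summary
for start in (-7, -1, 0, 5):
    final_r2, _ = program_the_first(start)
    assert final_r2 == 12 + abs(start)
```

note that r3 itself is changed when it starts out negative, a side effect the one-line summary leaves out.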
being able to see some chunk of code, deduce its function, and describe it concisely at a higher level of abstraction is important
so we’re going to practice it
here’s a chunk of code
you’ve got 15 minutes, you are encouraged to talk amongst yourselves
e3a0000a    mov r0, #10
e3a01000    mov r1, #0
e3a02005    mov r2, #5
ea000002    b #2
e0811000    add r1, r1, r0
e2422001    sub r2, r2, #1
e3520000    cmp r2, #0
cafffffc    bgt #-4
the tricky part is the branch instruction: remember that it works relative to the PC + 4 (ie, the address of the subsequent instruction)
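here is a small sketch of that branch arithmetic, using the convention in these notes: the offset counts instructions, starting from the instruction after the branch. it assumes the offset lives in the low 24 bits of the machine word as a two's-complement number. the function names and the idea of numbering instructions from 0 are just for this example.

```python
def branch_offset(word):
    """Return the signed offset stored in the low 24 bits of a branch word."""
    field = word & 0xFFFFFF
    if field & 0x800000:        # top bit of the field set: the offset is negative
        field -= 0x1000000
    return field


def branch_target_index(branch_index, word):
    """Index (counting instructions from 0) where a taken branch lands.

    Uses the notes' convention: start at the instruction following the
    branch, then move `offset` instructions forward or backward.
    """
    return branch_index + 1 + branch_offset(word)
```

for example, the last word in the listing (cafffffc) is instruction 7, and branch_target_index(7, 0xcafffffc) gives 4, which is the first add. the earlier ea000002 at index 3 lands on index 6, the cmp.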
answer: calculates product of 5 and 10, puts result in r1
and it uses a loop to do it!
if we were to write Java- or C-like code, it might look like this:
r0 = 10;
r1 = 0;
r2 = 5;
while (r2 > 0) {
    r1 = r1 + r0;
    r2 = r2 - 1;
}
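if you'd rather not trust the translation, you can also step through the eight instructions one at a time. the sketch below does that in Python, with pc as the index of the next instruction (counting from 0) and a boolean gt standing in for the CC regs. the function name run_program_two is made up, and the branch arithmetic follows the notes' convention of counting from the instruction after the branch.

```python
def run_program_two():
    """Step through the eight instructions and return the final (r0, r1, r2)."""
    r0 = r1 = r2 = 0
    gt = False                  # stand-in for the CC regs set by cmp
    pc = 0                      # index of the next instruction to execute
    while pc < 8:
        next_pc = pc + 1        # default: fall through to the next instruction
        if pc == 0:
            r0 = 10             # e3a0000a  mov r0, #10
        elif pc == 1:
            r1 = 0              # e3a01000  mov r1, #0
        elif pc == 2:
            r2 = 5              # e3a02005  mov r2, #5
        elif pc == 3:
            next_pc = pc + 1 + 2    # ea000002  unconditional branch, offset 2
        elif pc == 4:
            r1 = r1 + r0        # e0811000  add r1, r1, r0
        elif pc == 5:
            r2 = r2 - 1         # e2422001  sub r2, r2, #1
        elif pc == 6:
            gt = r2 > 0         # e3520000  cmp r2, #0
        elif pc == 7:
            if gt:              # cafffffc  branch if greater than, offset -4
                next_pc = pc + 1 - 4
        pc = next_pc
    return r0, r1, r2
```

the first branch jumps straight to the cmp, so the test runs before the loop body ever does, which is why this reads as a while loop rather than a do-while.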
how did I get all this assembly language and machine code?
certainly I didn’t compose it myself: I’m far too lazy for that
which means it’s time to review the chain of tools that get us from source code to machine code
these programs are collectively referred to as the "toolchain"
no, this is not a coincidence
in reviewing the toolchain, I’m going to start tying these vague vocabulary words to actual programs we will use for these purposes over the rest of the semester
compiler translates source code to assembly (we’ll be using gcc)
assembler translates assembly to machine code (gcc for this, too)
disassembler translates machine code to assembly (objdump)
you’ll note that these are inextricably tied to a particular ISA
recall, the ISA is (among other things) the set of instructions supported by a given CPU
since the set of instructions is particular to an ISA, the assembly language itself is going to be particular to an ISA
and therefore the compiler, assembler, and disassembler are also going to be particular to an ISA
there are a few other programs that fall under the broad heading of toolchain
one is the program that lets you step through your code step by step, examining the state of the machine at each point
this is called a debugger and the one we’ll use is gdb
we may use valgrind later on when we see the heap
I’m hoping to show you what the linker does, but not for a few weeks
let’s talk C, because that’s what’s next
you guys are comfortable with the concepts of, eg, variables, conditionals, loops, classes, and functions in Java and/or Python
my goal is to simultaneously teach you C and show how higher-level programming languages translate to assembly code
so I’m going to write up very simple C programs that demonstrate these concepts
and show the assembly that is produced when we feed them to a compiler
then, when we get to writing larger programs in C, it’s on you to take the high-level notions you already have of variables, conditionals, loops, etc and translate those ideas to C