Lecture 21: control structures in C
Goals
- implement a conditional statement in ARM32 assembly
- use shift to perform efficient multiplication
- implement while- and for-loops in ARM32 assembly
- define syntactic sugar
last time we saw a couple patterns
we saw a pattern in Linux commands and we saw a pattern in compilation
we saw that Linux commands take lists of things to operate on and also parameters that affect the operation being performed
and we saw that, when compiling from C to assembly, a typical pattern is "load operands from memory to registers, operate, save result from register back to memory"
(recall this is a consequence of RISC instructions only operating on register contents and not values in memory)
today we’re going to look at if-statements and loops
which are collectively referred to as control structures because they allow the programmer to control the flow of the program
the need for "control" is now clear to us: we know that, under normal circumstances, the program counter proceeds implacably from one instruction to the next
conditionals ("if" statements)
stupid-simple example program: if.c
one thing I didn’t point out last time: variable names don’t appear in the assembly
the compiler sets aside space in memory for each variable and that’s the only thing that makes it through to the assembly version
so don’t skimp on variable names!
nothing surprising about the C: if-statements look the same as in Java
you’ll note that I explicitly enclosed the body of the if-statement in curly braces
as in Java, these curly braces aren’t strictly necessary when the body is a single statement
HOWEVER
I personally think it’s a good habit to get into
because it’s easy to add another line of code that you intend to be in the same block (and you even indent it so it looks pretty)
but if, at that point, you don’t add the curly braces, you’re not going to end up with the code you think you wrote
this is, in fact, precisely what happened to Apple in 2014 in the infamous "goto fail" bug in their SSL/TLS implementation
https://www.imperialviolet.org/2014/02/22/applebug.html
https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-1266
this is NOT a dig at Apple or its engineers: this is a lesson in how consistency and discipline make you better programmers
if you get in the habit of always including curly braces, you will NEVER introduce bugs like this. full stop.
and the resultant assembly: if.s
looks pretty similar to what I showed before we were even considering C
we see the same "load operands, operate, store result" pattern
the only real surprise is this ".L2" thing, which appears both in the left-most column and as the target of the bge instruction
this is a label, used to make reading assembly easier
instead of making the reader calculate offsets manually, we use labels and leave the calculation up to the assembler
so we stick in a label showing where we want a successful branch to end up
and so we’ve discovered that the assembler doesn’t quite translate directly from assembly to machine code
it also translates these symbolic (i.e., non-literal) offsets to the actual numbers (and therefore bits) that will appear in the machine code
another kind of unexpected thing
check out the instructions that calculate "a"
    mov r3, r2
    mov r3, r3, asl #3
    add r3, r3, r2
it copies r2 to r3
then it takes r3, shifts it left by 3, and puts the result in r3
the effect of which is to multiply the contents of r3 by 8 (shifting left by n multiplies by 2^n)
then it adds r2 to r3, so r3 ends up holding 8*r2 + r2 = 9*r2
turns out the compiler prefers these three simple instructions to running it through the integer multiplication circuitry, which is typically slower
while loop
stupid-simple example program: while.c
and the resultant assembly: while.s
no surprises here
again we see labels making our lives easier
for loop
how different is this from a while loop?
let’s ask Linux:
    $ diff while.s for.s
    14c14
    < .file "while.c"
    ---
    > .file "for.c"
this tells us that the only difference in the resultant assembly is the name of the original source file (which happens to be on line 14 of both assembly files)
otherwise they are identical
this makes sense!
a for-loop is just a while-loop with explicit initialization and iteration steps
there’s a term for programming constructs that aren’t fundamentally new but are easier to read: syntactic sugar
now, to really mess with you
what is the value of i after this code runs?
int i = 1; i += i++ + ++i;
"i++" is post-increment, which means the expression evaluates to i, and then i is incremented
whereas "++i" is pre-increment, which means i is incremented and then the expression is evaluated
but here we’ve got an instance of each: which order do they happen in? this affects the result!
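bad news: the C standard leaves this undefined. i is modified more than once with no sequence point in between, so different compilers can legitimately produce different results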
bottom line: don’t do it
programs should be written with clear intent and no surprises
there is NO benefit to taking shortcuts
especially in high-level languages! because it all gets compiled down to machine code anyway