

Lecture 21: control structures in C

Goals


last time we saw a couple patterns

we saw a pattern in Linux commands and we saw a pattern in compilation

we saw that Linux commands take lists of things to operate on and also parameters that affect the operation being performed

and we saw that, when compiling from C to assembly, a typical pattern is "load operands from memory to registers, operate, save result from register back to memory"

(recall this is a consequence of RISC instructions only operating on register contents and not values in memory)

today we’re going to look at if-statements and loops

which are collectively referred to as control structures because they allow the programmer to control the flow of the program

that this requires "control" is now clear to us, because we know that, under normal circumstances, the program counter will proceed implacably from one instruction to the next


conditionals ("if" statements)

stupid-simple example program: if.c

one thing I didn’t point out last time: variable names don’t appear in the assembly

the compiler sets aside space in memory for each variable and that’s the only thing that makes it through to the assembly version

so don’t skimp on variable names!

nothing surprising about the C: if-statements look the same as in Java


you’ll note that I explicitly enclosed the body of the if-statement in curly braces

as in Java, these curly braces aren’t strictly necessary when the body is a single statement

HOWEVER

I personally think it’s a good habit to get into

because it’s easy to add another line of code that you intend to be in the same block (and you even indent it so it looks pretty)

but if, at that point, you don’t add the curly braces, you’re not going to end up with the code you think you wrote

this is, in fact, precisely what happened to Apple a couple years ago in the infamous "goto fail" bug in their SSL implementation

https://www.imperialviolet.org/2014/02/22/applebug.html

https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-1266

this is NOT a dig at Apple or its engineers: this is a lesson in how consistency and discipline make you better programmers

if you get in the habit of always including curly braces, you will NEVER introduce bugs like this. full stop.
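
to make the hazard concrete, here is a minimal sketch (the function names and the error/warning counters are hypothetical; this is not the code from if.c or from Apple's SSL library). both functions are indented the same way, but only the second one does what the indentation suggests:

```c
/* Hypothetical illustration of the missing-braces pitfall. */

/* Intended: bump both counters only when value is negative.
   Without braces, only the first statement belongs to the if. */
int count_without_braces(int value) {
    int errors = 0;
    int warnings = 0;
    if (value < 0)
        errors++;
        warnings++;   /* runs unconditionally, despite the indentation */
    return errors + warnings;
}

/* Same code with braces: both statements are guarded by the condition. */
int count_with_braces(int value) {
    int errors = 0;
    int warnings = 0;
    if (value < 0) {
        errors++;
        warnings++;
    }
    return errors + warnings;
}
```

for a non-negative value the first version still counts a warning and the second counts nothing. recent versions of GCC can flag the first version with -Wmisleading-indentation, but the habit is cheaper than relying on the warning.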


and the resultant assembly: if.s

looks pretty similar to what I showed before we were even considering C

we see the same "load operands, operate, store result" pattern

the only real surprise is this ".L2" thing, which appears both in the left-most column and as the target of the bge instruction

this is a label, used to make reading assembly easier

instead of making the reader calculate offsets manually, we use labels and leave the calculation up to the assembler

so we stick in a label showing where we want a successful branch to end up


and so we’ve discovered that the assembler doesn’t quite translate directly from assembly to machine code

it also translates these symbolic (i.e., non-literal) offsets into the actual numbers (and therefore bits) that will appear in the machine code
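
one way to picture the branch-and-label idea without leaving C is to rewrite an if-statement by hand using goto. this is only a sketch: the variable names, the condition, and the label name "skip" are assumptions for illustration, and if.s remains the authoritative version. note that the branch tests the opposite of the C condition, because its job is to jump past the body:

```c
/* Hypothetical sketch: an if-statement and a hand-"lowered" equivalent. */

int with_if(int x, int y) {
    int a = 0;
    if (x < y) {
        a = x * 9;
    }
    return a;
}

int with_goto(int x, int y) {
    int a = 0;
    if (x >= y) goto skip;   /* plays the role of "bge .L2": branch past the body */
    a = x * 9;               /* reached only when the branch is not taken */
skip:                        /* plays the role of the ".L2" label */
    return a;
}
```

both functions return the same value for every input; the second just makes the jump explicit, the way the assembly does.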


another kind of unexpected thing

check out the instructions that calculate "a"

mov r3, r2
mov r3, r3, asl #3
add r3, r3, r2

it copies r2 to r3

then it takes r3, shifts it left by 3, and puts the result in r3

the effect of which is to multiply the contents of r3 by 8

then it adds r2 to r3, which leaves 9 times the original value of r2 in r3 (8 times r2, plus r2)

turns out the compiler expects these three instructions to be cheaper than running the value through the integer multiplication circuitry
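
the same trick can be written out in C. this is a sketch with hypothetical function names; it uses unsigned so the shift is well defined for every input:

```c
/* Multiply by 9 with a shift and an add, mirroring the three instructions. */
unsigned times9_shift_add(unsigned x) {
    unsigned r3 = x;      /* mov r3, r2 */
    r3 = r3 << 3;         /* mov r3, r3, asl #3  : r3 = 8 * x */
    r3 = r3 + x;          /* add r3, r3, r2      : r3 = 8 * x + x = 9 * x */
    return r3;
}

/* The straightforward version, for comparison. */
unsigned times9_multiply(unsigned x) {
    return x * 9u;
}
```

there is no need to write the shift version by hand: the point of the example is that the compiler already makes this substitution when it judges it worthwhile.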


while loop

stupid-simple example program: while.c

and the resultant assembly: while.s

no surprises here

again we see labels making our lives easier
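
as a rough picture of how labels carry a loop, here is a while loop next to a hand-written goto version. the function names, the label names, and the exact shape (jump to the test first, then branch back to the body) are assumptions for illustration; while.s may be laid out differently:

```c
/* Hypothetical sketch: a while loop and a goto-based equivalent. */

int countdown_with_while(int n) {
    int steps = 0;
    while (n > 0) {
        n--;
        steps++;
    }
    return steps;
}

int countdown_with_goto(int n) {
    int steps = 0;
    goto test;               /* check the condition before the first pass */
body:
    n--;
    steps++;
test:
    if (n > 0) goto body;    /* conditional branch back to the loop body */
    return steps;
}
```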


for loop

for.c and for.s

how different is this to a while loop?

let’s ask Linux:

$ diff while.s for.s
14c14
<         .file   "while.c"
---
>         .file   "for.c"

this tells us that the only difference in the resultant assembly is the name of the original source file (which happens to be on line 14 of both assembly files)

otherwise they are exactly identical

this makes sense!

a for-loop is just a while-loop with explicit initialization and iteration steps

there’s a term for programming constructs that aren’t fundamentally new but are easier to read: syntactic sugar
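
a sketch of the correspondence, using hypothetical stand-ins for while.c and for.c (the function names and the summing loop are assumptions, not the course files). the three parts of the for header each have an obvious home in the while version:

```c
/* Hypothetical stand-ins for while.c and for.c. */

int sum_with_while(int n) {
    int total = 0;
    int i = 0;            /* initialization */
    while (i < n) {       /* condition */
        total += i;
        i++;              /* iteration step */
    }
    return total;
}

int sum_with_for(int n) {
    int total = 0;
    int i;
    for (i = 0; i < n; i++) {   /* initialization; condition; iteration step */
        total += i;
    }
    return total;
}
```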


now, to really mess with you

what is the value of i after this code runs?

int i = 1;
i += i++ + ++i;

"i++" is post-increment, which means the expression evaluates to the current value of i, and then i is incremented

whereas "++i" is pre-increment, which means i is incremented first and the expression evaluates to the new value

but here we’ve got an instance of each: which order do they happen in? this affects the result!

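bad news: the C standard doesn't pick an order here. i is modified more than once within a single expression with nothing to sequence the modifications, so the behavior is undefined and different compilers can legitimately give different answers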

bottom line: don’t do it

programs should be written with clear intent and no surprises

there is NO benefit to taking shortcuts

especially in high-level languages! because it all gets compiled down to machine code anyway
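
a sketch of the alternative for the puzzle above: give each side effect its own statement, so the order is visible in the source. the function name and the particular order chosen here (post-increment first, then pre-increment) are assumptions for illustration; the one-liner does not promise this or any other order:

```c
/* Hypothetical rewrite of "i += i++ + ++i;" with one side effect per statement. */
int explicit_order(void) {
    int i = 1;
    int old = i;          /* the value "i++" would yield: 1 */
    i = i + 1;            /* side effect of "i++": i is now 2 */
    i = i + 1;            /* side effect of "++i": i is now 3 */
    int sum = old + i;    /* 1 + 3 = 4 */
    i = i + sum;          /* 3 + 4 = 7 */
    return i;
}
```

it takes more lines, but anyone reading it can tell what the result will be, and so can every compiler.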
