CS 202 Spring 24 (pete)
Lecture 25: functions II
Goals
- describe the layout of a stack frame in gcc
- describe the behavior of the PUSH and POP instructions
- describe the purpose and behavior of the BL and BX instructions
to review
when a function is called, we need to keep track of a bunch of information
we call the construct used to store this information an activation record
when a function is called, a new activation record is created that contains:
- parameter values
- local variables
- return address
- return value (sorta)
- address of the caller’s stack frame
when a function returns, its activation record is no longer relevant
thus, since the most recently called function is always the first to return, we always add or remove activation records at the "end" of the list, which looks a heck of a lot like a stack
thus, when we’re talking about computer systems, the stack of activation records is referred to as The Stack
and another name for an activation record is a stack frame
also, the stack grows down (ie, newer frames are stored at lower memory addresses than older frames)
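to make the bookkeeping concrete, here’s a small C sketch of a stack of activation records. the struct, its fields, and the helpers call() and ret() are all hypothetical; a real stack frame lives in raw memory rather than in a C array, and this sketch doesn’t model the stack growing down:

```c
#include <assert.h>

/* Hypothetical, simplified activation record: just a few of the
   fields listed above, enough to watch the bookkeeping happen. */
struct activation_record {
    const char *function;    /* which function this record belongs to */
    int param;               /* a parameter value */
    int local;               /* a local variable */
    unsigned return_addr;    /* where execution resumes in the caller */
};

#define MAX_DEPTH 16

struct record_stack {
    struct activation_record records[MAX_DEPTH];
    int depth;               /* number of live records */
};

/* Calling a function adds a new record at the end of the list... */
void call(struct record_stack *s, struct activation_record r) {
    assert(s->depth < MAX_DEPTH);   /* a real program would overflow its stack here */
    s->records[s->depth++] = r;
}

/* ...and returning removes the most recent one: last in, first out. */
struct activation_record ret(struct record_stack *s) {
    assert(s->depth > 0);
    return s->records[--s->depth];
}
```

pushing a record for main and then one for foo, and then returning, removes foo’s record first, leaving main’s on top.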
we were introduced to two new registers: sp and fp
sp, the stack pointer, contains the memory address of the last used word (ie, 32-bit chunk) of memory on the stack
fp, the frame pointer, contains the memory address of the first word of the current stack frame (ie, its highest-addressed word, since the stack grows down)
last time, we saw this function
```c
int foo(int bar) {
    int baz;
    baz = bar * 9;
    return baz;
}
```
result in this stack frame
```
+---------------+
|  caller's fp  | <---- fp
+---------------+
|               |
+---------------+
|  baz (local)  |
+---------------+
|               |
+---------------+
|  bar (param)  |
+---------------+
|               | <---- sp
+---------------+
```
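here’s a C sketch that builds that frame by hand and runs foo against it. mem is a hypothetical word-addressed array standing in for RAM, entry_sp and callers_fp are made-up values a caller might have, and the offsets (baz at fp-8, bar at fp-16, sp at fp-20) are read straight off the diagram:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 256
static int32_t mem[MEM_BYTES / 4];   /* simulated RAM: one int32_t per 4-byte word */

/* Translate a byte address into its word in mem (must be word-aligned). */
static int32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr < MEM_BYTES);
    return &mem[addr / 4];
}

/* Runs foo(bar) against a hand-built frame laid out like the diagram:
   saved fp at fp, baz at fp-8, bar at fp-16, sp at fp-20. */
int32_t foo_simulated(int32_t bar, uint32_t entry_sp, int32_t callers_fp) {
    uint32_t sp = entry_sp;
    sp -= 4;
    *word(sp) = callers_fp;              /* push {fp} */
    uint32_t fp = sp;                    /* fp: first word of the new frame */
    sp -= 20;                            /* sub sp, sp, #20 */

    *word(fp - 16) = bar;                /* parameter arrives in r0, copied into the frame */
    *word(fp - 8) = *word(fp - 16) * 9;  /* baz = bar * 9 */
    int32_t result = *word(fp - 8);      /* return baz */

    sp = fp;                             /* throw away the locals */
    int32_t restored_fp = *word(sp);     /* pop {fp} */
    sp += 4;
    assert(sp == entry_sp && restored_fp == callers_fp);
    return result;
}
```

for example, foo_simulated(2, 128, 200) returns 18 and leaves 200 at address 124 (the saved fp), 18 at 116 (baz) and 2 at 108 (bar).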
let’s now look at the actual instruction that causes the function to be called
bl 48 <foo>
bl: "branch with link"
recall that branch instructions contain offsets: we calculate the address of the next instruction to execute by adding the offset to the current PC
thus the offset is relative to the PC rather than absolute (the latter meaning that the value needs no context to interpret: it is what it is)
here, objdump has already calculated the target of the branch relative to the value of the PC at this instruction and is telling us the absolute target
the target is the instruction at address 48, which happens to be the first instruction in the function "foo"
at the very least, the bl instruction will change the PC to the target address
but it also saves PC + 4 in the lr register
note that this is the address of the instruction that will be executed immediately after the function ("foo", in this case) returns
that is, it’s the breadcrumb that will let "foo" return to the right place when it’s done
to summarize, the bl instruction has two effects:
LR <- PC + 4
PC <- PC + 4 + sign-extend(offset)
the latter is the "branch" part, the former is the "link" part, hence the name
so… what do you think bx lr does?
this one is simple (relatively)
it branches to the address contained in the lr register
it’s effectively "mov pc, lr"
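the two effects of bl and the single effect of bx lr fit in a few lines of C. this is a hypothetical register model: pc stands for the address of the bl instruction itself (as in LR <- PC + 4 above), and bl takes the absolute target that objdump prints rather than decoding the offset. a real bx can also switch the processor into Thumb state when bit 0 of the address is set; this sketch ignores that:

```c
#include <assert.h>
#include <stdint.h>

/* Just the registers that bl and bx lr touch. */
struct cpu {
    uint32_t pc;   /* address of the instruction being executed */
    uint32_t lr;   /* link register */
};

/* bl <target>, with target as the absolute address objdump shows (e.g. 48 <foo>). */
void bl(struct cpu *c, uint32_t target) {
    c->lr = c->pc + 4;   /* link: remember the instruction after the bl */
    c->pc = target;      /* branch */
}

/* bx lr: effectively "mov pc, lr" */
void bx_lr(struct cpu *c) {
    c->pc = c->lr;
}
```

starting from pc = 0x80, bl(&c, 48) leaves pc = 48 and lr = 0x84, and a later bx_lr(&c) jumps back to 0x84.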
now we can call functions and return from them
but our picture of the stack frame is still a bit thin
I’ve got a few programs to help us figure it out more clearly
first: double-func.c & double-func.s
note that the very first instruction of foo is different!
it used to be "push {fp}" but now it’s "push {fp, lr}"
this is because foo itself now calls another function, and when it calls that function, lr will be overwritten
thus it needs to save lr off into the stack frame
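to see why, here’s a C sketch of two nested calls with made-up addresses (main’s bl at 0x80, foo starting at 48, foo’s own bl at 56, bar at 100). the variable saved is a hypothetical stand-in for the copy of lr that push {fp, lr} puts in the frame:

```c
#include <assert.h>
#include <stdint.h>

struct cpu { uint32_t pc, lr; };

static void bl(struct cpu *c, uint32_t target) { c->lr = c->pc + 4; c->pc = target; }
static void bx_lr(struct cpu *c) { c->pc = c->lr; }

/* Returns the address foo's final "bx lr" lands on,
   with (save_lr != 0) or without saving lr first. */
uint32_t foo_returns_to(int save_lr) {
    struct cpu c = { .pc = 0x80, .lr = 0 };
    uint32_t saved = 0;
    bl(&c, 48);                 /* main calls foo: lr = 0x84 */
    if (save_lr) saved = c.lr;  /* stands in for push {fp, lr} */
    c.pc = 56;                  /* pretend foo ran up to its own bl */
    bl(&c, 100);                /* foo calls bar: lr = 60, main's breadcrumb is gone */
    bx_lr(&c);                  /* bar returns into foo at 60 */
    if (save_lr) c.lr = saved;  /* stands in for pop {fp, lr} */
    bx_lr(&c);                  /* foo tries to return to main */
    return c.pc;
}
```

foo_returns_to(1) gives 0x84 (back in main), while foo_returns_to(0) gives 60: without the saved copy, foo "returns" to the instruction right after its own call to bar and keeps coming back to itself.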
we walked through the behavior of the (single-register version of the) push instruction last time, so it makes sense now to consider how the push instruction works in general
conceptually, it pushes the contents of some collection of registers onto the stack
this involves copying the contents of those registers onto the end of the stack and then changing the value of the stack pointer (sp) to point to the new end of the stack
so push {fp} will copy the value in the fp register onto the end of the stack and then decrement sp by 4 (so that sp now contains the address of the value at the end: that is, the contents of fp that were just put there)
alternatively, push {fp, lr} will decrement sp by 8 (because each register value is 4 bytes, we copied 2 values, therefore the stack is now 8 bytes longer) and copy both registers into that space; the catch is that ARM always stores the lowest-numbered register at the lowest address, so fp (r11) ends up at the new end of the stack, where sp now points, and lr (r14) ends up in the word immediately above it (as if lr had been pushed first and then fp)
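here’s a C sketch of push with an arbitrary register list, on a tiny simulated machine (mem, reg and word() are hypothetical names). the ordering rule in the comments is ARM’s: the lowest-numbered register goes to the lowest address:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];      /* simulated RAM, 4 bytes per entry */
static uint32_t reg[16];             /* r0..r15 */
enum { FP = 11, SP = 13, LR = 14 };  /* fp, sp and lr are r11, r13 and r14 */

static uint32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &mem[addr / 4];
}

/* push {list}, where bit r of list selects register r.
   Claims 4 bytes per register below the current end of the stack, then
   stores the registers in ascending register-number order starting at
   the new sp (lowest-numbered register at the lowest address).
   Sketch only: doesn't handle sp or pc appearing in the list. */
void push(uint16_t list) {
    int count = 0;
    for (int r = 0; r < 16; r++)
        if (list & (1u << r))
            count++;

    uint32_t addr = reg[SP] - 4u * (uint32_t)count;
    reg[SP] = addr;                  /* sp now points at the lowest stored word */
    for (int r = 0; r < 16; r++)
        if (list & (1u << r)) {
            *word(addr) = reg[r];
            addr += 4;
        }
}
```

with sp = 128, push {fp} leaves sp = 124 with fp stored at 124; a following push {fp, lr} leaves sp = 116, with fp at 116 and lr at 120.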
tracing through the rest of the assembly code lets us deduce the remainder of the stack frame:
```
+---------------+
|  return addr  | <---- fp
+---------------+
|  caller's fp  |
+---------------+
|   a (local)   |
+---------------+
|   b (local)   |
+---------------+
|   c (local)   |
+---------------+
|               |
+---------------+
|  p1 (param)   |
+---------------+
|               | <---- sp
+---------------+
```
aha! we’ve filled in one of the blank spaces!
I have no idea what goes in the other blank space
(sorry)
one other thing to note
check out the third instruction in the function
sub sp, sp, #24
previously, we were only subtracting 20 from sp, but now we’re subtracting 24; why?
because now foo has more local variables!
the compiler has observed that the stack frame needs to be bigger, and thus produced this instruction to cause it to be so
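the arithmetic is easy to check: a frame’s size in words is the number of registers the prologue pushes plus the sub amount divided by 4. a tiny C helper (frame_words is a hypothetical name):

```c
#include <assert.h>

/* Words in a frame = registers pushed by the prologue's push
   + bytes reserved by "sub sp, sp, #N", at 4 bytes per word. */
int frame_words(int pushed_regs, int sub_bytes) {
    assert(sub_bytes % 4 == 0);
    return pushed_regs + sub_bytes / 4;
}
```

frame_words(1, 20) is 6, matching the six boxes in the first diagram (push {fp}, then sub sp, sp, #20), and frame_words(2, 24) is 8, matching the eight boxes in the double-func diagram (push {fp, lr}, then sub sp, sp, #24).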
before we get crazy with stack frames, let’s look at the pop instruction at the end: pop {fp, lr}
conceptually, it should do the exact opposite of the push instruction described above: it should remove values from the end of the stack, put them in the named registers, and update sp to point to the new end of the stack
and this is exactly what it does
remember that sp points to the last used word on the stack
therefore, pop {fp, lr} is equivalent to:
fp <- Mem[sp]
lr <- Mem[sp+4]
sp <- sp + 8
(the stack grows down, so removing values from it moves sp back up)
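a C sketch of pop on a hypothetical simulated machine (mem, reg and word() are made-up names): it reads registers upward from sp in ascending register-number order, then moves sp up past everything it read:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];      /* simulated RAM, 4 bytes per entry */
static uint32_t reg[16];             /* r0..r15 */
enum { FP = 11, SP = 13, LR = 14 };

static uint32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &mem[addr / 4];
}

/* pop {list}, where bit r of list selects register r.
   The lowest-numbered register is loaded from the lowest address (sp),
   the next from sp+4, and so on; then sp moves up past all of them.
   Sketch only: doesn't handle sp or pc appearing in the list. */
void pop(uint16_t list) {
    uint32_t addr = reg[SP];
    for (int r = 0; r < 16; r++)
        if (list & (1u << r)) {
            reg[r] = *word(addr);
            addr += 4;
        }
    reg[SP] = addr;                  /* the stack is now shorter */
}
```

with sp = 116, the old fp at 116 and the old lr at 120, pop {fp, lr} restores both registers and leaves sp = 124.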
the function we’ve been working with has a single parameter
and, as we saw, this parameter was passed in r0 (even though it was immediately copied off into the stack frame by the callee)
(oh, yes, vocabulary: the caller is the function doing the calling and the callee is the function being called)
but we’ve got a finite number of registers: how do we deal with functions that have more parameters than we have registers?
let’s see!
I’ve written this program to make it easy to figure out where each parameter is being passed
comparing the C to the assembly, we can see that the zeroth through third parameters are in r0 through r3, respectively
and the fourth through seventh are put on the stack, starting at the word sp points to and working upward (sp, sp+4, sp+8, sp+12)
that is, within main’s stack frame!
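a C sketch of what the caller does with eight int arguments right before bl foo. argreg stands in for r0-r3 and place_args is a hypothetical helper; putting the first stacked argument at exactly the address in sp follows the ARM procedure call standard (AAPCS) and is consistent with the diagram of foo’s frame further down:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64
static int32_t mem[MEM_WORDS];   /* simulated RAM */
static int32_t argreg[4];        /* stands in for r0..r3 */

static int32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &mem[addr / 4];
}

/* What a caller does with eight int arguments just before "bl foo". */
void place_args(const int32_t args[8], uint32_t sp) {
    for (int i = 0; i < 4; i++)
        argreg[i] = args[i];                            /* args 0-3 -> r0-r3 */
    for (int i = 4; i < 8; i++)
        *word(sp + 4u * (uint32_t)(i - 4)) = args[i];   /* args 4-7 -> sp, sp+4, ... */
}
```

with sp = 128, arguments 4 through 7 land at addresses 128, 132, 136 and 140, all inside the caller’s own frame.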
on that note, check out the third instruction in main():
sub sp, sp, #32
even though main() has only one local variable, the compiler is making a fairly large stack frame for it, precisely for the purpose of storing these parameters on their way to foo()
now at the beginning of foo
we save off r0-r3 at the predictable locations in memory: fp-16 through fp-28
but we apparently don’t do anything with the last four parameters
in fact, we can see later on that we refer to those parameters as positive offsets from fp
which tells us the stack looks like this
```
+---------------+
|   h (param)   |
+---------------+
|   g (param)   |
+---------------+
|   f (param)   |
+---------------+
|   e (param)   |
+---------------+
|  caller's fp  | <---- fp
+---------------+
|               |
+---------------+
|   y (local)   |
+---------------+
|               |
+---------------+
|   a (param)   |
+---------------+
|   b (param)   |
+---------------+
|   c (param)   |
+---------------+
|   d (param)   |
+---------------+
|               | <---- sp
+---------------+
```
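reading the diagram as a formula: here’s a hypothetical C helper giving the byte offset from foo’s fp where each parameter lives (0 for a through 7 for h). the offsets for e through h assume, as in the diagram, that foo’s prologue saves only fp, so e sits in the word directly above it:

```c
#include <assert.h>

/* Byte offset from foo's fp at which parameter i (0 = a ... 7 = h) lives:
   a-d were copied from r0-r3 into fp-16 .. fp-28, while e-h stay where
   the caller put them, just above the saved fp. */
int param_offset(int i) {
    assert(i >= 0 && i < 8);
    if (i < 4)
        return -16 - 4 * i;    /* a: -16, b: -20, c: -24, d: -28 */
    return 4 + 4 * (i - 4);    /* e: +4, f: +8, g: +12, h: +16 */
}
```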
why might we want to pass parameters in registers and avoid memory?
what’s memory made of? DRAM
what are registers made of? SRAM (or circuitry very much like it), built right into the CPU
which is faster? SRAM
passing parameters in registers is waaaaay faster than messing around with memory
just to be complete, let’s look at how a large number of local variables are handled
no surprises
the compiler makes room for a big stack frame (sub sp, sp, #52)
then the locals end up from fp-48 to fp-8
to sum up
the bl instruction is used to call functions: it branches and puts the return address in the lr register
the bx lr instruction is used to return from functions: it sets the PC to whatever is stored in the lr register
the first thing a function does is make room for its stack frame
first goes the return address (unless this function doesn’t call any other functions, in which case we just leave it in lr)
then comes the caller’s frame pointer (fp) so it can be restored at the end
then come the local variables
then an empty word
then the first four parameters
any other parameters are stored at the end of the caller’s stack frame
to officialize some previously-mentioned vocab:
the function prologue is the set of boilerplate instructions that set up the stack frame and come before the instructions that actually implement the body of the function
the function epilogue is the set of boilerplate instructions that come after the instructions that actually implement the body of the function and deal with getting the return value/return address in the right place before executing bx lr
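putting a prologue and epilogue together for a function that calls other functions, here’s a C simulation. fp, sp, lr, pc and mem are simulated stand-ins, and setting fp to the address of the saved lr (the highest word of the new frame, matching the definition of fp above) is an assumption; compare against the prologue in double-func.s for the exact instructions gcc emits:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];       /* simulated RAM */
static uint32_t fp, sp, lr, pc;       /* simulated registers */

static uint32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &mem[addr / 4];
}

/* Prologue of a non-leaf function that needs frame_bytes of extra space. */
static void prologue(uint32_t frame_bytes) {
    sp -= 8;                  /* push {fp, lr}: claim two words... */
    *word(sp) = fp;           /* ...caller's fp at the lower address */
    *word(sp + 4) = lr;       /* ...return address just above it */
    fp = sp + 4;              /* ASSUMPTION: fp -> saved lr, the frame's first word */
    sp -= frame_bytes;        /* sub sp, sp, #frame_bytes */
}

/* Epilogue: undo everything the prologue did, then return. */
static void epilogue(void) {
    sp = fp - 4;              /* drop the locals: sp back at the saved fp */
    fp = *word(sp);           /* pop {fp, lr} */
    lr = *word(sp + 4);
    sp += 8;
    pc = lr;                  /* bx lr */
}
```

starting from sp = 200, prologue(24) leaves sp exactly 28 bytes below fp; the body can then clobber lr with its own bl, and epilogue() still puts sp and fp back where they were and jumps to the original return address.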
next: we’ve only used integers as parameters to this point; that’s boring!
how are things like arrays and structs passed to functions?
and we haven’t even mentioned character strings!