CS 202 Lecture 25 – functions II

pete > courses > CS 202 Spring 24 > Lecture 25: functions II

Lecture 25: functions II

Goals

describe the layout of a stack frame in gcc
describe the behavior of the PUSH and POP instructions
describe the purpose and behavior of the BL and BX instructions

to review

when a function is called, we need to keep track of a bunch of information

we call the construct used to store this information an activation record

when a function is called, a new activation record is created that contains:

parameter values
local variables
return address
return value (sorta)
address of the caller’s stack frame

when a function returns, its activation record is no longer relevant

thus we always add or remove activation records from the "end" of the list, which looks a heck of a lot like a stack

thus, when we’re talking about computer systems, the stack of activation records is referred to as The Stack

and another name for an activation record is a stack frame

also, the stack grows down (ie, newer frames are stored at lower memory addresses than older frames)

we were introduced to two new registers: sp and fp

sp, the stack pointer, contains the memory address of the last used word (ie, 32-bit chunk) of memory on the stack

fp, the frame pointer, contains the memory address of the first word of the current stack frame

last time, we saw this function

int foo(int bar)
{
    int baz;
    baz = bar * 9;
    return baz;
}

result in this stack frame

+---------------+
|  caller's fp  |   <---- fp
+---------------+
|               |
+---------------+
|  baz (local)  |
+---------------+
|               |
+---------------+
|  bar (param)  |
+---------------+
|               |   <---- sp
+---------------+

let’s now look at the actual instruction that causes the function to be called

bl 48 <foo>

bl: "branch with link"

recall that branch instructions contain offsets: we calculate the address of the next instruction to execute by adding the offset to the current PC

thus the offset is relative to the PC rather than absolute (the latter meaning that the value needs no context to interpret: it is what it is)

here, objdump has already calculated the target of the branch relative to the value of the PC at this instruction and is telling us the absolute target

the target is the instruction at address 0x48 (objdump prints addresses in hex), which happens to be the first instruction in the function "foo"

at the very least, the bl instruction will change the PC to the target address

but it also saves PC + 4 in the lr register

note that this is the address of the instruction that will be executed immediately after the function ("foo", in this case) returns

that is, it’s the breadcrumb that will let "foo" return to the right place when it’s done

to summarize, the bl instruction has two effects:

LR <- PC + 4

PC <- PC + 4 + sign-extend(offset)

the latter is the "branch" part, the former is the "link" part, hence the name
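to make the two effects concrete, here's a minimal C sketch of the model above; the struct cpu type and the bl function are hypothetical names for illustration, PC means the address of the bl instruction itself, and the offset is treated as a plain signed byte count rather than the bit-level encoding the real instruction uses

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register state: only the two registers bl touches. */
struct cpu {
    uint32_t pc; /* address of the instruction currently executing */
    uint32_t lr; /* link register */
};

/* Models "bl" using the formulas above:
 *   LR <- PC + 4                        (the "link" part)
 *   PC <- PC + 4 + sign-extend(offset)  (the "branch" part)
 * The offset is a signed byte count. */
void bl(struct cpu *c, int32_t offset)
{
    uint32_t next = c->pc + 4;        /* instruction right after the bl */
    assert(next > c->pc);             /* no wraparound at the top of memory */
    c->lr = next;                     /* breadcrumb for the return */
    c->pc = next + (uint32_t)offset;  /* unsigned add wraps, so negative offsets work */
}
```

for example, a bl at (hypothetical) address 0x20 with offset 0x24 leaves 0x24 in LR (the instruction right after the bl) and 0x48 in PC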

so… what do you think bx lr does?

this one is simple (relatively)

it branches to the address contained in the lr register

it’s effectively "mov pc, lr"
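here's a sketch of the full round trip in the same kind of hypothetical C model (struct cpu, bl and bx_lr are illustrative names, not real APIs): call with bl, pretend the callee runs for a bit, then bx lr lands on the instruction right after the bl

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register state for illustration. */
struct cpu {
    uint32_t pc;
    uint32_t lr;
};

/* Model of bl: remember the next instruction, then branch. */
void bl(struct cpu *c, int32_t offset)
{
    c->lr = c->pc + 4;
    c->pc = c->pc + 4 + (uint32_t)offset;
}

/* Model of "bx lr": jump to whatever address is in lr.
 * Real ARM bx also uses bit 0 of the target to choose between ARM and
 * Thumb instructions; this sketch ignores that and assumes ARM code. */
void bx_lr(struct cpu *c)
{
    assert(c->lr % 4 == 0);  /* ARM-mode return addresses are word aligned */
    c->pc = c->lr;
}
```

the callee can move PC around however it likes; as long as lr still holds the breadcrumb when bx lr runs, control comes back to the right place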

now we can call functions and return from them

but our picture of the stack frame is still a bit thin

I’ve got a few programs to help us figure it out more clearly

first: double-func.c & double-func.s

note that the very first instruction of foo is different!

it used to be "push {fp}" but now it’s "push {fp, lr}"

this is because foo itself now calls another function, and when it calls that function, lr will be overwritten

thus it needs to save lr off into the stack frame

we walked through the behavior of the (single-register version of the) push instruction last time, so it makes sense now to consider how the push instruction works in general

conceptually, it pushes the contents of some collection of registers onto the stack

this involves copying the contents of those registers onto the end of the stack and then changing the value of the stack pointer (sp) to point to the new end of the stack

so push {fp} will copy the value in the fp register onto the end of the stack and then decrement sp by 4 (so that sp now contains the address of the value at the end: that is, the contents of fp that were just put there)

alternatively, push {fp, lr} decrements sp by 8 (each register value is 4 bytes and we copied 2 values, so the stack is now 8 bytes longer) and stores the registers in order of register number, lowest-numbered at the lowest address: fp (r11) goes in the word sp now points to, and lr (r14) goes in the word just above it
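here's a small C sketch of push over a simulated stack; struct machine, its 64-word memory, word_at and push are all hypothetical names, and vals[] must list the register values in ascending register-number order (fp is r11 and lr is r14, so push {fp, lr} passes {fp, lr})

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 64u

/* Hypothetical machine: a tiny memory plus sp. Addresses are byte
 * addresses; memory is only accessed one 32-bit word at a time. */
struct machine {
    uint32_t mem[MEM_WORDS];
    uint32_t sp;  /* address of the last used word on the stack */
};

/* Returns a pointer to the word at byte address addr. */
uint32_t *word_at(struct machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &m->mem[addr / 4];
}

/* push {regs...}: the n values fill the n words just below the old sp,
 * with vals[0] (lowest-numbered register) at the lowest address, and sp
 * ends up pointing at that lowest word, the new end of the stack. */
void push(struct machine *m, const uint32_t *vals, size_t n)
{
    uint32_t base = m->sp - 4u * (uint32_t)n;
    for (size_t i = 0; i < n; i++)
        *word_at(m, base + 4u * (uint32_t)i) = vals[i];
    m->sp = base;
}
```

starting from sp = 0x100, pushing {fp, lr} leaves sp = 0xF8, with the saved fp at 0xF8 and the saved lr at 0xFC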

tracing through the rest of the assembly code lets us deduce the remainder of the stack frame:

+---------------+
|  return addr  |   <---- fp
+---------------+
|  caller's fp  |
+---------------+
|   a (local)   |
+---------------+
|   b (local)   |
+---------------+
|   c (local)   |
+---------------+
|               |
+---------------+
|   p1 (param)  |
+---------------+
|               |   <---- sp
+---------------+

aha! we’ve filled in one of the blank spaces!

I have no idea what goes in the other blank space

(sorry)

one other thing to note

check out the third instruction in the function

sub sp, sp, #24

previously, we were only subtracting 20 from sp, but now we’re subtracting 24; why?

because now foo has more local variables!

the compiler has observed that the stack frame needs to be bigger, and thus produced this instruction to cause it to be so
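as a sanity check, this C sketch counts the words from fp down to sp once a prologue has run; it assumes gcc's usual ARM prologues, push {fp}; add fp, sp, #0; sub sp, sp, #20 for the original foo and push {fp, lr}; add fp, sp, #4; sub sp, sp, #24 for this one (the add fp lines are an assumption, since these notes only quote the push and sub instructions), and frame_words is a hypothetical helper

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. Arguments: the starting sp, how many bytes the
 * prologue's push stores, the k in "add fp, sp, #k", and the n in
 * "sub sp, sp, #n". Returns the number of 32-bit words from the word fp
 * points at down to the word sp points at, counting both ends. */
uint32_t frame_words(uint32_t sp, uint32_t pushed_bytes,
                     uint32_t fp_adjust, uint32_t sub_amount)
{
    sp -= pushed_bytes;            /* push {...}     */
    uint32_t fp = sp + fp_adjust;  /* add fp, sp, #k */
    sp -= sub_amount;              /* sub sp, sp, #n */
    assert(fp >= sp && (fp - sp) % 4 == 0);
    return (fp - sp) / 4 + 1;
}
```

frame_words(0x1000, 4, 0, 20) gives 6 and frame_words(0x1000, 8, 4, 24) gives 8, matching the six and eight boxes in the two diagrams; the extra 4 bytes of sub is what makes room for the extra local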

before we get crazy with stack frames, let’s look at the pop instruction at the end: pop {fp, lr}

conceptually, it should do the exact opposite of the push instruction described above: it should remove values from the end of the stack, put them in the named registers, and update sp to point to the new end of the stack

and this is exactly what it does

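remember that sp points to the last used word on the stack, and that push {fp, lr} left fp in that word with lr just above it

therefore, pop {fp, lr} is equivalent to:

fp <- Mem[sp]
lr <- Mem[sp+4]
sp <- sp + 8

(sp goes up, not down: the stack grows toward lower addresses, so shrinking it means increasing sp)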
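a small C sketch of pop over a simulated stack (struct machine, word_at, push and pop are hypothetical names; push is included so the sketch can be checked on its own): pop loads vals[0] from sp, vals[1] from sp + 4, and so on, then moves sp up past them, undoing a matching push exactly

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 64u

/* Hypothetical machine: a tiny memory plus sp (byte addresses). */
struct machine {
    uint32_t mem[MEM_WORDS];
    uint32_t sp;  /* address of the last used word on the stack */
};

uint32_t *word_at(struct machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &m->mem[addr / 4];
}

/* push {regs...}: vals[0] (lowest-numbered register) ends up at the lowest address. */
void push(struct machine *m, const uint32_t *vals, size_t n)
{
    uint32_t base = m->sp - 4u * (uint32_t)n;
    for (size_t i = 0; i < n; i++)
        *word_at(m, base + 4u * (uint32_t)i) = vals[i];
    m->sp = base;
}

/* pop {regs...}: for {fp, lr} this is fp <- Mem[sp], lr <- Mem[sp+4],
 * sp <- sp + 8. */
void pop(struct machine *m, uint32_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        vals[i] = *word_at(m, m->sp + 4u * (uint32_t)i);
    m->sp += 4u * (uint32_t)n;
}
```

pushing {0x11C, 0x24} as {fp, lr} and then popping them back returns both values in the same order and restores sp to where it started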

the function we’ve been working with has a single parameter

and, as we saw, this parameter was passed in r0 (even though it was immediately copied off into the stack frame by the callee)

(oh, yes, vocabulary: the caller is the function doing the calling and the callee is the function being called)

but we’ve got a finite number of registers: how do we deal with functions that have more parameters than we have registers?

let’s see!

func8param.c & func8param.s

I’ve written this program to make it easy to figure out where each parameter is being passed

comparing the C to the assembly, we can see that the zeroth through third parameters are in r0 through r3, respectively

and the fourth through seventh are put on the stack, starting at the word sp points to and going up toward higher addresses

that is, within main’s stack frame!

on that note, check out the third instruction in main():

sub sp, sp, #32

even though main() has only one local variable, the compiler is making a fairly large stack frame for it, precisely for the purpose of storing these parameters on their way to foo()

now at the beginning of foo

we save off r0-r3 at the predictable locations in memory: fp-16 through fp-28

but we apparently don’t do anything with the last four parameters

in fact, we can see later on that we refer to those parameters as positive offsets from fp

which tells us the stack looks like this

+---------------+
|  h (param)    |
+---------------+
|  g (param)    |
+---------------+
|  f (param)    |
+---------------+
|  e (param)    |
+---------------+
|  caller's fp  |   <---- fp
+---------------+
|               |
+---------------+
|  y (local)    |
+---------------+
|               |
+---------------+
|  a (param)    |
+---------------+
|  b (param)    |
+---------------+
|  c (param)    |
+---------------+
|  d (param)    |
+---------------+
|               |   <---- sp
+---------------+

why might we want to pass parameters in registers and avoid memory?

what’s memory made of? DRAM

what are registers made of? SRAM

which is faster? SRAM

passing parameters in registers is waaaaay faster than messing around with memory

just to be complete, let’s look at how a large number of local variables are handled

func8local.c & func8local.s

no surprises

the compiler makes room for a big stack frame (sub sp, sp, #52)

then the locals end up from fp-48 to fp-8

to sum up

the bl instruction is used to call functions: it branches and puts the return address in the lr register

the bx lr instruction is used to return from functions: it sets the PC to whatever is stored in the lr register

the first thing a function does is make room for its stack frame

first (at the highest address) goes the return address, saved from lr (unless this function doesn’t call any other functions, in which case we just leave it in lr)

then comes the caller’s frame pointer (fp), so it can be restored at the end

then come the local variables

then an empty word

then the first four parameters

any other parameters are stored at the end of the caller’s stack frame

to officialize some previously-mentioned vocab:

the function prologue is the set of boilerplate instructions that set up the stack frame and come before the instructions that actually implement the body of the function

the function epilogue is the set of boilerplate instructions that come after the instructions that actually implement the body of the function and deal with getting the return value/return address in the right place before executing bx lr
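a short C simulation can check that a prologue and epilogue really hand the caller its registers back unchanged; it assumes the common gcc shape for a non-leaf ARM function, push {fp, lr}; add fp, sp, #4; sub sp, sp, #24 on entry and sub sp, fp, #4; pop {fp, lr}; bx lr on exit (the add fp and sub sp, fp, #4 steps are assumptions not quoted in these notes), and struct machine, prologue and epilogue are hypothetical names

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64u

/* Hypothetical machine state: memory plus the registers involved. */
struct machine {
    uint32_t mem[MEM_WORDS];
    uint32_t sp, fp, lr, pc;
};

uint32_t *word_at(struct machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &m->mem[addr / 4];
}

/* push {fp, lr}; add fp, sp, #4; sub sp, sp, #frame */
void prologue(struct machine *m, uint32_t frame)
{
    m->sp -= 8;
    *word_at(m, m->sp) = m->fp;      /* fp (r11) at the lower address  */
    *word_at(m, m->sp + 4) = m->lr;  /* lr (r14) at the higher address */
    m->fp = m->sp + 4;               /* fp -> saved return address     */
    m->sp -= frame;                  /* room for locals and params     */
}

/* sub sp, fp, #4; pop {fp, lr}; bx lr */
void epilogue(struct machine *m)
{
    m->sp = m->fp - 4;               /* discard locals: sp -> saved fp */
    m->fp = *word_at(m, m->sp);
    m->lr = *word_at(m, m->sp + 4);
    m->sp += 8;
    m->pc = m->lr;                   /* bx lr */
}
```

even if the body clobbers lr (say, by calling another function), the epilogue restores sp, fp and lr to the caller's values and PC lands on the caller's return address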

next: we’ve only used integers as parameters to this point; that’s boring!

how are things like arrays and structs passed to functions?

and we haven’t even mentioned character strings!