CS 202 Spring 24, Lecture 24: functions I

Goals

enumerate the contents of an activation record/stack frame
describe how these contents allow us to implement functions
describe the purpose and behavior of The Stack
describe the meaning of the stack pointer and frame pointer
use gcc to compile to machine code
use objdump to disassemble machine code

functions

start with intuition, then move on to specifics

starting with our stupid-simple example: func.c

some things are the same every time; namely, the instructions that get executed

but what’s (potentially) different every time?

parameters
local variables
return value
where to return to (ie, your function could be called from both line 20 and line 40; when it returns, it needs to return to either line 21 or line 41, respectively)

so if you have a function main() and it calls a function foo(), something is going to have to remember all those things

main() is going to have to put the parameters someplace foo() knows to get at them

the compiler is going to need to make enough space for foo()’s local variables

foo() is going to need to put its return value someplace where main() knows to get at it

and finally, when foo() is done, we’re going to have to resume execution back where we left off in main()

on that last point, "where to resume execution" is just an instruction

an instruction in memory, because we’re working with a von Neumann architecture

so the easiest way to identify a specific instruction is by its address

thus we call this fourth piece of info the return address

wrench in the works: what if foo() itself calls another function? (call it bar())

we need to remember all this same information for that invocation, too

but we also need to keep around all the information about foo()’s invocation

because when bar() returns, foo() is going to need to be able to access its parameters, local variables, etc

and most especially it’s going to need to return to the right place in main()

conclusion: every time a function is called, we need to gather this information and put it somewhere

we could have a bunch of sets of this information, too, for each function currently "in progress"

(ie, if main() calls foo() calls bar(), we’ll need this info for each of main(), foo(), and bar())

let’s give this chunk of information a name

we’ll call it an activation record, because it’s the record of a function being activated

here are some interesting properties

once we return from a function, its activation record is meaningless

and at any given point, we only care about the most-recent activation record

thus when a new function is called, its activation record is going to be the most important

do you know of any data structures where we add stuff on one end, take stuff off that same end, and only ever look at the thing on the end?

a stack!

we could have a stack of activation records

where calling a function results in a new record being pushed onto the stack

and returning from a function results in the topmost record being popped (ie, disappearing)

so we want a stack

where should the constituent data live?

only one real choice here: main memory

in fact, part of main memory is set aside specifically for the stack of activation records

furthermore, it’s so pervasive that this area of memory is called, simply, The Stack

furthermore furthermore, another term for "activation record" is stack frame

so what does a stack frame look like?

bad news: there are no assembly language instructions that explicitly delineate the various data in the stack frame: it’s all implicit

so let’s go to the assembly and see if we can figure it out

I’m going to use a slightly different method to produce the assembly, because I think the assembly this method produces is clearer for our purposes today

decide for yourself which you prefer

first I’m going to compile func.c to machine code (the "-c" says "compile to machine code but don’t try to make an executable"—because a full-blown executable has a lot of other boilerplate that’s just going to get in the way today):

$ arm-none-eabi-gcc -c func.c

this produces a file called func.o (.o for "object", the old-school term for binary/machine code: "object code")

then I’m going to disassemble the machine code:

$ arm-none-eabi-objdump -d -j .text func.o > func.s

normally, objdump will just print its results to the screen; the "> func.s" bit will cause those results to be put in the file func.s instead

(the "-j .text" part tells objdump to focus on the .text section, which is what contains the actual instructions; there are other sections that contain, eg, initialized data like strings, which we’ll see more of later)

func.s

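the left-most column is the instruction’s offset within the .text section (in an object file like func.o, each section’s addresses start at 0)

the next column is the machine code itself, and the rest of the line is the disassembled assembly instruction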

let’s look at foo() first

we can infer that…

bar (a parameter) is stored at fp-16
baz (a local variable) is stored at fp-8
the return value is stored in r0

the last one is tricky: we can infer this because of this pair of instructions at the end:

ldr r3, [fp, #-8]
mov r0, r3

considering that we decided that The Stack is stored in memory, we can now start to make sense of this sp and fp nonsense

everything foo() stores on the stack uses the contents of fp as the base for the memory address calculation

and at the beginning of the function, fp gets the value of sp (and then sp is changed: we’ll get to that in a sec)

this implies that these two registers hold the memory address of (part of) The Stack

sp is a register (r13, to be precise) that contains the memory address of the last used word (32-bit value) in the stack

(incidentally, setting the initial value for the stack pointer is part of the work the operating system performs in running a program)

it’s otherwise known as the "stack pointer"

fp is another register (r11, to be precise) that contains the memory address of the first word in the current stack frame

it’s otherwise known as the "frame pointer"

with that in mind, let’s look at the first few instructions of foo()

push    {fp}        ; (str fp, [sp, #-4]!)

the instruction is "push {fp}", which objdump helpfully tells us is exactly equivalent to the instruction after the semi-colon

it’s storing the current value of the frame pointer (which will still point to the calling function’s frame) at sp-4

because we’re in a brand-new function, we can assume it’s storing this after all the existing stack frames

thus we can infer that newer stack frames sit at lower addresses than older stack frames (because the new frame occupies addresses computed by subtracting from sp)

conclusion: much like the enemy gate, the stack grows down

the exclamation point (frequently enunciated as "bang") says to set sp to sp-4 after the instruction executes

thus this instruction’s effects are twofold:

Mem[sp-4] <- fp
sp <- sp - 4

(this makes sense: we’ve pushed a 4-byte value—fp—onto the stack and now we need to maintain the property that sp points to the last used value)

add fp, sp, #0

the new frame starts at the last used word of the stack (which now happens to contain the calling function’s frame pointer)

sub sp, sp, #20

subtract 20 from the stack pointer, making room on The Stack for the rest of the new stack frame (20 bytes, ie, five 4-byte words)

str r0, [fp, #-16]

from the following code, we can infer that fp-16 stores the parameter, bar

thus from this instruction, we can infer that the caller passed the parameter to the callee (ie, foo()) in r0

that is, if we go back to the code for main, we should see that it fills r0 with the value it’s passing to foo()

and we see precisely that:

  1c:   e51b0008    ldr r0, [fp, #-8]

(convince yourself this is the case)

so that’s the (implicit) creation of the stack frame

we’ll look at more on Wednesday:

how functions get called
how functions return
how parameters are passed
how room is made for local variables
finally, a map to the stack frame