Lecture 25 - Functions II

Published

April 11, 2026

Goals

  • Learn about the roles of all of the ARM32 registers
  • Learn what happens to LR when we have to call a second function
  • Learn what happens when we have more parameters than registers

Registers

I was asked about r12, since we have identified a couple of special registers:

  • r11: FP
  • r13: SP
  • r14: LR
  • r15: PC

For the others:

  • r0-r3: arguments and result (r0)
  • r4-r10: general purpose registers (callee-saved)
  • r12: intra-procedure scratch register (IP); we will rarely see it, since it is mainly used for calls into linked-in libraries

Importantly, r13-r15 are all hardware-level special-purpose registers. They are associated with specific instructions in the ISA, so they have to be wired specially. r11 and r12 are also special, but only by convention: their roles are determined by the compiler, and a different compiler could make a different choice.

Saving registers across calls

I listed r4-r10 as “callee saved”. What does that mean?

The way we think about calling a function is that we expect the system to go off, perform some computation, and then return us to exactly where we were with the result. It should not affect our local environment at all. All of the code we have been looking at has been unoptimized: all of our local variables are stored on the stack and fetched from the stack every time they are used. More optimized code would make better use of our registers, which are faster than main memory.

The downside of leaving variables in registers, however, is that the function we are calling may want to use the same registers. To preserve the values, they will need to be written into memory. There are a variety of strategies we can employ for this.

In our unoptimized approach, the source of truth for every variable is already in memory, so we don’t worry about it.

We can make it the responsibility of the caller to save any register values it would like to still have when the function call completes. We see this for registers r0-r3 already, since they are used to pass arguments and return a value. The caller already knows that what is in there at the start of the call may not be there when it comes back. The general pattern with caller-saved registers is that the calling function saves the value on the stack before the call is made, and then reads the value back into the register after the call.

The other policy is to make registers callee-saved. The idea with these registers is that the calling function can load a value into one of them and expect that the value will still be there when the call returns. If the callee (the function being called) wants to use a callee-saved register, it needs to preserve the value in it. The pattern is to push the value of the register onto the stack at the start of the function, use the register however we want, and then load the value back into the register before the function returns. To the caller, it will appear as if the value was never touched.
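
To make the two policies concrete, here is a minimal sketch in C of a toy machine. Everything in it is made up for illustration: the register array, the stack array, and the helper names (machine_reset, push, pop, callee_double, caller). For simplicity sp is a word index rather than a byte address. It models the convention only; it is not what the compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: 16 registers and a small descending stack. */
enum { R0 = 0, R1 = 1, R4 = 4, SP = 13, STACK_WORDS = 64 };

int32_t reg[16];
int32_t stack_mem[STACK_WORDS];

void machine_reset(void) {
    for (int i = 0; i < 16; i++) reg[i] = 0;
    reg[SP] = STACK_WORDS;          /* empty stack: sp is one past the end */
}

void push(int r) {
    assert(reg[SP] > 0);            /* guard against overflow */
    reg[SP] -= 1;
    stack_mem[reg[SP]] = reg[r];
}

void pop(int r) {
    assert(reg[SP] < STACK_WORDS);  /* guard against underflow */
    reg[r] = stack_mem[reg[SP]];
    reg[SP] += 1;
}

/* Callee: doubles r0, using r1 and r4 as scratch along the way. */
void callee_double(void) {
    push(R4);                       /* callee-saved: preserve before use */
    reg[R4] = reg[R0];
    reg[R1] = reg[R4];              /* caller-saved: free to clobber */
    reg[R0] = reg[R4] + reg[R1];
    pop(R4);                        /* restore before returning */
}

/* Caller: keeps one live value in r1 and one in r4 across the call. */
int32_t caller(void) {
    reg[R1] = 10;                   /* live value, caller-saved register */
    reg[R4] = 20;                   /* live value, callee-saved register */
    reg[R0] = 7;                    /* argument */

    push(R1);                       /* our job to save r1 before the call */
    callee_double();
    pop(R1);                        /* and to read it back afterwards */

    return reg[R0] + reg[R1] + reg[R4];
}
```

In this model, calling machine_reset() and then caller() returns 44: r0 comes back as 14, r1 is restored by the caller, and r4 is restored by the callee.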

Multiple return addresses

If the return address is in a register, don’t we have a problem if we call a second function from the first?

Yes, we certainly do. What should we do?
Save the return address on the stack

Let’s take a look at some code with multiple functions


int sum(int x, int y){
  int result;
  result = x + y;
  return result;

}

int do_math(int x, int y){
  return 2 * sum(x,y);
}


int main(int argc, char * argv[]){
  int a,b,c;
  a = 1;
  b = 2;
  c = do_math(a,b);
}

After we have produced the object code

$ arm-none-eabi-gcc -c func2.c
$ arm-none-eabi-objdump -d -j .text func2.o > func2.s

we get this code:


func2.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <sum>:
   0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   4:   e28db000    add fp, sp, #0
   8:   e24dd014    sub sp, sp, #20
   c:   e50b0010    str r0, [fp, #-16]
  10:   e50b1014    str r1, [fp, #-20]  ; 0xffffffec
  14:   e51b2010    ldr r2, [fp, #-16]
  18:   e51b3014    ldr r3, [fp, #-20]  ; 0xffffffec
  1c:   e0823003    add r3, r2, r3
  20:   e50b3008    str r3, [fp, #-8]
  24:   e51b3008    ldr r3, [fp, #-8]
  28:   e1a00003    mov r0, r3
  2c:   e28bd000    add sp, fp, #0
  30:   e49db004    pop {fp}        ; (ldr fp, [sp], #4)
  34:   e12fff1e    bx  lr

00000038 <do_math>:
  38:   e92d4800    push    {fp, lr}
  3c:   e28db004    add fp, sp, #4
  40:   e24dd008    sub sp, sp, #8
  44:   e50b0008    str r0, [fp, #-8]
  48:   e50b100c    str r1, [fp, #-12]
  4c:   e51b100c    ldr r1, [fp, #-12]
  50:   e51b0008    ldr r0, [fp, #-8]
  54:   ebfffffe    bl  0 <sum>
  58:   e1a03000    mov r3, r0
  5c:   e1a03083    lsl r3, r3, #1
  60:   e1a00003    mov r0, r3
  64:   e24bd004    sub sp, fp, #4
  68:   e8bd4800    pop {fp, lr}
  6c:   e12fff1e    bx  lr

00000070 <main>:
  70:   e92d4800    push    {fp, lr}
  74:   e28db004    add fp, sp, #4
  78:   e24dd018    sub sp, sp, #24
  7c:   e50b0018    str r0, [fp, #-24]  ; 0xffffffe8
  80:   e50b101c    str r1, [fp, #-28]  ; 0xffffffe4
  84:   e3a03001    mov r3, #1
  88:   e50b3008    str r3, [fp, #-8]
  8c:   e3a03002    mov r3, #2
  90:   e50b300c    str r3, [fp, #-12]
  94:   e51b100c    ldr r1, [fp, #-12]
  98:   e51b0008    ldr r0, [fp, #-8]
  9c:   ebfffffe    bl  38 <do_math>
  a0:   e50b0010    str r0, [fp, #-16]
  a4:   e3a03000    mov r3, #0
  a8:   e1a00003    mov r0, r3
  ac:   e24bd004    sub sp, fp, #4
  b0:   e8bd4800    pop {fp, lr}
  b4:   e12fff1e    bx  lr

I want to take a look at the start of the do_math function

push    {fp, lr}

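We already saw push used to put the fp on the stack. Here it pushes both fp and lr, so the return address is safely in memory before bl overwrites lr with the return address for sum. Notice how we pop both at the end as well. sum makes no calls of its own, so it only pushes fp.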
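
The effect of saving lr can be sketched with a small C model. This is only an illustration under made-up assumptions: return addresses are stand-in integers (RETURN_TO_MAIN, RETURN_TO_DO_MATH), lr is a plain global, and all function names are hypothetical. Each function returns the address that its final bx lr would jump to.

```c
#include <assert.h>

/* Hypothetical stand-ins for real return addresses. */
enum { RETURN_TO_MAIN = 100, RETURN_TO_DO_MATH = 200, STACK_WORDS = 16 };

int lr;                              /* the single link register */
int stack_mem[STACK_WORDS];
int sp = STACK_WORDS;                /* word index, stack grows downward */

void push(int value) { assert(sp > 0); stack_mem[--sp] = value; }
int  pop(void)       { assert(sp < STACK_WORDS); return stack_mem[sp++]; }

/* Leaf function: makes no calls, so lr is still what its caller set. */
int sum_leaf(void) {
    return lr;                       /* bx lr */
}

/* Calls sum without saving lr first: the inner bl overwrites it. */
int do_math_broken(void) {
    lr = RETURN_TO_DO_MATH;          /* bl sum */
    sum_leaf();
    return lr;                       /* bx lr jumps back into do_math */
}

/* Saves lr on the stack around the inner call. */
int do_math_saved(void) {
    push(lr);                        /* push {fp, lr}, fp omitted here */
    lr = RETURN_TO_DO_MATH;          /* bl sum */
    sum_leaf();
    lr = pop();                      /* pop {fp, lr} */
    return lr;                       /* bx lr jumps back to main */
}

/* Models main executing bl on the given function. */
int call_from_main(int (*fn)(void)) {
    lr = RETURN_TO_MAIN;
    return fn();
}
```

Here call_from_main(do_math_broken) yields RETURN_TO_DO_MATH, meaning the function would never get back to main, while call_from_main(do_math_saved) yields RETURN_TO_MAIN.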

Why do we sometimes pass the value in a register, and sometimes store it on the stack?

Registers are fast. We gain by not having to go out to memory, and the storage that registers are built from is itself much faster to read and write than main memory.

Parameters revisited

We have a similar issue with the parameters

We saw earlier that the parameters were being passed in registers, but what happens when we have more parameters than registers?

int f(int a, int b, int c, int d, int e, int f, int g, int h)
{
  int result =  a + b + c + d + e + f + g + h;

  return result;
}

int main(int argc, char *argv[])
{

  int c = f(0, 1, 2, 3, 4, 5, 6, 7);

  return c;
}

00000000 <f>:
   0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   4:   e28db000    add fp, sp, #0
   8:   e24dd014    sub sp, sp, #20
   c:   e50b0008    str r0, [fp, #-8]
  10:   e50b100c    str r1, [fp, #-12]
  14:   e50b2010    str r2, [fp, #-16]
  18:   e50b3014    str r3, [fp, #-20]  ; 0xffffffec
  1c:   e51b2008    ldr r2, [fp, #-8]
  20:   e51b300c    ldr r3, [fp, #-12]
  24:   e0822003    add r2, r2, r3
  28:   e51b3010    ldr r3, [fp, #-16]
  2c:   e0822003    add r2, r2, r3
  30:   e51b3014    ldr r3, [fp, #-20]  ; 0xffffffec
  34:   e0822003    add r2, r2, r3
  38:   e59b3004    ldr r3, [fp, #4]
  3c:   e0822003    add r2, r2, r3
  40:   e59b3008    ldr r3, [fp, #8]
  44:   e0822003    add r2, r2, r3
  48:   e59b300c    ldr r3, [fp, #12]
  4c:   e0822003    add r2, r2, r3
  50:   e59b3010    ldr r3, [fp, #16]
  54:   e0823003    add r3, r2, r3
  58:   e1a00003    mov r0, r3
  5c:   e28bd000    add sp, fp, #0
  60:   e49db004    pop {fp}        ; (ldr fp, [sp], #4)
  64:   e12fff1e    bx  lr

00000068 <main>:
  68:   e92d4800    push    {fp, lr}
  6c:   e28db004    add fp, sp, #4
  70:   e24dd020    sub sp, sp, #32
  74:   e50b0010    str r0, [fp, #-16]
  78:   e50b1014    str r1, [fp, #-20]  ; 0xffffffec
  7c:   e3a03007    mov r3, #7
  80:   e58d300c    str r3, [sp, #12]
  84:   e3a03006    mov r3, #6
  88:   e58d3008    str r3, [sp, #8]
  8c:   e3a03005    mov r3, #5
  90:   e58d3004    str r3, [sp, #4]
  94:   e3a03004    mov r3, #4
  98:   e58d3000    str r3, [sp]
  9c:   e3a03003    mov r3, #3
  a0:   e3a02002    mov r2, #2
  a4:   e3a01001    mov r1, #1
  a8:   e3a00000    mov r0, #0
  ac:   ebfffffe    bl  0 <f>
  b0:   e50b0008    str r0, [fp, #-8]
  b4:   e3a03000    mov r3, #0
  b8:   e1a00003    mov r0, r3
  bc:   e24bd004    sub sp, fp, #4
  c0:   e8bd4800    pop {fp, lr}
  c4:   e12fff1e    bx  lr

We can see that r0-r3 were used for the first four values. The remaining four are stored on the stack, at offsets 0 through 12 from sp.

What’s more, these are being written within main’s stack frame. At the beginning you can see that main reserves a good-sized chunk of the stack (sub sp, sp, #32) to make room for them.

The other interesting thing is that there is no “unpacking” of the values in memory. Instead, the function is reaching in and accessing the values directly with a positive offset from the fp. The compiler is fulfilling a contract that the last items in the caller’s frame will be the values we want access to.
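
The offset arithmetic can be checked with a small C model of the stack. This is a sketch under stated assumptions: memory is a 256-byte array, sp starts at its top, and main_body and f_body are hypothetical names that replay the sp and fp updates from the listing by hand. The values main would save with push {fp, lr} are left out, since only the addresses matter here.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_WORDS = 64 };             /* toy memory: 256 bytes */

int32_t  mem[MEM_WORDS];
int32_t  r[4];                       /* r0-r3 */
uint32_t sp, fp;                     /* byte addresses into mem */

int32_t load(uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < (uint32_t)MEM_WORDS);
    return mem[addr / 4];
}

void store(uint32_t addr, int32_t value) {
    assert(addr % 4 == 0 && addr / 4 < (uint32_t)MEM_WORDS);
    mem[addr / 4] = value;
}

/* Replays the stack traffic of f from the listing. */
int32_t f_body(void) {
    sp -= 4; store(sp, (int32_t)fp); /* push {fp} */
    fp = sp;                         /* add fp, sp, #0 */
    sp -= 20;                        /* sub sp, sp, #20 */
    store(fp - 8,  r[0]);            /* register arguments are spilled */
    store(fp - 12, r[1]);            /* at negative offsets from fp    */
    store(fp - 16, r[2]);
    store(fp - 20, r[3]);

    int32_t total = load(fp - 8) + load(fp - 12)
                  + load(fp - 16) + load(fp - 20);
    /* [fp] holds the saved fp, so the caller's [sp] is now [fp, #4]. */
    total += load(fp + 4) + load(fp + 8) + load(fp + 12) + load(fp + 16);

    sp = fp;                         /* add sp, fp, #0 */
    fp = (uint32_t)load(sp); sp += 4;/* pop {fp} */
    return total;
}

/* Replays the stack traffic of main from the listing. */
int32_t main_body(void) {
    sp = MEM_WORDS * 4;              /* assumed starting stack pointer */
    sp -= 8;                         /* push {fp, lr}, values omitted */
    fp = sp + 4;                     /* add fp, sp, #4 */
    sp -= 32;                        /* sub sp, sp, #32 */

    store(sp + 12, 7);               /* arguments five through eight */
    store(sp + 8,  6);
    store(sp + 4,  5);
    store(sp,      4);
    r[3] = 3; r[2] = 2; r[1] = 1; r[0] = 0;

    int32_t result = f_body();       /* bl f */

    sp = fp - 4;                     /* sub sp, fp, #4 */
    sp += 8;                         /* pop {fp, lr}, values omitted */
    return result;
}
```

In this model main_body() returns 28, the sum of 0 through 7, and sp ends up back where it started.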

Mechanical level

Vocabulary

Skills