Lecture 20 - C data structures

Published

April 1, 2026

Goals

  • Learn about the basic types in C
  • Look at how arrays are handled in assembly
  • Learn about the struct type

source code

Building an x86 executable

So far, we have been using a special tool to produce ARM assembly, and we will continue to use it.

However, it is worth taking a moment to talk about building something we can actually run.

Let’s start with something closer to the starter program:

#include <stdio.h>

int main(int argc, char* argv[]){
  printf("Don't Panic!\n");
  printf("%d\n", 42);
}

There are a couple of new things in here:

  • to print things out, we use a function called printf (the f is for “formatted”)
  • to use printf, we have to include the standard I/O library header, stdio.h (we will talk more about #include and .h files later)
  • printf does not add a newline on its own, so we have to put the newline character \n in the string ourselves
  • printf takes a variable number of arguments; each one is substituted, in order, for a conversion specifier in the format string (these all start with %, e.g. %d for a decimal int)

If we are targeting x86, we can just use:

$ gcc -S -masm=intel starter.c
$ gcc starter.s -o starter
$ ./starter
  • -S stop after compiling to assembly, without running the assembler (this produces starter.s)
  • -masm=intel set the assembly “flavor” to Intel instead of the default AT&T
  • -o starter produce a binary file called starter

gcc is also smart enough to do the whole process (preprocess, compile, assemble, and link) in one step:

$ gcc starter.c -o starter
$ ./starter

Storing data

basic data types

Java

byte - 1 byte
short - 2 bytes
int - 4 bytes
long - 8 bytes
float - 4 bytes
double - 8 bytes

C

char - 1 byte
short - at least two bytes
int - at least two bytes (and guaranteed to be at least the size of a short)
long - at least 4 bytes
long long - at least 8 bytes
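float - 4 bytes on virtually every platform (IEEE 754 single precision)
double - 8 bytes on virtually every platform (IEEE 754 double precision)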

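For the integer types, you can also put unsigned in front of them to make them unsigned. Otherwise they are signed and, on every mainstream platform, stored as two’s complement numbers (C23 makes this mandatory). Plain char is the exception: whether it is signed or unsigned is left up to the implementation.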

Java’s type sizes are well defined because Java specifies its own virtual machine rather than depending on the hardware. The C standard is looser because it had to cover a large number of different architectures. Sizes have largely settled down on modern systems, but there are no guarantees, so you need to check what your compiler actually uses.

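The minimum and maximum values for each type are defined as macros (for example INT_MIN and INT_MAX) in limits.h, found at /usr/include/limits.h. The size of a type in bytes is given by the sizeof operator, e.g. sizeof(int).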

On basin, the sizes and ranges are:

char: 1 byte
    -128 - 127 [0X80 - 0X7F]
unsigned char: 1 byte
    0 - 255 [0 - 0XFF]
short: 2 bytes
    -32768 - 32767 [0X8000 - 0X7FFF]
unsigned short: 2 bytes
    0 - 65535 [0 - 0XFFFF]
int: 4 bytes
    -2147483648 - 2147483647  [0X80000000 - 0X7FFFFFFF]
unsigned int: 4 bytes
    0 - 4294967295  [0 - 0XFFFFFFFF]
long: 8 bytes
    -9223372036854775808 - 9223372036854775807  [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long: 8 bytes
    0 - 18446744073709551615  [0 - 0XFFFFFFFFFFFFFFFF]
long long: 8 bytes
    -9223372036854775808 - 9223372036854775807  [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long long: 8 bytes
    0 - 18446744073709551615  [0 - 0XFFFFFFFFFFFFFFFF]

Take a look inside sizes.c to see more of the formatting options printf supports.

If we want more control, there are exact-width types such as int32_t and uint16_t, whose names tell you exactly how many bits they hold (the u prefix means unsigned).

These are declared in stdint.h (#include <stdint.h>); with glibc, the actual definitions live in /usr/include/bits/stdint-intn.h and /usr/include/bits/stdint-uintn.h.

Data structures

Let’s look at some assembly and see if we can figure out what it does:

        mov     r3, #0
        str     r3, [fp, #-8]
        b       .L2
.L3:
        ldr     r3, [fp, #-8]
        lsl     r3, r3, #2          ; logical shift left by 2 (i.e. multiply by 4)
        sub     r3, r3, #4
        add     r3, r3, fp
        ldr     r2, [fp, #-8]
        str     r2, [r3, #-24]
        ldr     r3, [fp, #-8]
        add     r3, r3, #1
        str     r3, [fp, #-8]
.L2:
        ldr     r3, [fp, #-8]
        cmp     r3, #4
        ble     .L3

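We can tell by the branches that we have a loop: b .L2 jumps to the condition first, and ble .L3 jumps back to the top of the body while the condition holds.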

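The condition compares r3, which was just loaded from [fp, #-8], so let’s track the value stored at fp-8:

  • it starts as 0 (mov r3, #0 then str r3, [fp, #-8])
  • the loop continues while it is less than or equal to 4 (cmp r3, #4 then ble .L3)
  • it is incremented by 1 on every pass through the loop (add r3, r3, #1 then str back to fp-8)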

So this is a for loop. Let’s call the variable at fp-8 i.

Let’s work backwards through the rest of the loop body, starting with the store:

str     r2, [r3, #-24]

This stores whatever is in r2 into memory at address r3 - 24.

ldr     r2, [fp, #-8]

Just before that, we loaded [fp-8] into r2; that’s the value of i.

So, what is in r3?

ldr     r3, [fp, #-8]     ; r3 ← i
lsl     r3, r3, #2        ; r3 ← r3*4  (i*4)
sub     r3, r3, #4        ; r3 ← r3 - 4 (i*4 - 4)
add     r3, r3, fp        ; r3 ← r3 + fp (fp - 4 + i*4)

If we combine that with the -24, we are storing things into \(\text{fp} -4 - 24 + i \times 4\) or \(\text{fp} - 28 + i \times 4\)

To abstract that a little bit, we are talking about \(\text{base} + i \times 4\), where \(\text{base} = \text{fp} - 28\) is a fixed starting address.

We are storing ints, which we know are 4 bytes here, so consecutive values of i land in adjacent 4-byte memory locations.

What we have is a loop that assigns into an array: element i receives the value i.

Mechanical level

vocabulary

Skills