Lecture 22 - Memory II

Published

April 6, 2026

Goals

  • Learn about endianness
  • Learn how different types of arrays are stored in memory
  • Learn about multi-dimensional arrays in C

Exploring memory

I could demonstrate this by looking at the assembly code, but it is actually easier to just get the program to tell us where things are stored.

We need two new things:

  • & - the & (address-of) operator can be placed in front of any variable, and it returns the address of that variable in memory
  • %p - the printf format specifier for printing an address (strictly, it expects a void *, so careful code casts the address, e.g. (void *)&x)
printf("u.x is stored at %p\nu.f is stored at %p\n", &u.x, &u.f);

Running this:

u.x is stored at 0x7ffefde1034c
u.f is stored at 0x7ffefde1034c

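Note that the address will likely be different if you run this again, because most modern operating systems randomize where a program's stack is placed each time it runs.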

Let’s try with a collection of variables:

#include <stdio.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
    int x;
    char c;
    uint16_t i;
    int32_t y;

    printf("x is stored at %p\n", &x);
    printf("c is stored at %p\n", &c);
    printf("i is stored at %p\n", &i);
    printf("y is stored at %p\n", &y);
}
$ gcc locations.c -o locations
$ ./locations
x is stored at 0x7fffa12f3c9c
c is stored at 0x7fffa12f3c9b
i is stored at 0x7fffa12f3c98
y is stored at 0x7fffa12f3c94

If we map that out, with one byte per column, the memory layout looks like this (-- marks an unused byte):

address          +0  +1  +2  +3
0x7fffa12f3c94   y   y   y   y
0x7fffa12f3c98   i   i   --  c
0x7fffa12f3c9c   x   x   x   x

There are two things to notice here:

  • the ordering runs from bottom to top: each variable declared later sits at a lower address (we have seen this before, and we will learn the reason shortly)
  • there is a one-byte gap, at 0x7fffa12f3c9a, between i and c

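Why is there a gap? We want to make sure that our values are aligned: each value starts at an address that is a multiple of its alignment, which for these types on this machine equals its size. The two-byte i must start at an even address, and it can't start at 0x7fffa12f3c9a because it would then overlap c, so it goes at 0x7fffa12f3c98 and leaves 0x7fffa12f3c9a unused. Likewise, the four-byte y starts at a multiple of 4.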

Let’s start by reviewing the memory layout that we saw last time at the end of lecture 21.

Endianness

One thing you have probably not given any thought to is which part of a number its address points to. Memory is byte-addressable, after all, so a 4-byte int occupies four addresses, and the address of the int is the address of the first (lowest-addressed) of those bytes.

A reasonable first assumption is that the number is stored the way we write it, so the address points to the top (most significant) byte of the value.

As it turns out, that is one of two common choices. The other is to rearrange the number so that the lowest (least significant) byte comes first: instead of \(B_3B_2B_1B_0\), the number is stored as \(B_0B_1B_2B_3\). The first of these is called “big endian” and the second “little endian” (both names are references to Gulliver’s Travels).

endianness - the order in which the bytes of a multi-byte value are stored or transmitted

An important but somewhat confusing point: we are not reversing the bits of the number, only the order of its bytes. Within each byte, the bits are still in the usual order.

Example, storing the number 0x12345678 (bytes listed from lowest address to highest):

  • big endian: 12 34 56 78
  • little endian: 78 56 34 12

Endianness is usually fixed by the hardware (though some processors can be switched between the two).

Since it is all just wiring, there isn’t really a way to tell, by looking at a value in our code, how it was stored (and this is a good thing!). Even bitwise operators behave as expected: shifts and masks work on the value, not on how its bytes are laid out in memory.

However, we can use a union to trick the system into telling us. We put an integer in one member and a char array of the same size in the other, which lets us look at the individual bytes of the number.

When we run my test, we get back:

Little Endian
Apparent storage: 12 34 56 78
Actual storage: 78 56 34 12

Why do we care if numbers just work and it is invisible to us?

Most personal computers are little endian (x86-64 processors, for example). However, the core Internet protocols use big endian, often called “network byte order”, for multi-byte fields in packet headers. So, while we may not care if we just do data processing on our own machine, if we start looking at raw network traffic, this distinction becomes important.

Note that this only matters if you aren’t using a higher-level abstraction on the data you are sending. If you send data as text (like a JSON file), libraries will handle the encoding for you. It is only when you are dealing with a raw blob of bytes that byte order matters.

Arrays

Let’s take our earlier example and instead of loading the array, we will just print out the memory locations:


#include <stdio.h>

int main(int argc, char * argv[]){
    int a[5];

    for (int i = 0; i < 5; i++){
        printf("a[%d]: %p\n", i, &a[i]);
    }
}

The output looks like this:

a[0]: 0x7ffd37c33f10
a[1]: 0x7ffd37c33f14
a[2]: 0x7ffd37c33f18
a[3]: 0x7ffd37c33f1c
a[4]: 0x7ffd37c33f20

That shouldn’t be very surprising: we already knew that the values were adjacent in memory. We also knew that an int is 32 bits (4 bytes) on this machine, so it makes sense that consecutive addresses are 4 apart.

If we change the type of the array to uint8_t (and include stdint.h), then we get:

a[0]: 0x7ffff3a89fa7
a[1]: 0x7ffff3a89fa8
a[2]: 0x7ffff3a89fa9
a[3]: 0x7ffff3a89faa
a[4]: 0x7ffff3a89fab

Again, that shouldn’t be surprising: each uint8_t is one byte, so consecutive elements are one address apart.

Multi-dimensional arrays

We can declare multi-dimensional arrays by just adding more brackets:

int a[10][15];

Let’s take a look at their addresses:

#include <stdio.h>

int main(int argc, char * argv[]){
    int a[2][4];

    for (int j = 0; j < 2; j++){
        for (int i = 0; i < 4; i++){
            printf("a[%d][%d]: %p\n", j, i, &a[j][i]);
        }
    }
}

Before we get to looking at where everything is in memory, I want to address a style issue.

As a general rule, we should be leery of “magic numbers”.

magic numbers - raw, usually unexplained, values in our code

There are two primary issues with these values:

  • They make the code harder to read because the reader doesn’t know why the value is there or what it does (these are sometimes accompanied by comments like “don’t touch or everything breaks”)
  • If the value is duplicated, as it is above, it can create headaches when we want to update a value. Perhaps we remember to update one, but not the other, or perhaps we update both, and they weren’t actually connected…

So don’t do it.

A better solution is to introduce constants that provide a name and a single place where the value can be changed.

In C we can do that with #define:

#include <stdio.h>

#define COLS 4
#define ROWS 2

int main(int argc, char * argv[]){
    int a[ROWS][COLS];

    for (int j = 0; j < ROWS; j++){
        for (int i = 0; i < COLS; i++){
            printf("a[%d][%d]: %p\n", j, i, &a[j][i]);
        }
    }
}

Note the different syntax:

  • there is no assignment operator
  • there is no semicolon

#define is really a preprocessor directive rather than a statement in the core language. Before the actual compilation happens, the preprocessor substitutes the text 4 for every occurrence of COLS (and 2 for ROWS), so the #define itself is never “executed” as an instruction.

Mechanical level

vocabulary

  • endianness
  • magic numbers

Skills