CS 202 (Spring 24), Lecture 30: arrays and pointer arithmetic

Goals

describe how arrays are laid out in memory
describe what an array variable represents
use pointers to traverse arrays
perform pointer arithmetic on values of different types
use NULL to represent a pointer with value 0
access command-line arguments in C

consider this program: array.c

where is a stored?

the stack, because it’s one of main’s local variables

how much space does a take up?

40 bytes, because an int takes 4 bytes on this machine and a consists of 10 of them

where are these 10 ints relative to one another?

it makes sense that they would come one after the other in memory, and this is, in fact, the case

we have a situation like this (where each box stores a byte and lower-numbered addresses are at the bottom):

|       |   |
+-------+   |
|       |   /
+-------+
|       |   \
+-------+   |
|       |   |
+-------+   | a[2]
|       |   |
+-------+   |
|       |   /
+-------+
|       |   \
+-------+   |
|       |   |
+-------+   | a[1]
|       |   |
+-------+   |
|       |   /
+-------+
|       |   \
+-------+   |
|       |   |
+-------+   | a[0]
|       |   |
+-------+   |
|       |   /
+-------+
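
as a rough sketch of the layout described above (this is not the actual array.c; the helper names int_element_offset and print_int_layout are made up for illustration), we can measure how many bytes separate an element from the start of the array:

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes between the start of an int array and its i-th element. */
size_t int_element_offset(const int *a, size_t i)
{
    return (size_t)((const char *)&a[i] - (const char *)a);
}

/* Hypothetical demo helper: print where each element of a local array lives. */
void print_int_layout(void)
{
    int a[10];

    printf("sizeof(a) = %zu bytes\n", sizeof(a));
    for (size_t i = 0; i < 10; i++) {
        printf("&a[%zu] = %p (offset %zu)\n",
               i, (void *)&a[i], int_element_offset(a, i));
    }
}
```

calling print_int_layout() from main should show addresses that go up by sizeof(int) from one element to the next (4 on the machine used in these notes)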

what if I change the array:

char a[10];

what changes?

the array is only going to consume 10 bytes, because a character is a single byte (and there are still 10 of them)

but the elements will still be contiguous:

|       |   ...
+-------+
|       |   a[5]
+-------+
|       |   a[4]
+-------+
|       |   a[3]
+-------+
|       |   a[2]
+-------+
|       |   a[1]
+-------+
|       |   a[0]
+-------+

now what if I change the array to contain structs?

struct point a[10];

since each struct point consists of two integers (8 bytes in total), the whole array will comprise 80 bytes

and they’ll be laid out something like this (each box now stands for a 4-byte int rather than a single byte):

|       |   ...
+-------+
|       |   a[3].x
+-------+
|       |   a[2].y
+-------+
|       |   a[2].x
+-------+
|       |   a[1].y
+-------+
|       |   a[1].x
+-------+
|       |   a[0].y
+-------+
|       |   a[0].x
+-------+
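
a small sketch to check this layout, assuming struct point is declared with two int fields named x and y (the notes don’t show the declaration, so that is a guess, and offset_of_y is a made-up helper):

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed declaration: the notes only say it holds two integers. */
struct point {
    int x;
    int y;
};

/* Bytes between the start of the array and the y field of element i. */
size_t offset_of_y(const struct point *a, size_t i)
{
    return (size_t)((const char *)&a[i].y - (const char *)a);
}

/* Hypothetical demo helper: print the offsets of the first few fields. */
void print_point_layout(void)
{
    struct point a[10];

    printf("sizeof(struct point) = %zu, sizeof(a) = %zu\n",
           sizeof(struct point), sizeof(a));
    for (size_t i = 0; i < 3; i++) {
        printf("a[%zu].x at offset %zu, a[%zu].y at offset %zu\n",
               i, i * sizeof(struct point), i, offset_of_y(a, i));
    }
}
```

the x field sits at the very start of each struct, so a[i].x is at offset i * sizeof(struct point), and a[i].y comes right after it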

with all this in mind, what is the variable a?

we know that a[0] is the zeroth item, a[1] is the first, etc

but what is a?

one can think of it as identifying the chunk of memory in which the array elements are stored

and the easiest way to identify a chunk of memory is by its address

let’s test this hypothesis: pointers-and-arrays.c

the for-loop fills the array

then we cause ip to point to the first element of the array:

ip = &(a[0]);

and we print out the value of the pointer followed by the value stored at that location

next we do the same thing, except first we set

ip = (int *) a;

the (int *) part is called a cast

a is an array and ip is an integer pointer

strictly speaking, the cast isn’t needed here: when an array is used in an expression like this, C automatically converts it to a pointer to its zeroth element, so ip = a; compiles too

the cast just makes that conversion explicit

and then we print out the pointer and the value at that location

if our hypothesis (that a is the address of the zeroth element of the array) is correct, these should print out the same thing:

$ gcc -o pointers-and-arrays pointers-and-arrays.c 
$ ./pointers-and-arrays 
ip = &(a[0]) = 0x7ffcf7a53e90
*ip = 10
ip = a = 0x7ffcf7a53e90
*ip = 10

and they do!

hypothesis confirmed: an array is identified by the address of its zeroth item
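
here’s a minimal sketch of the same experiment (not the actual pointers-and-arrays.c; the function name and the fill values 10, 11, 12, … are assumptions chosen so that the zeroth item is 10, as in the output above):

```c
#include <stdio.h>

/* Returns 1 if &(a[0]) and a refer to the same address, 0 otherwise. */
int same_address(void)
{
    int a[10];
    int *ip;
    int *jp;

    /* fill the array (assumed values: 10, 11, 12, ...) */
    for (int i = 0; i < 10; i++) {
        a[i] = 10 + i;
    }

    ip = &(a[0]);   /* address of the zeroth element */
    jp = a;         /* the array itself, converted to a pointer */

    printf("&(a[0]) = %p, *ip = %d\n", (void *)ip, *ip);
    printf("a       = %p, *jp = %d\n", (void *)jp, *jp);

    return ip == jp;
}
```

the two printed addresses should match, though the actual address will differ from run to run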

if this is true, we might be able to access other items in the array using pointer arithmetic

that is, taking the address of the zeroth item and adding to it

as an experiment, uncomment this line in the preceding program:

printf("ip + 1 = %p\n", ip + 1);

here we add 1 to the pointer that contains the address of the beginning of the array

and here’s the output:

$ gcc -o pointers-and-arrays pointers-and-arrays.c 
$ ./pointers-and-arrays 
ip = &(a[0]) = 0x7fffe7a3c5d0
*ip = 10
ip = a = 0x7fffe7a3c5d0
*ip = 10
ip + 1 = 0x7fffe7a3c5d4

what’s going on here? we added 1 to the pointer and it added FOUR to the address!

the compiler is being smart

it knows that ip points to an integer

and C defines pointer arithmetic in units of the pointed-to type, not in bytes

so ip + 1 is the address of the next integer

and, since integers are 4 bytes here, the address goes up by 4 instead of 1

uncomment the for-loop at the end to see more evidence

also change the array to contain chars instead of ints and see how things change

the compiler will magically increment by 1 instead of 4 because chars are only a single byte
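
to see the step size directly, here’s a small sketch (int_step and char_step are made-up helper names) that measures how many bytes "+ 1" moves a pointer of each type:

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes between ip and ip + 1 for an int pointer. */
size_t int_step(void)
{
    int a[2];
    int *ip = a;

    /* compare the two addresses byte by byte */
    return (size_t)((char *)(ip + 1) - (char *)ip);
}

/* Bytes between cp and cp + 1 for a char pointer. */
size_t char_step(void)
{
    char c[2];
    char *cp = c;

    return (size_t)((cp + 1) - cp);
}

/* Hypothetical demo helper: print both step sizes. */
void print_steps(void)
{
    printf("int pointer moves %zu bytes\n", int_step());
    printf("char pointer moves %zu byte\n", char_step());
}
```

int_step() returns sizeof(int), which is 4 on the machine used in these notes, and char_step() always returns 1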

here’s a program that further demonstrates the equivalence of arrays and pointers: more-pointers-and-arrays.c

nothing surprising, except that we can actually do pointer arithmetic on the array identifier itself (ie, char_array+i)
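
as a sketch of what this buys us (not the actual more-pointers-and-arrays.c; the two function names are invented), here is the same loop written once with indexing and once with pointer arithmetic:

```c
#include <stddef.h>

/* Sum n ints using ordinary array indexing. */
int sum_by_index(const int *a, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Sum n ints by walking a pointer from the start to one past the end. */
int sum_by_pointer(const int *a, size_t n)
{
    int total = 0;

    for (const int *p = a; p < a + n; p++) {
        total += *p;
    }
    return total;
}
```

both functions compute the same result, because a[i] means the same thing as *(a + i)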

but treating arrays and pointers interchangeably raises the question: how does this interact with regions of memory returned by malloc?

array-with-malloc.c

we can do that too!

here we ask malloc for 16 bytes, which is room for 4 integers

and we can assign to them as in the program above

the comments in the right column show the (more or less) equivalent array operations

typically, though, when using malloc to allocate memory for an array and each element of the array is larger than one byte in size, you let the compiler do the arithmetic for you

that is, you don’t think to yourself "I want an array to store four integers, each integer needs four bytes, four times four is 16, so I need to call malloc with the parameter 16"

instead, you use sizeof (an operator that takes a type or a variable and yields the number of bytes that type or variable requires) and multiply it by the number of items

like so: array-with-malloc-and-sizeof.c
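
a minimal sketch of that pattern (this is not the actual array-with-malloc-and-sizeof.c; make_squares is a made-up example function):

```c
#include <stdlib.h>

/* Allocate room for n ints and fill slot i with i * i.
 * Returns NULL if malloc fails; the caller must free the result. */
int *make_squares(size_t n)
{
    /* let the compiler work out how many bytes n ints need */
    int *p = malloc(n * sizeof(int));

    if (p == NULL) {
        return NULL;
    }

    for (size_t i = 0; i < n; i++) {
        *(p + i) = (int)(i * i);    /* equivalent to p[i] = i * i */
    }
    return p;
}
```

a caller would write int *p = make_squares(4); then use p[0] through p[3] like an ordinary array, and finish with free(p);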

and now we’ve finally seen enough to explain the heretofore mysterious parameters to the main function:

int main(int argc, char *argv[])

they contain the command-line arguments: that is, the arguments passed to the program on the command-line (equivalent to sys.argv in Python)

argc is a simple integer: the number of command-line arguments

but what is argv?

two possibilities:

it’s a pointer to a character array
or it’s an array of character pointers

on further thought, the former doesn’t pass muster: in a declaration, [] binds more tightly than *, so char *argv[] reads as "argv is an array of char *", not "argv is a pointer to an array of char"

therefore this is, indeed, an array of character pointers

|                   |   ...
+-------------------+
| 0x7fffff125518150 |   argv[3]: address of the third argument
+-------------------+
| 0x7fffff125518148 |   argv[2]: address of the second argument
+-------------------+
| 0x7fffff125518140 |   argv[1]: address of the first argument
+-------------------+
| 0x7fffff125518134 |   argv[0]: address of the zeroth argument
+-------------------+

so if we go to address 0x7fffff125518134, we’ll find the first character of the zeroth argument

example: command-line-args.c

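this looks like a bad thing: the final printf is printing the item after the last argument, just to see what’s there (reading past the end of an ordinary array is a bug, but C guarantees that argv[argc] exists and is a null pointer)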

$ gcc -o command-line-args command-line-args.c 
$ ./command-line-args foo bar baz
argv[0] is "./command-line-args"
argv[1] is "foo"
argv[2] is "bar"
argv[3] is "baz"
argv[argc] is (nil)

fairly straightforward, with a couple of things worth mentioning

first we note that argv[0] is the name of the program itself

then we note that the value of the item after the end of the array of pointers is (nil), which is how printf on this system displays NULL: a pointer that points to address 0x0
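
a sketch of what such a program might look like (not the actual command-line-args.c; print_args is a made-up helper that main would call as print_args(argc, argv)):

```c
#include <stdio.h>

/* Print every argument, then the slot just past the last one.
 * Returns 1 if that final slot holds a null pointer, 0 otherwise. */
int print_args(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++) {
        printf("argv[%d] is \"%s\"\n", i, argv[i]);
    }

    /* %p expects a void *, so convert the char * explicitly */
    printf("argv[argc] is %p\n", (void *)argv[argc]);

    return argv[argc] == NULL;
}
```

how a null pointer is displayed by %p varies between systems, so the last line may not read (nil) everywhere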

now, because an array can be used as a pointer to its zeroth element…

and this is an array containing pointers to characters…

we can create a variable that is a pointer to a pointer to characters

and set it equal to the array itself

and then use pointer arithmetic to traverse the array

uncomment the for-loop at the bottom to see this in action

(it’s not important that you deeply grok how this works, but you should know that sometimes people use dirty tricks like this to traverse arrays)

(and that it depends on the fact that the special value NULL follows the valid entries in the array)
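
a sketch of this kind of traversal (the loop in command-line-args.c may be written differently; count_args is a made-up helper that main would call as count_args(argv)):

```c
#include <stdio.h>

/* Walk an argv-style array with a pointer to a pointer to characters.
 * Stops at the NULL entry that follows the valid ones and returns
 * how many entries were seen before it. */
int count_args(char **argv)
{
    int n = 0;

    for (char **p = argv; *p != NULL; p++) {
        printf("%s\n", *p);     /* *p is one of the char * entries */
        n++;
    }
    return n;
}
```

note that the loop never looks at argc: it relies entirely on the NULL entry to know where to stop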