CS 202 Lecture 27 – pointers II

Lecture 27: pointers II

Goals

use pointers with structs
use the -> operator to make accessing struct fields cleaner
describe how strings are represented in C
embed arbitrary binary data in a C string

I mentioned last time that, in practice, structs and arrays are rarely passed as parameters

and instead pointers to structs and arrays are preferred

let’s first look at a program where the structs are passed by value: struct-param.c

this is the same struct to represent 2D points we saw a few lectures ago

this time, main creates two points, a and b, and initializes them to (0,0) and (4,3)

then it calls dist_squared to calculate the square of the distance between them (taking the square root would require floating point operations, which would be an unnecessary complication right now)

were we to look at the assembly, we would see both fields of a and both fields of b pushed onto the stack

struct-param.s

in this example, that’s not terribly onerous

but if they were huge structs, it would be terribly onerous

especially for an operation like this where we’re just going to read a few fields and do some math

as mentioned last time, it’s more efficient to pass the address of the struct instead of the struct itself

we achieve this in the predictable way: struct-param-pointer.c

(for the time being, ignore the dist_squared_better function at the bottom)

instead of passing a and b to the function, we pass &a and &b (line 24)

instead of accepting parameters of type struct point, the function accepts parameters of type struct point * (lines 10 and 27)

and instead of accessing p1.x, p2.x, p1.y, and p2.y, it needs to use (*p1).x, (*p2).x, (*p1).y, and (*p2).y

this is a straightforward transformation akin to the one we performed last time on pointer.c

now the only things passed as parameters (and thus consuming space on the stack) are the addresses of a and b

way more efficient, both in terms of time (copying values to the callee’s stack frame) and memory (consumed by the callee’s stack frame)

this may sound like a modest win, but it matters quite a bit when we’re talking super efficiency-sensitive code like the stuff that exists deep inside the operating system (which is usually written in C)

but this notation can get unwieldy

assume a is a pointer to a struct that has a field b; we’d access it using (*a).b

and if b is itself a pointer to a struct that has a field c, we access it using (*(*a).b).c

and if c is in turn a pointer to a struct that has a field d, we’d use (*(*(*a).b).c).d

I know this seems pathological and unrealistic, but it really isn’t

complex software often has really complex, deeply-nested structures like this

fortunately, there’s syntactic sugar to make this easier to both read and write

(recall that syntactic sugar is an alternate way of writing some piece of code that is… easier to both read and write—easier to swallow, as it were)

in short, this

(*x).y

is equivalent to this

x->y

and so this

(*(*(*a).b).c).d

is equivalent to this

a->b->c->d

much cleaner

change of direction!

we’ve talked about lots of integer data types in C and we’ve talked (a bit) about non-integral data types (eg, float)

but we haven’t talked about strings!

first recall the lecture weeks ago in which we talked about using zeroes and ones to represent more complex data

I presented ASCII as one way to represent characters, in which (more or less arbitrary) bit patterns were assigned to a set of characters; the codes themselves are 7 bits wide, and each one is stored in an 8-bit byte

so A is represented by 0x41, a is represented by 0x61, etc

what, then, is a string?

a sequence of 8-bit values, where each 8-bit value represents a character

how, then, can we identify strings?

by the address of the first character!

example: string.c

here we declare the variable s to be a pointer to a character

specifically, it will be the address of the character "h" in memory

"e" will be in the following memory slot

then "l", then another "l", then a space, then "w", etc.

when we wish to print a string, we use the "%s" format specifier, as demonstrated in the last line of the program

we’ve previously seen the "\n" thing, where this represents the single newline character

but the "\"" thing is new

this is so we can embed a literal double-quote character inside our string

note that if we just put a double-quote there, the compiler would assume the string itself was done

we have to backslash-escape it so that the compiler knows to not interpret it as the end of the string, and instead to put a literal double-quote character there

as I said, a string is just a sequence of ASCII characters in memory

which presents an interesting problem

when we’re printing the string, how does the program know where the string ends?!

the answer is that there’s a special, non-printable ASCII "character" set aside to indicate the end of the string

it’s the "null byte", which is represented by all zeroes (ie, 0x00)

so all strings in C end with a null byte

and thus printf will print characters until it finds a null byte

if you think back to the ASCII table, you will recall that it includes a bunch of "non-printable" characters

so you might be asking yourself how you can put those in a string

wonder no more: binary-string.c

recall the backslash-escape trick

this is similar: when the compiler sees a backslash followed by an 'x', it treats the hex digits that follow (two of them, in our examples) as the hex representation of the byte to insert into the string

note that the last character of the (binary) string is 0x0a, which is the ASCII code for the newline character

which is why the second double-quote shows up on the next line

this program claims to do a bad thing: segfault.c

what is that bad thing?

first, we declare s to be a pointer to a character

then we print out an informative message

and finally we put the character 'q' into memory where s points

so why doesn’t this work?

because we don’t give s an initial value

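and therefore s does not hold a meaningful address; when we try to write through it, we most likely touch memory the program isn’t allowed to access, and the operating system kills the program with a segmentation fault (hence the file name)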

next time: more breaking of things