CS 202 Lecture 28 – pointers III

pete > courses > CS 202 Spring 24 > Lecture 28: pointers III


Goals

describe what causes a segmentation fault
use gdb to step through programs and examine memory and registers
define big-endian and little-endian

as we saw at the end of last lecture, this program claims to do a bad thing: segfault.c

what is that bad thing?

first, we declare ip to be a pointer to an integer

then we print out an informative message

and finally we put the number 12 into memory where ip points

so why doesn’t this work?

because we don’t give ip an initial value

and therefore ip holds whatever leftover value happened to be in its memory, which is almost certainly not an address we’re allowed to write to
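the fix is to give ip a valid address before dereferencing it. here is a minimal sketch of two common ways to do that. it makes no assumptions about the rest of segfault.c, and the function names store_via_local and store_via_heap are hypothetical, made up for illustration:

```c
#include <stdlib.h>

/* Option 1: point ip at an existing variable (here, a local on the stack). */
int store_via_local(void) {
    int target = 0;
    int *ip = &target;   /* ip now holds an address we may write to */
    *ip = 12;            /* writes 12 into target */
    return target;
}

/* Option 2: ask for memory on the heap with malloc. */
int store_via_heap(void) {
    int *ip = malloc(sizeof *ip);
    if (ip == NULL) {
        return -1;       /* allocation can fail */
    }
    *ip = 12;            /* ip points at heap memory we own */
    int result = *ip;
    free(ip);            /* give the memory back */
    return result;
}
```

both functions return 12. the second is the usual choice when the integer needs to outlive the function that created it.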

here’s what it looks like in action:

$ gcc -o segfault segfault.c 
$ ./segfault 
Segmentation fault (core dumped)

a segmentation fault or segmentation violation ("segfault" for short) occurs when a program attempts to access memory it does not have permission to use

in this case, ip is a variable that contains an address

we attempted to write to memory at that address

and we did not have permission, so the operating system aborted the program for us

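you might wonder what memory a program does have permission to read from and write to

for our purposes in this class, it’s the instructions, the stack, the heap, and other data like string literals (though not all of it is writable: the instructions and string literals are typically read-only, so writing to them also causes a segfault)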
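to make that list concrete, here is a hedged C sketch that touches one object in each of these regions. the names touch_regions and greeting are hypothetical, and where each object actually ends up is decided by the compiler, linker, and operating system, so the comments describe typical Linux/x86-64 behavior rather than a guarantee:

```c
#include <stdlib.h>
#include <string.h>

/* A pointer to a string literal. The characters "hello" typically live
 * in a read-only data region: reading them is fine, writing them is not. */
static const char *greeting = "hello";

/* Touch one object in each kind of memory listed above.
 * Returns 0 if every access behaved as expected, -1 otherwise. */
int touch_regions(void) {
    int on_stack = 12;                        /* stack: a local variable */
    int *on_heap = malloc(sizeof *on_heap);   /* heap: memory from malloc */
    if (on_heap == NULL) {
        return -1;                            /* allocation failed */
    }
    *on_heap = on_stack;                      /* write through a valid pointer */
    size_t len = strlen(greeting);            /* read a string literal */
    int ok = (*on_heap == 12 && len == 5);
    free(on_heap);
    /* running this function at all means reading from the instructions
     * region (read and execute, but not write) */
    return ok ? 0 : -1;
}
```

casting away the const and assigning to greeting[0] would be undefined behavior in C, and on typical systems it is exactly the kind of write that produces a segfault.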

which invites an interesting question: is this what happens when the stack grows too large? why yes! yes it is!

you might also notice something curious in the output

namely, that the (second) string we requested to be printed with printf does not appear!

this is because the underlying machinery (the C standard I/O library, which printf is part of) often buffers output: it saves up the text from individual calls to printf and then sends it to the screen all at once

so our printf call resulted in the underlying machinery saving the text to print

but it never got around to actually showing the output before the segfault happened and the operating system terminated the program

(the first call to printf appears to take effect because it prints "enough": its string ends with a newline character, which is what the internals of printf are configured to look for when output goes to a terminal. there are cases where printf is configured differently, though (for example, by default when output is redirected to a file), so just adding a newline will not always "correct" this behavior)
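if print-debugging is the goal, one hedge against losing output is to flush stdout explicitly after each message, or to print to stderr, which the C standard says is not fully buffered when a program starts. a minimal sketch, where debug_print and debug_print_stderr are hypothetical helper names:

```c
#include <stdio.h>

/* Print a debugging message and push it out right away, so it is not
 * lost if the program crashes on the next line.
 * Returns 0 on success, EOF on error (same convention as fflush). */
int debug_print(const char *msg) {
    if (fputs(msg, stdout) == EOF) {
        return EOF;
    }
    return fflush(stdout);   /* send everything buffered so far */
}

/* Alternative: stderr is not fully buffered by default, so messages
 * written there usually appear immediately. */
void debug_print_stderr(const char *msg) {
    fputs(msg, stderr);
}
```

another option is to call setvbuf(stdout, NULL, _IONBF, 0) at the very start of main, before anything is printed, which turns off buffering for stdout entirely (at some cost in speed).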

this behavior could make debugging a program difficult

my guess is that many of you (like me) often insert print statements to help debug

but if the stuff you’re printing never actually makes it to the screen, it’s going to be really difficult to learn anything useful from them

fortunately, there’s a piece of software called a debugger that will allow us to run the program in a special environment in which we can examine the state of the program

the particular program we’re going to use is called gdb

(much of today will involve gdb; there are tons of commands, and I’ll introduce many of them today, but all the ones you need are described in this guide I wrote)

first off, when you want to run your program in gdb, you need to compile it with the -g flag to include certain information for the debugger to do its job:

$ gcc -g -o segfault segfault.c

then you just feed the compiled program to gdb:

$ gdb segfault

it’ll print out a lot of stuff and then give you its own prompt, meaning it’s ready for a gdb-specific command

(gdb)

the first command we’ll use is run, which… runs the program

(gdb) run
Starting program: /home/pete/tmp/cs202-lecture-26/segfault 

Program received signal SIGSEGV, Segmentation fault.
main (argc=1, argv=0x7fffffffe5e8) at segfault.c:15
15      *ip = 12;

this output tells us that the program crashed with a segmentation fault

it even tells us the line of C code where it crashed

but it doesn’t tell us precisely what about this line of code caused the error

we’ll have to do some more investigation

since it’s a pointer operation, we might want to check out the value of ip when it crashed

we can do that with gdb’s print command:

(gdb) print ip
$1 = (int *) 0x1000

which tells us that the value of ip is 0x1000 (and also that it has type “int *”)

we can also ask gdb to use ip as a pointer and tell us what’s stored at that address:

(gdb) print *ip
Cannot access memory at address 0x1000

and now we see why it crashed: the program isn’t allowed to access memory address 0x1000, so writing 12 there caused the segfault

next program: stack-frame-games.c

what is going to be printed?

this is sort of the same issue: in bar, a is not being given an initial value

so the output depends entirely on what’s stored in whatever chunk of memory the compiler sets aside for a

but remember that a is one of bar’s local variables, which means it’ll be in bar’s stack frame
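the source of stack-frame-games.c isn’t reproduced in these notes, but the disassembly and gdb output later in this lecture suggest its shape. here is a hypothetical reconstruction (the real file may differ), with one deliberate change: a is initialized to 0 so that this sketch has well-defined behavior, since reading an uninitialized local is undefined behavior in C. deleting the "= 0" gives back the buggy version we’re discussing:

```c
#include <stdio.h>

/* foo stores 93 in its local z; in the lecture's build, z lives at rbp-0x4. */
int foo(int x) {
    int z = 93;
    return z * x;
}

/* bar's local a also lives at rbp-0x4 in the lecture's build.
 * The lecture's version declares a with no initializer, so a picks up
 * whatever foo left in that memory (93). Initialized here on purpose. */
int bar(int y) {
    int a = 0;
    printf("a is %d\n", a);
    return y * a;
}
```

judging from the gdb session, main calls foo(12) and then bar(19) and ignores both return values.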

with that in mind, let’s mentally walk through the program

main starts, it has its own stack frame

then it calls foo, which causes a stack frame to be pushed

there will be space in this frame for foo’s local variable z, which will be assigned the value 93 (ie, the value 93 will be put into memory at the location the compiler has set aside to store foo’s local variable z)

then foo returns (the value returned doesn’t matter for our purposes today) and its stack frame is no longer in use, but the contents of that memory remain intact

then main calls bar, which causes a stack frame to be pushed into the same chunk of memory where foo’s stack frame used to be

unsurprisingly, because the compiler uses relatively simple, deterministic rules for this sort of thing, the space it picks to store a in bar is the same as the space it chose to store z in foo

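so even though we never explicitly gave it a value, a is going to be 93 (at least with this compiler and these settings: as far as the C language is concerned, the value of an uninitialized local is unpredictable)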

$ gcc -o stack-frame-games stack-frame-games.c 
$ ./stack-frame-games 
a is 93

let’s watch this happen in gdb, which will conveniently introduce several other commands

first compile it with the debugging flag and then feed it to gdb

$ gcc -g -o stack-frame-games stack-frame-games.c 
$ gdb stack-frame-games
... lots of output ...
(gdb)

first thing: stop execution at the beginning of foo so we can examine memory

use the break command to cause gdb to stop there:

(gdb) break foo
Breakpoint 1 at 0x400527: file stack-frame-games.c, line 18.

(it even tells us the line of C code where it’s going to break)

then we ask gdb to run it:

(gdb) run
Starting program: /home/pete/tmp/cs202-lecture-26/stack-frame-games 

Breakpoint 1, foo (x=12) at stack-frame-games.c:18
18      int z = 93;

the output tells us that it’s about to run line 18, and it conveniently tells us the code at line 18

now we want to examine the memory where z is stored

we could use print z, but let’s go another direction instead: let’s examine the raw memory

first we need the address

and to get that, it would help to see the assembly

which we can do with the disas command

(gdb) disas
Dump of assembler code for function foo:
   0x0000000000400520 <+0>:     push   rbp
   0x0000000000400521 <+1>:     mov    rbp,rsp
   0x0000000000400524 <+4>:     mov    DWORD PTR [rbp-0x14],edi
=> 0x0000000000400527 <+7>:     mov    DWORD PTR [rbp-0x4],0x5d
   0x000000000040052e <+14>:    mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000400531 <+17>:    imul   eax,DWORD PTR [rbp-0x14]
   0x0000000000400535 <+21>:    pop    rbp
   0x0000000000400536 <+22>:    ret    
End of assembler dump.

the instruction with the "=>" symbol is the one about to be run

we can infer that it’s storing the value 93 (ie, 0x5d) at address rbp-0x4

from which we can conclude that the variable z is stored at rbp-0x4

as you might have deduced from the most recent assignment, x86 uses rbp to store the frame pointer

we can see the value of all the registers using the info reg command:

(gdb) info reg
rax            0x4004f6 4195574
rbx            0x0  0
rcx            0x0  0
rdx            0x7fffffffe858   140737488349272
rsi            0x7fffffffe848   140737488349256
rdi            0xc  12
rbp            0x7fffffffe740   0x7fffffffe740
rsp            0x7fffffffe740   0x7fffffffe740
r8             0x4005d0 4195792
r9             0x7ffff7de9900   140737351948544
r10            0x4  4
r11            0x1  1
r12            0x400400 4195328
r13            0x7fffffffe840   140737488349248
r14            0x0  0
r15            0x0  0
rip            0x400527 0x400527 <foo+7>
eflags         0x206    [ PF IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0

if we only want to see a single register, though, we can also use print:

(gdb) print $rbp
$1 = (void *) 0x7fffffffe740

(note that here we have to precede the name of the register with a "$")

now we can use the x command to eXamine memory

we could manually subtract 4 from 0x7fffffffe740 to calculate the address we want

or we could ask gdb to do the math for us:

(gdb) x $rbp-4
0x7fffffffe73c: 0x00000000

which tells us that the 32-bit value zero is stored at address 0x7fffffffe73c

we expect that if we execute the next instruction (remember, it’s the one that stores 93 at this memory location), we’ll see something different there

the stepi command causes gdb to execute a single instruction

(gdb) stepi
20      return z * x;
(gdb) x $rbp-4
0x7fffffffe73c: 0x0000005d

and we see that the memory location now contains 93

so if we resume execution and stop in bar, we should find 93 still sitting in the memory where a is stored

insert a new breakpoint at the beginning of bar:

(gdb) break bar
Breakpoint 2 at 0x400542: file stack-frame-games.c, line 27.

and then resume execution:

(gdb) continue
Continuing.

Breakpoint 2, bar (y=19) at stack-frame-games.c:27
27      printf("a is %d\n", a);

(note that we cannot use run here: that would cause the program to start over from the beginning)

disassemble to find where a is stored:

(gdb) disas
Dump of assembler code for function bar:
   0x0000000000400537 <+0>:     push   rbp
   0x0000000000400538 <+1>:     mov    rbp,rsp
   0x000000000040053b <+4>:     sub    rsp,0x20
   0x000000000040053f <+8>:     mov    DWORD PTR [rbp-0x14],edi
=> 0x0000000000400542 <+11>:    mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000400545 <+14>:    mov    esi,eax
   0x0000000000400547 <+16>:    mov    edi,0x4005e4
   0x000000000040054c <+21>:    mov    eax,0x0
   0x0000000000400551 <+26>:    call   0x4003f0 <printf@plt>
   0x0000000000400556 <+31>:    mov    eax,DWORD PTR [rbp-0x14]
   0x0000000000400559 <+34>:    imul   eax,DWORD PTR [rbp-0x4]
   0x000000000040055d <+38>:    leave  
   0x000000000040055e <+39>:    ret    
End of assembler dump.

and examine that memory:

(gdb) x $rbp-4
0x7fffffffe73c: 0x0000005d

bingo.

new question: the most recent gdb output indicates that the 32-bit (ie, 4-byte) value 0x0000005d is stored at address 0x7fffffffe73c

but recall we’re working with byte-addressable memory, meaning that each address is associated with only one byte of data

so the value actually occupies four addresses: 0x7fffffffe73c, 0x7fffffffe73d, 0x7fffffffe73e, and 0x7fffffffe73f

which of the four bytes of 0x0000005d are stored in which location?

there seem to be two options:

0x7fffffffe73c: 5d
0x7fffffffe73d: 00
0x7fffffffe73e: 00
0x7fffffffe73f: 00

and

0x7fffffffe73c: 00
0x7fffffffe73d: 00
0x7fffffffe73e: 00
0x7fffffffe73f: 5d

and the "answer" is that both schemes are used

the choice is up to the hardware designer (that is, it’s specified in the ISA)

the first scheme, in which the least-significant byte is stored at the lowest-numbered address, is called little-endian

the latter scheme, in which the least-significant byte is stored at the highest-numbered address, is called big-endian
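code that needs a specific byte order (say, when writing a file format or a network packet) usually avoids depending on the host’s order by building the bytes with shifts. a minimal sketch; store_le32 and store_be32 are hypothetical names, not a standard library API:

```c
#include <stdint.h>

/* Little-endian: least-significant byte at the lowest index (lowest address). */
void store_le32(uint32_t value, unsigned char buf[4]) {
    buf[0] = (unsigned char)(value & 0xFF);
    buf[1] = (unsigned char)((value >> 8) & 0xFF);
    buf[2] = (unsigned char)((value >> 16) & 0xFF);
    buf[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Big-endian: least-significant byte at the highest index (highest address). */
void store_be32(uint32_t value, unsigned char buf[4]) {
    buf[0] = (unsigned char)((value >> 24) & 0xFF);
    buf[1] = (unsigned char)((value >> 16) & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)(value & 0xFF);
}
```

called with 0x5d, store_le32 fills the buffer with 5d 00 00 00 and store_be32 with 00 00 00 5d, matching the two layouts above. because shifts act on the numeric value rather than on memory, both functions produce the same bytes on any host.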

x86 is a little-endian architecture

the Motorola chips Apple used to use, as well as the chips IBM makes for its high-powered machines, are big-endian

we can use gdb to corroborate this

by asking x to print out single bytes:

(gdb) x/b $rbp-4
0x7fffffffe73c: 0x5d
(gdb) x/b $rbp-3
0x7fffffffe73d: 0x00
(gdb) x/b $rbp-2
0x7fffffffe73e: 0x00
(gdb) x/b $rbp-1
0x7fffffffe73f: 0x00
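the same experiment can be done from C by copying an integer’s bytes out in the order they sit in memory. a sketch with hypothetical function names; memcpy into an unsigned char array is one well-defined way to look at an object’s raw bytes:

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of value into out[0..3] in memory order (lowest address
 * first). This is the C analogue of running x/b on each of the four addresses. */
void bytes_in_memory(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian machine (like x86), 0 otherwise
 * (for example, on a big-endian machine). */
int is_little_endian(void) {
    unsigned char b[4];
    bytes_in_memory(1u, b);
    return b[0] == 1;   /* least-significant byte at the lowest address? */
}
```

on x86, bytes_in_memory(0x5d, b) leaves b holding 5d 00 00 00, the same answer x/b gave.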