Lecture 28: pointers III
Goals
- describe what causes a segmentation fault
- use gdb to step through programs and examine memory and registers
- define big-endian and little-endian
as we saw at the end of last lecture, this program claims to do a bad thing: segfault.c
what is that bad thing?
first, we declare ip to be a pointer to an integer
then we print out an informative message
and finally we put the number 12 into memory where ip points
so why doesn’t this work?
because we don’t give ip an initial value
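and therefore when we try to use it as a pointer, it holds whatever value happened to be in its memory rather than the address of something we’re allowed to use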
here’s what it looks like in action:
$ gcc -o segfault segfault.c
$ ./segfault
Segmentation fault (core dumped)
a segmentation fault or segmentation violation ("segfault" for short) occurs when a program attempts to access memory that it does not have permission to access
in this case, ip is a variable that contains an address
we attempted to write to memory at that address
and we did not have permission, so the operating system aborted the program for us
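for contrast, here’s a minimal sketch of a store that does work. segfault.c itself isn’t reproduced in these notes, so the names store_twelve and target are made up for illustration; the only point is that ip is given the address of a real int before we write through it:

```c
/* hypothetical helper, not part of segfault.c */
int store_twelve(void)
{
    int target = 0;      /* an int we own, living in this stack frame */
    int *ip = &target;   /* ip now holds a valid address */

    *ip = 12;            /* write through the pointer: target becomes 12 */
    return target;
}
```

because ip points into this function’s own stack frame, the write lands in memory the program is allowed to use, and the function returns 12.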
you might wonder what memory a program does have permission to write to and read from
for our purposes in this class, it’s the instructions, the stack, the heap, and other data like string literals
which invites an interesting question: is this what happens when the stack grows too large? why yes! yes it is!
you might also notice something curious in the output
namely, that the (second) string we requested to be printed with printf does not appear!
this is because the underlying machinery often saves up output generated by individual calls to printf and then shoots it off to the screen all at once
so our printf call resulted in the underlying machinery saving the text to print
but it never got around to actually showing the output before the segfault happened and the operating system terminated the program
(the first call to printf appears to take effect because it prints "enough": in this case, it includes a newline character, which is what the internals of printf are configured to look for when output is going to a terminal. the configuration can differ, though: when output is redirected to a file or a pipe, for example, text is typically held until a much larger buffer fills up, so just adding a newline will not always "correct" this behavior)
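if you do lean on print statements, one way to hedge against lost output is to flush stdout yourself right after printing. a minimal sketch, where debug_msg is a hypothetical helper name (fputs and fflush are the standard C library calls):

```c
#include <stdio.h>

/* hypothetical helper: print msg and force it out of the buffer now */
int debug_msg(const char *msg)
{
    fputs(msg, stdout);      /* text goes into stdout's buffer */
    return fflush(stdout);   /* hand the buffered text to the OS; 0 on success */
}
```

calling debug_msg just before a statement like *ip = 12 should get the message out even if the program crashes on the very next line. printing to stderr with fprintf(stderr, ...) is another common choice, since stderr is normally unbuffered.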
this behavior could make debugging a program difficult
my guess is that many of us (myself included) often insert print statements to help debug
but if the stuff you’re printing never actually makes it to the screen, it’s going to be really difficult to learn anything useful from them
fortunately, there’s a piece of software called a debugger that will allow us to run the program in a special environment in which we can examine the state of the program
the particular program we’re going to use is called gdb
(much of today will involve gdb; there are tons of commands, I’ll introduce many today, but all the ones you need are described in this guide I wrote)
first off, when you want to run your program in gdb, you need to compile it with the -g flag to include certain information for the debugger to do its job:
$ gcc -g -o segfault segfault.c
then you just feed the compiled program to gdb:
$ gdb segfault
it’ll print out a lot of stuff and then give you its own prompt, meaning it’s ready for a gdb-specific command
(gdb)
the first command we’ll use is run, which… runs the program
(gdb) run
Starting program: /home/pete/tmp/cs202-lecture-26/segfault
Program received signal SIGSEGV, Segmentation fault.
main (argc=1, argv=0x7fffffffe5e8) at segfault.c:15
15        *ip = 12;
this output tells us that the program crashed with a segmentation fault
it even tells us the line of C code where it crashed
but it doesn’t tell us precisely what about this line of code caused the error
we’ll have to do some more investigation
since it’s a pointer operation, we might want to check out the value of ip when it crashed
we can do that with gdb’s print command:
(gdb) print ip
$1 = (int *) 0x1000
which tells us that the value of ip is 0x1000 (and also that it has type "int *")
we can also ask gdb to use ip as a pointer and tell us what’s stored at that address:
(gdb) print *ip
Cannot access memory at address 0x1000
and now we see why it crashed: memory address 0x1000 is inaccessible, which is what caused the segfault
next program: stack-frame-games.c
what is going to be printed?
this is sort of the same issue: in bar, a is not being given an initial value
so the output depends entirely on what’s stored in whatever chunk of memory the compiler sets aside for a
but remember that a is one of bar’s local variables, which means it’ll be in bar’s stack frame
with that in mind, let’s mentally walk through the program
main starts, it has its own stack frame
then it calls foo, which causes a stack frame to be pushed
there will be space in this frame for foo’s local variable z, which will be assigned the value 93 (ie, the value 93 will be put into memory at the location the compiler has set aside to store foo’s local variable z)
then foo returns (the value returned doesn’t matter for our purposes today) and its stack frame is no longer relevant, but the contents of that memory remain intact
then main calls bar, which causes a stack frame to be pushed into the same chunk of memory where foo’s stack frame used to be
unsurprisingly, because the compiler uses relatively simple, deterministic rules for this sort of thing, the space it picks to store a in bar is the same as the space it chose to store z in foo
so even though we never explicitly gave it a value, a is going to be 93
$ gcc -o stack-frame-games stack-frame-games.c
$ ./stack-frame-games
a is 93
let’s look at this happen in gdb, which will conveniently introduce several other commands
first compile it with the debugging flag and then feed it to gdb
$ gcc -g -o stack-frame-games stack-frame-games.c
$ gdb stack-frame-games
... lots of output ...
(gdb)
first thing: stop execution at the beginning of foo so we can examine memory
use the break command to cause gdb to stop there:
(gdb) break foo
Breakpoint 1 at 0x400527: file stack-frame-games.c, line 18.
(it even tells us the line of C code where it’s going to break)
then we ask gdb to run it:
(gdb) run
Starting program: /home/pete/tmp/cs202-lecture-26/stack-frame-games
Breakpoint 1, foo (x=12) at stack-frame-games.c:18
18        int z = 93;
the output tells us that it’s about to run line 18, and it conveniently tells us the code at line 18
now we want to examine the memory where z is stored
we could use print z, but let’s go another direction instead: let’s examine the raw memory
first we need the address
and to get that, it would help to see the assembly
which we can do with the disas command
(gdb) disas
Dump of assembler code for function foo:
   0x0000000000400520 <+0>:     push   rbp
   0x0000000000400521 <+1>:     mov    rbp,rsp
   0x0000000000400524 <+4>:     mov    DWORD PTR [rbp-0x14],edi
=> 0x0000000000400527 <+7>:     mov    DWORD PTR [rbp-0x4],0x5d
   0x000000000040052e <+14>:    mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000400531 <+17>:    imul   eax,DWORD PTR [rbp-0x14]
   0x0000000000400535 <+21>:    pop    rbp
   0x0000000000400536 <+22>:    ret
End of assembler dump.
the instruction with the "=>" symbol is the one about to be run
we can infer that it’s storing the value 93 (ie, 0x5d) at address rbp-0x4
from which we can conclude that the variable z is stored at rbp-0x4
as you might have deduced from the most recent assignment, x86 uses rbp to store the frame pointer
we can see the value of all the registers using the info reg command:
(gdb) info reg
rax            0x4004f6            4195574
rbx            0x0                 0
rcx            0x0                 0
rdx            0x7fffffffe858      140737488349272
rsi            0x7fffffffe848      140737488349256
rdi            0xc                 12
rbp            0x7fffffffe740      0x7fffffffe740
rsp            0x7fffffffe740      0x7fffffffe740
r8             0x4005d0            4195792
r9             0x7ffff7de9900      140737351948544
r10            0x4                 4
r11            0x1                 1
r12            0x400400            4195328
r13            0x7fffffffe840      140737488349248
r14            0x0                 0
r15            0x0                 0
rip            0x400527            0x400527 <foo+7>
eflags         0x206               [ PF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
if we only want to see a single register, though, we can also use print:
(gdb) print $rbp
$1 = (void *) 0x7fffffffe740
(note that here we have to precede the name of the register with a "$")
now we can use the x command to eXamine memory
we could manually subtract 4 from 0x7fffffffe740 to calculate the address we want
or we could ask gdb to do the math for us:
(gdb) x $rbp-4
0x7fffffffe73c: 0x00000000
which tells us that the 32-bit value zero is stored at address 0x7fffffffe73c
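the subtraction gdb did for us is ordinary integer arithmetic on the frame pointer. a small sketch in C, using the rbp value from the session above (slot_address is a made-up name, not something gdb or the compiler provides):

```c
#include <stdint.h>

/* address of the local stored `offset` bytes below the frame pointer */
uint64_t slot_address(uint64_t rbp, uint64_t offset)
{
    return rbp - offset;
}
```

slot_address(0x7fffffffe740, 0x4) works out to 0x7fffffffe73c, the same address that x reported.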
we expect that if we execute the next instruction (remember, it’s the one that stores 93 at this memory location), we’ll see something different there
the stepi command causes gdb to execute a single instruction
(gdb) stepi
20        return z * x;
(gdb) x $rbp-4
0x7fffffffe73c: 0x0000005d
and we see that the memory location now contains 93
so were we to resume execution and break in bar, we should expect to examine the memory where a is stored and still find 93
insert a new breakpoint at the beginning of bar:
(gdb) break bar
Breakpoint 2 at 0x400542: file stack-frame-games.c, line 27.
and then resume execution:
(gdb) continue
Continuing.
Breakpoint 2, bar (y=19) at stack-frame-games.c:27
27        printf("a is %d\n", a);
(note that we cannot use run here: that would cause the program to start over from the beginning)
disassemble to find where a is stored:
(gdb) disas
Dump of assembler code for function bar:
   0x0000000000400537 <+0>:     push   rbp
   0x0000000000400538 <+1>:     mov    rbp,rsp
   0x000000000040053b <+4>:     sub    rsp,0x20
   0x000000000040053f <+8>:     mov    DWORD PTR [rbp-0x14],edi
=> 0x0000000000400542 <+11>:    mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000400545 <+14>:    mov    esi,eax
   0x0000000000400547 <+16>:    mov    edi,0x4005e4
   0x000000000040054c <+21>:    mov    eax,0x0
   0x0000000000400551 <+26>:    call   0x4003f0 <printf@plt>
   0x0000000000400556 <+31>:    mov    eax,DWORD PTR [rbp-0x14]
   0x0000000000400559 <+34>:    imul   eax,DWORD PTR [rbp-0x4]
   0x000000000040055d <+38>:    leave
   0x000000000040055e <+39>:    ret
End of assembler dump.
and examine that memory:
(gdb) x $rbp-4
0x7fffffffe73c: 0x0000005d
bingo.
new question: the most recent gdb output indicates that the 32-bit (ie, 4-byte) value 0x0000005d is stored at address 0x7fffffffe73c
but recall we’re working with byte-addressable memory, meaning that each address is associated with only one byte of data
so it actually takes up addresses 0x7fffffffe73c, 0x7fffffffe73d, 0x7fffffffe73e, and 0x7fffffffe73f
which of the four bytes of 0x0000005d are stored in which location?
there seem to be two options:
0x7fffffffe73c: 5d
0x7fffffffe73d: 00
0x7fffffffe73e: 00
0x7fffffffe73f: 00
and
0x7fffffffe73c: 00
0x7fffffffe73d: 00
0x7fffffffe73e: 00
0x7fffffffe73f: 5d
and the "answer" is that both schemes are used
the choice is up to the hardware designer (that is, it’s specified in the ISA)
the first scheme, in which the least-significant byte is stored at the lowest-numbered address, is called little-endian
the latter scheme, in which the least-significant byte is stored at the highest-numbered address, is called big-endian
x86 is a little-endian architecture
the Motorola chips Apple used to use, as well as the chips IBM makes for its high-powered machines, are big-endian
we can use gdb to corroborate this
by asking x to print out single bytes:
(gdb) x/b $rbp-4
0x7fffffffe73c: 0x5d
(gdb) x/b $rbp-3
0x7fffffffe73d: 0x00
(gdb) x/b $rbp-2
0x7fffffffe73e: 0x00
(gdb) x/b $rbp-1
0x7fffffffe73f: 0x00
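the same check can be made from inside a C program. this is a sketch, not one of the lecture’s example files: bytes_in_memory and is_little_endian are hypothetical names, and memcpy is used to copy the bytes of the value out in address order:

```c
#include <stdint.h>
#include <string.h>

/* copy the four bytes of value into out, lowest address first */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* 1 if the least-significant byte sits at the lowest address, else 0 */
int is_little_endian(void)
{
    unsigned char b[4];

    bytes_in_memory(0x0000005d, b);
    return b[0] == 0x5d;
}
```

on an x86 machine we’d expect is_little_endian() to return 1, with the bytes coming out as 5d 00 00 00, the same order x/b showed. on a big-endian machine the 5d would land in the last slot instead.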