Lecture 31 - More debugging and strings
Goals
- Learn how to use the debugger to solve problems
- See an example of using gdb for forensic analysis of a program that we don’t have code for
broken float
Reminder: I have a program that prints out all of the floats we can get with an eight bit number. When we run it, the output looks a little strange. The negative numbers are all really big compared to the positive numbers. They should be symmetric.
If we look at the functions, there is main and there is a function called floatValue. Let’s start there. We can list the entirety of the function with list floatValue, main
The problem is the negative numbers, so let’s set a breakpoint at the point where the number becomes negative.
b 21 if sign == 1
When we run the code, it will zip through all of the positive numbers and the special ones, then stop at the first negative number.
Let’s take a look at the local variables
(gdb) info locals
sign = 1
exponent = 8
mantissa = 0
result = 0
Anyone see a problem?
How can the exponent be 8? It is only 3 bits, so the largest value it can hold is 7!
(gdb) list
16 int sign = (f >> 7) & 1;
17 int exponent = (f >> 4);
18 int mantissa = f & 0xF;
19 float result;
20
21 if (exponent == 0){
22 result = mantissa / 16.0f;
23
24 result = result * pow(2, exponent - 3);
25
(gdb) p /x f
$2 = 0x80
(gdb) p f >> 4
$3 = 8
Why are we getting an 8? f is 0x80 (1000 0000), and shifting it right by 4 leaves 1000, which is 8. We aren’t masking out the three bits we want, so we are getting the sign bit as well.
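A minimal sketch of the fix, assuming f is an unsigned 8-bit value laid out as 1 sign bit, 3 exponent bits, and 4 mantissa bits (the layout the listing implies). The function names here are made up for illustration and are not from the original program.

```c
#include <stdint.h>

/* Assumed layout of the 8-bit float: s eee mmmm
   (1 sign bit, 3 exponent bits, 4 mantissa bits). */

/* Hypothetical helper: extract the sign bit, as on line 16 of the listing. */
int sign_bit(uint8_t f) { return (f >> 7) & 1; }

/* The expression from line 17: the shift drops the mantissa,
   but the sign bit is still sitting above the exponent. */
int exponent_buggy(uint8_t f) { return f >> 4; }

/* Masking with 0x7 (binary 111) keeps only the three exponent bits. */
int exponent_fixed(uint8_t f) { return (f >> 4) & 0x7; }

/* Hypothetical helper: extract the mantissa, as on line 18 of the listing. */
int mantissa_bits(uint8_t f) { return f & 0xF; }
```

For f = 0x80, exponent_buggy gives the 8 we saw in gdb, while exponent_fixed gives 0.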
Debugging assembly
What if you are working with code that doesn’t have debugging data included? gdb works there as well.
We will want to tell it to use Intel assembly syntax, however, by adding this line to ~/.config/gdb/gdbinit:
set disassembly-flavor intel
More commands
- stepi and nexti: just like step and next, but they advance by machine instruction (short forms si and ni)
- disassemble function name: print out the assembly for the function
- info reg: print register values
  - can specify specific registers by putting a $ in front of them (e.g., $rax)
  - we can look at the condition codes with info reg eflags
- break *: break works the same way, but we need to put an * in front of the name or address to specify an instruction instead of a line number
general strategy
- if we know function names, we can set breakpoints in the usual way
- there is always a main, so you can start there
- we can disassemble main to find the function calls
example
I have a mystery program that we are trying to figure out. If I run it, I get this:
$ ./mystery
Usage: ./mystery <number>
$ ./mystery 42
3
Curious.
So, let’s poke around in gdb to see if we can’t figure out what the program does
Start by disassembling main
(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040116b <+0>: push rbp
0x000000000040116c <+1>: mov rbp,rsp
0x000000000040116f <+4>: sub rsp,0x20
0x0000000000401173 <+8>: mov DWORD PTR [rbp-0x14],edi
0x0000000000401176 <+11>: mov QWORD PTR [rbp-0x20],rsi
0x000000000040117a <+15>: cmp DWORD PTR [rbp-0x14],0x2
0x000000000040117e <+19>: je 0x4011a3 <main+56>
0x0000000000401180 <+21>: mov rax,QWORD PTR [rbp-0x20]
0x0000000000401184 <+25>: mov rax,QWORD PTR [rax]
0x0000000000401187 <+28>: mov rsi,rax
0x000000000040118a <+31>: mov edi,0x402010
0x000000000040118f <+36>: mov eax,0x0
0x0000000000401194 <+41>: call 0x401030 <printf@plt>
0x0000000000401199 <+46>: mov edi,0xffffffff
0x000000000040119e <+51>: call 0x401050 <exit@plt>
0x00000000004011a3 <+56>: mov rax,QWORD PTR [rbp-0x20]
0x00000000004011a7 <+60>: add rax,0x8
0x00000000004011ab <+64>: mov rax,QWORD PTR [rax]
0x00000000004011ae <+67>: mov rdi,rax
0x00000000004011b1 <+70>: call 0x401040 <atoi@plt>
0x00000000004011b6 <+75>: mov DWORD PTR [rbp-0x4],eax
0x00000000004011b9 <+78>: mov eax,DWORD PTR [rbp-0x4]
0x00000000004011bc <+81>: mov edi,eax
0x00000000004011be <+83>: call 0x401146 <calculate>
0x00000000004011c3 <+88>: mov DWORD PTR [rbp-0x8],eax
0x00000000004011c6 <+91>: mov eax,DWORD PTR [rbp-0x8]
0x00000000004011c9 <+94>: mov esi,eax
0x00000000004011cb <+96>: mov edi,0x402024
0x00000000004011d0 <+101>: mov eax,0x0
0x00000000004011d5 <+106>: call 0x401030 <printf@plt>
0x00000000004011da <+111>: mov eax,0x0
0x00000000004011df <+116>: leave
0x00000000004011e0 <+117>: ret
End of assembler dump.
Start by looking at the call instructions.
There are five function calls:
- 2x printf: so printing things out
- exit: we haven’t used this yet, but it does what it says and exits the program
- atoi: we haven’t used this one either, but it is another standard library function (it does roughly the same job as strtol, converting a string to an integer, but without any error reporting)
- calculate: that looks like an actual function in the code
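We can also see a conditional jump at +19. If we trace that back a little, we can see that it is checking if argc is 2: edi, the first argument to main, is saved at [rbp-0x14] and then compared with 0x2. The printf and the exit that follow are probably printing the usage message and exiting.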
We can check this. Right before the call to printf, we can see it loading the argument registers. One of them, rsi, appears to be getting the address of the first string in argv.
Let’s take a look at the other one, edi, which is loaded with 0x402010. One of the formats we can use with x is s for C strings.
(gdb) x /s 0x402010
0x402010: "Usage: %s <number>\n"
Confirmed, we are printing the usage message
So, jumping down under that to main+56, we can see it unpacking argv again. It is adding 0x8, the size of one pointer, so we are probably looking at the second argument (argv[1]) now. Then it calls atoi, so it is parsing the number string into a number.
It looks like calculate only takes a single argument, since edi is the only argument register loaded before the call.
Set a breakpoint right before the call and then start the program
(gdb) b *main+83
Breakpoint 1 at 0x4011be
(gdb) r 42
Starting program: /home/candrews/cs202/s23/debugging/mystery 42
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, 0x00000000004011be in main ()
Check out the argument
(gdb) p $edi
$1 = 42
Okay, so that is confirmed, we are going to call calculate(42)
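Putting the pieces so far together, here is a rough sketch of what the C source of main might look like. This is a guess from the disassembly, not the actual source. The variable names are invented, the function is called reconstructed_main (a hypothetical name) so that it can be called from other code, the "%d\n" format is an assumption because we have not examined the string at 0x402024, and calculate is only a placeholder since we have not looked inside it yet.

```c
#include <stdio.h>
#include <stdlib.h>

/* Placeholder only: the real body of calculate is still unknown at this
   point in the analysis, so this stand-in just hands back its argument. */
static int calculate(int n) { return n; }

/* Hypothetical reconstruction of main from the disassembly. */
int reconstructed_main(int argc, char *argv[]) {
    if (argc != 2) {                              /* cmp [rbp-0x14],0x2 ; je main+56 */
        printf("Usage: %s <number>\n", argv[0]);  /* format string at 0x402010 */
        exit(-1);                                 /* mov edi,0xffffffff */
    }

    int number = atoi(argv[1]);                   /* result stored at [rbp-0x4] */
    int result = calculate(number);               /* result stored at [rbp-0x8] */

    printf("%d\n", result);                       /* assumed format at 0x402024 */
    return 0;                                     /* mov eax,0x0 */
}
```

The two stack slots, [rbp-0x4] and [rbp-0x8], are what suggest two local int variables.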
Let’s set a new breakpoint and jump to calculate (yes, we could just step there)
(gdb) b calculate
Breakpoint 2 at 0x40114a
(gdb) c
Continuing.
Breakpoint 2, 0x000000000040114a in calculate ()
Let’s take a look around:
(gdb) disassemble
Dump of assembler code for function calculate:
0x0000000000401146 <+0>: push rbp
0x0000000000401147 <+1>: mov rbp,rsp
=> 0x000000000040114a <+4>: mov DWORD PTR [rbp-0x14],edi
0x000000000040114d <+7>: mov DWORD PTR [rbp-0x4],0x0
0x0000000000401154 <+14>: mov eax,DWORD PTR [rbp-0x14]
0x0000000000401157 <+17>: and eax,0x1
0x000000000040115a <+20>: add DWORD PTR [rbp-0x4],eax
0x000000000040115d <+23>: sar DWORD PTR [rbp-0x14],1
0x0000000000401160 <+26>: cmp DWORD PTR [rbp-0x14],0x0
0x0000000000401164 <+30>: jne 0x401154 <calculate+14>
0x0000000000401166 <+32>: mov eax,DWORD PTR [rbp-0x4]
0x0000000000401169 <+35>: pop rbp
0x000000000040116a <+36>: ret
End of assembler dump.
Notice that we have skipped the stack frame setup steps: the breakpoint stopped us at +4, after the push rbp and mov rbp,rsp.
We could just try to walk through this and handle the values in our head, but let’s use the debugger
First we will set up some values to watch as we step.
(gdb) display $eax
1: $eax = 42
(gdb) display /wd $rbp-0x14
2: x/dw $rbp-0x14 0x7fffffffd74c: 0
(gdb) display /wd $rbp-0x4
3: x/dw $rbp-0x4 0x7fffffffd75c: 32767
(gdb) display /i $pc
4: x/i $pc
=> 0x40114a <calculate+4>: mov DWORD PTR [rbp-0x14],edi
Notice how we turned the memory accesses into expressions. You should also notice that the current values in memory are garbage…
The last display is showing the contents of the PC as an instruction so we can see what the next instruction will be
(gdb) ni
0x000000000040114d in calculate ()
1: $eax = 42
5: x/dw $rbp-0x14 0x7fffffffddfc: 42
6: x/dw $rbp-0x4 0x7fffffffde0c: 32767
7: x/i $pc
=> 0x40114d <calculate+7>: mov DWORD PTR [rbp-0x4],0x0
(gdb) ni
0x0000000000401154 in calculate ()
1: $eax = 42
5: x/dw $rbp-0x14 0x7fffffffddfc: 42
6: x/dw $rbp-0x4 0x7fffffffde0c: 0
7: x/i $pc
=> 0x401154 <calculate+14>: mov eax,DWORD PTR [rbp-0x14]
(gdb) ni
0x0000000000401157 in calculate ()
1: $eax = 42
5: x/dw $rbp-0x14 0x7fffffffddfc: 42
6: x/dw $rbp-0x4 0x7fffffffde0c: 0
7: x/i $pc
=> 0x401157 <calculate+17>: and eax,0x1
(gdb)
0x000000000040115a in calculate ()
1: $eax = 0
5: x/dw $rbp-0x14 0x7fffffffddfc: 42
6: x/dw $rbp-0x4 0x7fffffffde0c: 0
7: x/i $pc
=> 0x40115a <calculate+20>: add DWORD PTR [rbp-0x4],eax
(gdb)
0x000000000040115d in calculate ()
1: $eax = 0
5: x/dw $rbp-0x14 0x7fffffffddfc: 42
6: x/dw $rbp-0x4 0x7fffffffde0c: 0
7: x/i $pc
=> 0x40115d <calculate+23>: sar DWORD PTR [rbp-0x14],1
(gdb)
0x0000000000401160 in calculate ()
1: $eax = 0
5: x/dw $rbp-0x14 0x7fffffffddfc: 21
6: x/dw $rbp-0x4 0x7fffffffde0c: 0
7: x/i $pc
=> 0x401160 <calculate+26>: cmp DWORD PTR [rbp-0x14],0x0
It looks like we are in a loop:
- AND the value with 1
- ADD the result to a second variable
- shift the value right 1
So this is… counting the 1s in the binary representation of the number!
(gdb) p /x 42
$1 = 0x2a
42 is 0x2a, or 0010 1010 in binary, which has three 1 bits, thus the 3.
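The loop we just stepped through can be sketched in C as follows. This is a reconstruction from the disassembly, not the original source: the names value and count are invented, and the do-while shape is inferred from the comparison sitting at the bottom of the loop (the body runs once before the first test).

```c
/* Hypothetical reconstruction of calculate: counts the 1 bits in value. */
int calculate(int value) {       /* argument saved at [rbp-0x14] */
    int count = 0;               /* mov DWORD PTR [rbp-0x4],0x0 */

    do {
        count += value & 1;      /* and eax,0x1 ; add [rbp-0x4],eax */
        value >>= 1;             /* sar DWORD PTR [rbp-0x14],1 */
    } while (value != 0);        /* cmp [rbp-0x14],0x0 ; jne calculate+14 */

    return count;                /* mov eax,DWORD PTR [rbp-0x4] */
}
```

With this version, calculate(42) returns 3, matching what the program printed. One thing to notice: sar is an arithmetic shift, so it copies the sign bit. For a negative input the value would settle at -1 and never reach zero, so the loop would not terminate.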
Mechanical level
vocabulary
Skills