Lecture 31 - More debugging and strings

Published

May 3, 2023

Goals

  • Learn how to use the debugger to solve problems
  • See an example of using gdb for forensic analysis of a program that we don’t have the source code for

broken float

Reminder: I have a program that prints out all of the floats we can represent with an 8-bit format (1 sign bit, 3 exponent bits, and 4 mantissa bits). When we run it, the output looks a little strange: the negative numbers are all really big compared to the positive numbers, when the two halves should be symmetric.

If we look at the functions, there is main and there is a function called floatValue. Let’s start there. We can list the entire function with list floatValue, main, which prints the source from the start of floatValue through the start of main.

The problem is the negative numbers, so let’s set a conditional breakpoint on line 21 that only triggers once the sign bit is set:

b 21 if sign == 1

When we run the code, it will zip through all of the positive numbers and the special values before stopping.

Let’s take a look at the local variables:

(gdb) info locals
sign = 1
exponent = 8
mantissa = 0
result = 0

Anyone see a problem?

How can the exponent be 8? It is only 3 bits wide, so the largest value it can hold is 7!

(gdb) list
16        int sign = (f >> 7) & 1;
17        int exponent = (f >> 4);
18        int mantissa = f & 0xF;
19        float result;
20
21        if (exponent == 0){
22          result = mantissa / 16.0f;
23          
24          result = result * pow(2, exponent - 3);
25
(gdb) p /x f
$2 = 0x80
(gdb) p f >> 4
$3 = 8

Why are we getting an 8? We aren’t masking out the three exponent bits we want, so the sign bit comes along too: 0x80 >> 4 is 0x8. Masking the shifted value with (f >> 4) & 0x7 keeps only the exponent bits.

Debugging assembly

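What if you are working with code that was compiled without debugging information? gdb works there as well; we just have to work with machine instructions instead of source lines.

We will want to tell it to use Intel assembly syntax, however, by adding this line to ~/.config/gdb/gdbinit: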

set disassembly-flavor intel

More commands

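  • stepi and nexti: just like step and next, but they advance by a single machine instruction instead of a source line (short forms si and ni); like next, nexti steps over call instructions
  • disassemble function_name
    • prints the assembly for the function; with no argument, it disassembles the function we are currently stopped in
  • info reg
    • prints the contents of the registers; we can ask for a specific register by putting a $ in front of it (e.g., info reg $rax)
    • we can look at the condition codes with info reg eflags
  • break *: break works the same way, but we need to put an * in front of the name or address to specify an instruction address instead of a line number (e.g., b *main+83 or b *0x4011be)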

general strategy

  • If we know the function names, we can set breakpoints in the usual way.
  • There is always a main, so you can start there.
  • We can disassemble main to find the function calls.

example

I have a mystery program that we are trying to figure out. If I run it, I get this:

$ ./mystery
Usage: ./mystery <number>
$ ./mystery 42
3

Curious.

So, let’s poke around in gdb to see if we can’t figure out what the program does.

Start by disassembling main:

(gdb) disassemble main
Dump of assembler code for function main:
   0x000000000040116b <+0>: push   rbp
   0x000000000040116c <+1>: mov    rbp,rsp
   0x000000000040116f <+4>: sub    rsp,0x20
   0x0000000000401173 <+8>: mov    DWORD PTR [rbp-0x14],edi
   0x0000000000401176 <+11>:    mov    QWORD PTR [rbp-0x20],rsi
   0x000000000040117a <+15>:    cmp    DWORD PTR [rbp-0x14],0x2
   0x000000000040117e <+19>:    je     0x4011a3 <main+56>
   0x0000000000401180 <+21>:    mov    rax,QWORD PTR [rbp-0x20]
   0x0000000000401184 <+25>:    mov    rax,QWORD PTR [rax]
   0x0000000000401187 <+28>:    mov    rsi,rax
   0x000000000040118a <+31>:    mov    edi,0x402010
   0x000000000040118f <+36>:    mov    eax,0x0
   0x0000000000401194 <+41>:    call   0x401030 <printf@plt>
   0x0000000000401199 <+46>:    mov    edi,0xffffffff
   0x000000000040119e <+51>:    call   0x401050 <exit@plt>
   0x00000000004011a3 <+56>:    mov    rax,QWORD PTR [rbp-0x20]
   0x00000000004011a7 <+60>:    add    rax,0x8
   0x00000000004011ab <+64>:    mov    rax,QWORD PTR [rax]
   0x00000000004011ae <+67>:    mov    rdi,rax
   0x00000000004011b1 <+70>:    call   0x401040 <atoi@plt>
   0x00000000004011b6 <+75>:    mov    DWORD PTR [rbp-0x4],eax
   0x00000000004011b9 <+78>:    mov    eax,DWORD PTR [rbp-0x4]
   0x00000000004011bc <+81>:    mov    edi,eax
   0x00000000004011be <+83>:    call   0x401146 <calculate>
   0x00000000004011c3 <+88>:    mov    DWORD PTR [rbp-0x8],eax
   0x00000000004011c6 <+91>:    mov    eax,DWORD PTR [rbp-0x8]
   0x00000000004011c9 <+94>:    mov    esi,eax
   0x00000000004011cb <+96>:    mov    edi,0x402024
   0x00000000004011d0 <+101>:   mov    eax,0x0
   0x00000000004011d5 <+106>:   call   0x401030 <printf@plt>
   0x00000000004011da <+111>:   mov    eax,0x0
   0x00000000004011df <+116>:   leave
   0x00000000004011e0 <+117>:   ret
End of assembler dump.

Start by looking at the call instructions.

There are five function calls:

  • printf (twice), so we are printing things out
  • exit: we haven’t used this yet, but it does what it says and exits the program
  • atoi: we haven’t used this one either, but it is another standard library function; it converts a string to an int, much like strtol, but with no way to report errors
  • calculate: that looks like an actual function in the code

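We can also see a conditional jump at +19. If we trace it back a little, we can see that it is checking whether argc (passed in edi and saved at [rbp-0x14]) is 2. If it is, the je skips ahead to main+56; if not, we fall through to the printf and the exit, which are probably printing the usage message and exiting. The mov edi,0xffffffff right before the call means it is calling exit(-1).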

We can check this. Right before the call to printf, we can see it loading the argument registers: rsi gets argv[0], the first string in argv (the program name), and edi gets the constant 0x402010.

Let’s take a look at that constant with the x (examine) command. One of the formats it accepts is s, for C strings:

(gdb) x /s 0x402010
0x402010:       "Usage: %s <number>\n"

Confirmed: we are printing the usage message.

So, jumping down to main+56 (where the je lands), we can see it unpacking argv again. This time it adds 0x8 before loading; pointers are 8 bytes, so that is argv[1], the first real argument. Then it calls atoi on it, so it is parsing the number string into an int.

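It looks like calculate only takes a single argument: the only argument register loaded before the call is edi, which holds the value atoi returned.

Set a breakpoint right on the call instruction and then start the program: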

(gdb) b *main+83
Breakpoint 1 at 0x4011be
(gdb) r 42
Starting program: /home/candrews/cs202/s23/debugging/mystery 42
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x00000000004011be in main ()

Check out the argument (the first integer argument is passed in edi):

(gdb) p $edi
$1 = 42

Okay, that confirms it: we are going to call calculate(42).

Let’s set a new breakpoint and continue to calculate (yes, we could just step there):

(gdb) b calculate
Breakpoint 2 at 0x40114a
(gdb) c
Continuing.

Breakpoint 2, 0x000000000040114a in calculate ()

Let’s take a look around:

(gdb) disassemble
Dump of assembler code for function calculate:
   0x0000000000401146 <+0>: push   rbp
   0x0000000000401147 <+1>: mov    rbp,rsp
=> 0x000000000040114a <+4>: mov    DWORD PTR [rbp-0x14],edi
   0x000000000040114d <+7>: mov    DWORD PTR [rbp-0x4],0x0
   0x0000000000401154 <+14>:    mov    eax,DWORD PTR [rbp-0x14]
   0x0000000000401157 <+17>:    and    eax,0x1
   0x000000000040115a <+20>:    add    DWORD PTR [rbp-0x4],eax
   0x000000000040115d <+23>:    sar    DWORD PTR [rbp-0x14],1
   0x0000000000401160 <+26>:    cmp    DWORD PTR [rbp-0x14],0x0
   0x0000000000401164 <+30>:    jne    0x401154 <calculate+14>
   0x0000000000401166 <+32>:    mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000401169 <+35>:    pop    rbp
   0x000000000040116a <+36>:    ret
End of assembler dump.

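Notice that the breakpoint stopped at calculate+4: when we break on a function name, gdb skips past the stack frame setup (push rbp; mov rbp,rsp). Using b *calculate would stop on the very first instruction instead.

We could just try to walk through this and track the values in our head, but let’s use the debugger.

First we will set up some values to watch. The display command prints each expression every time the program stops: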

(gdb) display $eax
1: $eax = 42
(gdb) display /wd $rbp-0x14
2: x/dw $rbp-0x14  0x7fffffffd74c:      0
(gdb) display /wd $rbp-0x4
3: x/dw $rbp-0x4  0x7fffffffd75c:       32767
(gdb) display /i $pc
4: x/i $pc
=> 0x40114a <calculate+4>:  mov    DWORD PTR [rbp-0x14],edi

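Notice how we turned the memory operands from the assembly ([rbp-0x14] and [rbp-0x4]) into expressions. You should also notice that the current values in memory are garbage, because the instructions that initialize them haven’t run yet.

The last display shows the memory at the PC decoded as an instruction, so we can always see what the next instruction will be. Pressing Enter at an empty (gdb) prompt repeats the previous command, which is what the bare (gdb) prompts below are doing.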

(gdb) ni
0x000000000040114d in calculate ()
1: $eax = 42
5: x/dw $rbp-0x14  0x7fffffffddfc:  42
6: x/dw $rbp-0x4  0x7fffffffde0c:   32767
7: x/i $pc
=> 0x40114d <calculate+7>:  mov    DWORD PTR [rbp-0x4],0x0
(gdb) ni
0x0000000000401154 in calculate ()
1: $eax = 42
5: x/dw $rbp-0x14  0x7fffffffddfc:  42
6: x/dw $rbp-0x4  0x7fffffffde0c:   0
7: x/i $pc
=> 0x401154 <calculate+14>: mov    eax,DWORD PTR [rbp-0x14]
(gdb) ni
0x0000000000401157 in calculate ()
1: $eax = 42
5: x/dw $rbp-0x14  0x7fffffffddfc:  42
6: x/dw $rbp-0x4  0x7fffffffde0c:   0
7: x/i $pc
=> 0x401157 <calculate+17>: and    eax,0x1
(gdb)
0x000000000040115a in calculate ()
1: $eax = 0
5: x/dw $rbp-0x14  0x7fffffffddfc:  42
6: x/dw $rbp-0x4  0x7fffffffde0c:   0
7: x/i $pc
=> 0x40115a <calculate+20>: add    DWORD PTR [rbp-0x4],eax
(gdb)
0x000000000040115d in calculate ()
1: $eax = 0
5: x/dw $rbp-0x14  0x7fffffffddfc:  42
6: x/dw $rbp-0x4  0x7fffffffde0c:   0
7: x/i $pc
=> 0x40115d <calculate+23>: sar    DWORD PTR [rbp-0x14],1
(gdb)
0x0000000000401160 in calculate ()
1: $eax = 0
5: x/dw $rbp-0x14  0x7fffffffddfc:  21
6: x/dw $rbp-0x4  0x7fffffffde0c:   0
7: x/i $pc
=> 0x401160 <calculate+26>: cmp    DWORD PTR [rbp-0x14],0x0

It looks like we are in a loop:

  • AND the value with 1
  • ADD the result to a second variable (our counter at [rbp-0x4])
  • shift the value right by 1
  • jump back to the top if the value is not 0 yet

So this is… counting the 1s in the binary representation of the number!

(gdb) p /x 42
$1 = 0x2a

42 is 0x2a, or 0010 1010 in binary (32 + 8 + 2). That is three 1 bits, hence the output of 3.

Mechanical level

vocabulary

Skills