Lecture 32 - Linking
Goals
- Learn some more about strings
- Take another look at the compiling process and learn about linking and libraries
Strings
Another thing that you will need as you explore your bombs is a little better understanding of strings.
As we discussed earlier, strings are arrays of characters, and the last character of a valid string should be '\0', the null character (which has the numerical value of zero).
The null character decouples the length of the string from the length of the array that contains it: the array can be larger than the string it holds.
There are a number of string functions in /usr/include/string.h. You can see the list with
$ man string
For example: strlen(). This, like most of the others, makes explicit use of the null character. It counts the number of characters until it reaches the null terminator.
Unfortunately, the null-terminated string is one of the worst programming language decisions ever. If you are constructing strings, it is very easy to forget the terminator, or to forget to allocate space for it, in which case the zero gets written into whatever comes next in memory.
This has led to numerous bugs and exploits over the years. strlen is pretty innocent, but a function like strcpy, which copies the data from one string to another, is not. There is no guard to make sure that the destination has enough space for the string (already bad enough), and if the null terminator is missing, it will just go on copying data until it hits a 0.
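There are newer functions like strncpy that let the programmer set a maximum number of characters to copy (and should thus always be used), but even that doesn’t guarantee that the resulting string has a terminator: if the source is at least as long as the limit, strncpy fills the destination without writing a '\0'.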
Where the string lives and whether it can be edited depend on how you create it:
char * str = "test"; // static, read only, lives in .rodata
char * str = malloc(5); // dynamic, read-write, lives on the heap (5 bytes: "test" plus '\0')
char str[] = "test"; // static, read-write, stack
#include <stdio.h>
int main(int argc, char *argv[])
{
const char *str1 = "test1";
char *const str2 = "test2";
char *str3 = "test3";
char str4[] = "test4";
char str5[6] = "test5";
printf("%s\n", str1);
printf("%s\n", str2);
printf("%s\n", str3);
printf("%s\n", str4);
printf("%s\n", str5);
printf("\n");
// edit the strings
// str1[0] = 'X'; // assignment to read-only location
// str2[0] = 'X'; // seg fault
// str3[0] = 'X'; // seg fault
str4[0] = 'X';
str5[0] = 'X';
printf("%s\n", str1);
printf("%s\n", str2);
printf("%s\n", str3);
printf("%s\n", str4);
printf("%s\n", str5);
printf("\n");
// reassign the variables
str1 = "dalek1";
// str2 = "dalek2"; // read only variable
str3 = "dalek3";
// str4 = "dalek4"; // assignment to array type variable
// str5 = "dalek5"; // assignment to array type variable
printf("%s\n", str1);
printf("%s\n", str2);
printf("%s\n", str3);
printf("%s\n", str4);
printf("%s\n", str5);
}
There is a special tool called strings that we can use to examine an executable and see all of the strings that are stored in the data portion of the file.
If we run strings -d (just get strings in the data section) on the program above, we get (among many others)
test1
test2
test3
dalek1
dalek3
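Notably, we don’t get str4 (“test4”) and str5 (“test5”). Those are local arrays that are filled in at run time, so the compiler typically builds their contents with instructions instead of storing them in a data section.
Compiling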
At this point, we have had #include directives in our code and we have had to add flags like -lm when we compiled – it is time to explain where those come from and a phase of compilation I’ve punted on: linking.
For our example, we are going to return to our color struct
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef struct color
{
uint8_t r;
uint8_t g;
uint8_t b;
} color_t;
color_t *color(uint8_t red, uint8_t green, uint8_t blue)
{
color_t *c = (color_t *)malloc(sizeof(color_t));
c->r = red;
c->g = green;
c->b = blue;
return c;
}
void printColor(color_t *c)
{
printf("rgb(%u, %u, %u)\n", c->r, c->g, c->b);
}
void printHexstring(color_t *c)
{
printf("#%02x%02x%02x\n", c->r, c->g, c->b); // %02x pads each channel to two hex digits
}
int main(int argc, char *argv[])
{
color_t *c1 = color(0x00, 0x3b, 0x6f);
printHexstring(c1);
printColor(c1);
free(c1);
}
Aside – function definitions and declarations
This would be the common way to arrange your functions, but what happens if we move printColor below main?
We get all kinds of warnings, or outright errors with recent versions of gcc, because the compiler doesn’t look ahead. It reaches the call to printColor before it reaches its definition. It makes some assumptions about how the function is called, and then complains when the reality of the function doesn’t live up to its guesses.
If there is a real reason to call a function before it is defined (e.g., if we have two mutually recursive functions so we can’t put one first), then we can have a declaration that is separate from the definition that we put at the top.
Here is the declaration – note the semicolon:
void printColor(color_t * c);
Header files
If this were a big project, we wouldn’t want everything in a single file. It might occur to us to put all of the color-related functions and types together in one place.
So we will put our color functions in color.c and the main function in driver.c.
I can then compile my code with this:
gcc -o driver driver.c color.c
But when we do, we will get all kinds of errors
$ gcc -o driver driver.c color.c
driver.c: In function 'main':
driver.c:6:9: error: unknown type name 'color_t'
6 | color_t *c1 = color(0x00, 0x3b, 0x6f);
| ^~~~~~~
driver.c:6:23: error: implicit declaration of function 'color' [-Wimplicit-function-declaration]
6 | color_t *c1 = color(0x00, 0x3b, 0x6f);
| ^~~~~
driver.c:6:23: error: initialization of 'int *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
driver.c:8:9: error: implicit declaration of function 'printHexstring' [-Wimplicit-function-declaration]
8 | printHexstring(c1);
| ^~~~~~~~~~~~~~
driver.c:9:9: error: implicit declaration of function 'printColor' [-Wimplicit-function-declaration]
9 | printColor(c1);
| ^~~~~~~~~~
The problem is that main doesn’t know anything about those functions and types now that we put them in a separate file. So, we need a mechanism for sharing the declarations of the functions. That mechanism is the header file.
The header file is just another source file, but this time ending in .h.
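We will put the type definition and the function declarations in the header file and then #include "color.h" in the files that need them.
Note that the syntax is different from the angle brackets used for system headers. The quotes say “look for the header file in the local directory first.”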
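A sketch of what color.h might contain, based on the color code above. The include guard name COLOR_H is an assumption; the guard is a common convention that keeps the contents from being processed twice if the header ends up included more than once.

```c
/* color.h: a sketch of the shared declarations for the color code */
#ifndef COLOR_H
#define COLOR_H

#include <stdint.h>

/* The type definition lives in the header so every file agrees on it. */
typedef struct color
{
    uint8_t r;
    uint8_t g;
    uint8_t b;
} color_t;

/* Declarations only: the function bodies stay in color.c. */
color_t *color(uint8_t red, uint8_t green, uint8_t blue);
void printColor(color_t *c);
void printHexstring(color_t *c);

#endif /* COLOR_H */
```

Both driver.c and color.c would then begin with #include "color.h", so the compiler sees the same type and signatures in each file.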
Linking
As the size of our program grows, we may not want to wait for every file to compile when we have just made one simple change in a single file.
What we can do is compile the files individually
$ gcc -c driver.c
$ gcc -c color.c
This creates object files driver.o and color.o. We saw object files before when we were looking at the assembly. I said at the time that the files were compiled to machine code, but that they were not executable.
The linker can take a collection of object files and libraries and merge them all together to create the executable file (hence the name “linker”).
The linker is making sure that the final executable includes all of the code that it needs in order to run the application.
As it turns out, gcc also provides a linker, so we can bundle our object files together to make an executable with the same tool.
$ gcc -o driver driver.o color.o
So really, when we just passed in a single C file, gcc just did all three steps: compile, assemble, and link.
Libraries
A collection of code like the color code might be something that we could use in different programs, so I could release it as a library. We just need two pieces: the header file, so the compiler knows the signatures of the functions and the types provided, and the object file containing the machine code (most open source libraries will also release the source code so you can build the library yourself to tailor it to your environment).
In addition to external libraries, C has a “standard library”. Installed libraries can be found in lib folders. You will find a couple of these on the system, most notably /lib and /usr/lib. We have used several pieces of the standard library already, including stdio, stdlib, string, math, stdint…
The header files are in include folders, most notably /usr/include. These are well-known locations that the compiler knows to look in.
It can be more difficult for the compiler to figure out which library we actually want to link against. This is what the -l flag is for: -lname asks the linker to look for a library called libname. You have previously seen us include the math library, libm, with -lm.
I am leaving a lot of details out, but that is enough to gain a somewhat better understanding of what gcc is up to.
Mechanical level
vocabulary
Skills