Lecture 20: Linux, C, and variables
Goals
- run command-line programs in Linux
- use the man program to learn how a specific program works
- recognize the RISC pattern of computation
- write, compile, and examine a simple C program
- identify where variables are stored in memory
today’s agenda:
- some Linux patterns
- a stupid-simple C program
Linux!
you’ve used it already whenever you’ve ssh’d to weathertop to practice for quizzes or to check your progress or feedback
but soon you’ll also be using it to complete Tier-2 assignments
but why am I inflicting this on you?
most modern computing environments have something UNIX-y as their underpinnings: OS X, iOS, Android, and the vast majority of data centers/servers
so I feel it’s important that every undergrad CS student have been exposed to UNIX in some form during their education, and Linux is one of those forms
another reason I like Linux for CS students is that it strips away a lot of the polish that, on other operating systems, protects you from seeing how the system really works
a lot of the things you take for granted on, eg, OS X, are not done automatically on Linux
having those things done for you is good if you don’t want to be bothered with the minutiae
it’s not so good if you want to understand what’s actually going on on your system, which is one of the core ideas of this entire course
on that note, my guess is most of your programming experience to this point has been in something like Eclipse or Thonny
those are examples of "Integrated Development Environments"
because they gather together a bunch of different tools and integrate them into a single application
for instance, with Eclipse, you get a text editor (to edit your code), a compiler (to… compile your code), a debugger (to step through your code and examine it as it runs), and some other things
we’re going to look at instances of all those tools, too, but separately
(recalling the previous lecture, we’re going to use gcc for compiling and gdb for debugging)
historically, the primary way of interacting with a UNIX system is a program called the shell, which is the text-based user interface
one types commands at the shell and presses Enter to run them
just like other operating systems, the filesystem is arranged as a hierarchy of folders which can contain files and other folders
on your own machines, you’ve used the Finder (OS X) or Explorer (Windows) to navigate the filesystem hierarchy
it’s the same idea on Linux, except the word "directory" is used instead of "folder"
each running program, including the shell, has a notion of running "inside" a particular directory; this is called that program’s "current working directory"
you can tell the shell to report its current working directory with the command "pwd", for "print working directory":
$ pwd
/home/pete
in that example, the "$" is the shell prompt, which is text printed by the shell telling you that it’s waiting for input
I typed "pwd" and pressed Enter
and the shell printed out "/home/pete" and then, because it finished doing what I asked of it, it printed a new prompt awaiting new input (not shown above)
digging into the output a bit more, the first thing to note is that "/" is the "path separator"
this means that each word between "/" characters is a path component
in this specific instance, it means that at the very top level of the filesystem, there is a directory (which we call the "root directory" and write as "/")
in this directory, there is at least one other directory, called "home"
and in the directory "home", there is another directory called "pete"
it is this directory called "pete" in which the shell is currently running
one can see what files are in the current directory with the command "ls" (short for "list")
$ ls
this command prints out a bunch of strings, each of which is the name of either a file or other directory within the current directory (ie, /home/pete)
(I am not including the output of this command in the notes because I’m paranoid)
ls is often configured to show different kinds of items in different colors
if you run this on weathertop, you may find that directories are printed in blue and normal files are printed in white or grey
there are other kinds of files, too, that will likely get printed in different colors, but they’re not worth going into right now
this is all well and good, but we might want to list the files in another directory
there are two ways we might be able to accomplish this: we might be able to change the shell’s current directory or we might be able to ask ls to list the files within a different directory
we’ll actually look at both approaches
but let’s work on changing directories first
the cd command changes the current working directory of the shell
it works kind of like a function, in that it takes as a parameter the name of the directory to change to
except there are no parentheses: it’s the name of the command, followed by a space, followed by the name of the target directory
(recall that there was a directory called "tmp" within /home/pete)
$ cd tmp
$ pwd
/home/pete/tmp
there is something subtle about the above command: note that it only works because the current directory just happens to contain another directory called "tmp"
if we happened to be in another directory (ie, our current directory wasn’t /home/pete) which doesn’t contain a directory called "tmp", then the cd command above would not work
another way to say this is that the parameter to cd is interpreted relative to the current working directory
if we want to change back to /home/pete, there are two ways to tell cd where we want to go: we can give a relative path, which will be interpreted relative to /home/pete/tmp, or we can give an absolute path, which will be interpreted the same way no matter what my current directory is
an absolute path always starts with a "/" and gives the complete… path to where we want to go:
$ cd /home/pete
$ pwd
/home/pete
to get the same effect using relative paths, we can use the special name ".." which always refers to the parent directory (that is, the directory that contains the current directory)
$ cd tmp
$ pwd
/home/pete/tmp
$ cd ..
$ pwd
/home/pete
when specifying relative paths, we can give multiple path components at once:
$ pwd
/home/pete
$ cd tmp/202-lecture-20
$ pwd
/home/pete/tmp/202-lecture-20
$ cd ../../..
$ pwd
/home
as I implied above, we can also ask ls to list files in a directory other than the current directory
for example, to see all the files and directories within /home, we can run
$ ls /home
pete
this isn’t a very interesting example, because there is only one thing in /home, but it gets the idea across: we are able to see the contents of another directory without cd’ing to it
furthermore, for any command that takes a path as parameter, that path can be either relative or absolute
(there are very, very, very few exceptions to this, which we will not encounter in this course, so it’s good enough for now)
$ pwd
/home/pete
$ ls tmp/202-lecture-20
variables.c
$ ls /home/pete/tmp/202-lecture-20
variables.c
we can also ask ls to tell us more information about the files: eg, "ls -l" will show "long" information about the files in the current directory
$ cd tmp/202-lecture-20
$ ls -l
total 8
-rw-r----- 1 pete pete 119 Oct 28 10:15 variables.c
the left-most character is the file type: "-" for a regular file, "d" for directory
the next nine characters (either "rw-r-----" or "rwx------") are the permissions, which we won’t get into here
the following integer is not important (it’s the number of links… see Systems Programming and then Operating Systems)
the first "pete" is the name of the file owner
the second "pete" is the name of the group that owns the file
the next integer is the size
followed by the modification time
and finally the name of the file itself
honestly, you don’t need to have much insight into this stuff for this course, I just didn’t want it all to go by entirely unmentioned
to review, the basic behavior of ls is to show the names of all the files and directories within the current directory
we modified that behavior in two ways:
the -l changed how ls printed the contents
and the argument changed what ls operated on
many UNIX programs support similar patterns
and the "change how it operates" part is usually specified with a letter or word beginning with a "-"
it will probably not surprise you to learn that we can do both at once:
$ ls -l /home
total 4
drwx------ 38 pete pete 4096 Oct 30 22:54 pete
and the order doesn’t matter:
$ ls /home -l
total 4
drwx------ 38 pete pete 4096 Oct 30 22:54 pete
that said, the "option arguments" (the things starting with a "-") usually come first
and, for some programs, the order does matter
how can you know if the order matters for a program you’re using?
for that matter, how can you know all the possible option arguments for a given program?
or what it does?
or the meaning of life?
fortunately, there is a reference for commands
the "man" command will bring up an online manual page about a given program
so to figure out how "ls" works (eg, what parameters it takes, what options it supports), we run
$ man ls
because we want "man" to operate on the string "ls", implying it’s something whose manual we wish to read
inside the manpage, use the arrow keys/pgup/pgdn/etc to navigate, and 'q' to quit
there are many more Linux commands, I’ll introduce relevant ones as they become relevant
when you’re starting out, it can be difficult to know what command to use for a given purpose, so I made a list here
the list intentionally does not describe how to use the commands; the intention is for you to check the manpage
be patient with manpages, though: they have a very particular writing style that takes some getting used to
and it is probably not a great idea to read a manpage start to finish; instead, skim it for what you need and revisit when you need to (I still do it this way and I’ve been using UNIX since before you were born)
on to C!
here’s the very simple program we’re going to start with:
it looks very much like Java
comments are enclosed within /* */ pairs and can’t be nested
the first thing we do is declare a function called "main", which (as in Java) is both required and where the program starts its work
the main function takes two parameters, an int called "argc" and an array of "char *" called "argv"
don’t worry about what a char * is, we’ll get there
argc and argv are how a program can access its command-line arguments
ie, the parameters I give to a program when I run it from the shell
then I declare three integer variables: x, y, and z, in a manner reminiscent of Java
note that in C, you need to declare variables before you use them
it’s not like Python where a variable magically springs into existence when you first refer to it
then I assign the values 42 and 19, respectively, to x and y
then I set z to the sum of x and y
like I said at the beginning: a stupid-simple program
the point is not to write a complicated program
the point is to demonstrate the basic trappings of a C program
and to show how operations on variables in a higher-level language like C are translated to assembly language
so, to compile, I give this command to the shell, which combines the forms we looked at earlier:
$ arm-none-eabi-gcc -S variables.c
we want the compiler ("arm-none-eabi-gcc") to operate on variables.c; we want it to produce assembly instead of machine code ("-S")
this will produce the output file "variables.s"
(possibly unsurprisingly, there is a command-line argument for arm-none-eabi-gcc that lets you specify the name of the output file; how would you find it? the manpage. don’t look though, it’s a horrifically complicated manpage)
there’s a lot of boilerplate in this file
meaning information that is necessary for making things work, but isn’t specifically germane to the problem at hand
in this class, ignore all the lines that start with a ".": they’re directives to the assembler and not relevant to what you need to know
the instructions we really care about are buried within
specifically:
mov r3, #42
str r3, [fp, #-8]
mov r3, #19
str r3, [fp, #-12]
ldr r2, [fp, #-8]
ldr r3, [fp, #-12]
add r3, r2, r3
str r3, [fp, #-16]
this puts the integer 42 into r3, then stores it off in memory somewhere
then puts 19 into r3 and stores it off in memory somewhere different (but close by)
then it loads the 42 into r2 and the 19 into r3
adds them together, puts the result in r3
then stores the value in r3 off into memory in yet a third location
(fp and sp are just different names for normal registers: they happen to be r11 and r13, respectively)
we can infer from the above that the compiler decided to store the value of the variable x at memory address fp-8, y at address fp-12, and z at address fp-16
what this means is that fp (a register) contains some value that we’re going to use as a memory address
the load and store instructions use that value as a starting point and subtract to find the location of specific variables
this gets at the idea that all the local variables for the main function are stored near each other: specifically, they are all stored around the address specified by the register fp
(for the time being, we’ll assume that fp is given to the program when it starts, to be elaborated on later)
this is a pervasive pattern in RISC assembly code
remember that RISC does not have instructions that operate directly on values in memory
so you’ll repeatedly see the pattern "load operands, operate, store result"
we see this pattern three times here
though the first two instances feature immediate operands, so the "load operands" step isn’t a distinct instruction