CS 202 Lecture 20 – Linux, C, and variables

pete > courses > CS 202 Spring 24 > Lecture 20: Linux, C, and variables

Lecture 20: Linux, C, and variables

Goals

run command-line programs in Linux
use the man program to learn how a specific program works
recognize the RISC pattern of computation
write, compile, and examine a simple C program
identify where variables are stored in memory

today’s agenda:

some Linux patterns

a stupid-simple C program

Linux!

you’ve used it already whenever you’ve ssh’d to weathertop to practice for quizzes or to check your progress or feedback

but soon you’ll be using it to complete Tier-2 assignments as well

but why am I inflicting this on you?

most modern computing environments have something UNIX-y as their underpinnings: OS X, iOS, Android, and the vast majority of data centers/servers

so I feel it’s important that every undergrad CS student have been exposed to UNIX in some form during their education, and Linux is one of those forms

another reason I like Linux for CS students is that it strips away a lot of the polish of other operating systems that protect you from seeing how the system really works

a lot of the things you take for granted on, eg, OS X, are not done automatically on Linux

that automation is good if you don’t want to be bothered with the minutiae

it’s not so good if you want to understand what’s actually going on on your system, which is one of the core ideas of this entire course

on that note, my guess is most of your programming experience to this point has been in something like Eclipse or Thonny

those are examples of "Integrated Development Environments"

because they gather together a bunch of different tools and integrate them into a single application

for instance, with Eclipse, you get a text editor (to edit your code), a compiler (to… compile your code), a debugger (to step through your code and examine it as it runs), and some other things

we’re going to look at instances of all those tools, too, but separately

(recalling the previous lecture, we’re going to use gcc for compiling and gdb for debugging)

historically, the primary way of interacting with a UNIX system is a program called the shell, which is the text-based user interface

one types commands at the shell and presses Enter to run them

just like other operating systems, the filesystem is arranged as a hierarchy of folders which can contain files and other folders

on your own machines, you’ve used the Finder (OS X) or Explorer (Windows) to navigate the filesystem hierarchy

it’s the same idea on Linux, except the word "directory" is used instead of "folder"

each running program, including the shell, has a notion of running "inside" a particular directory; this is called the program’s "current working directory"

you can tell the shell to report its current working directory with the command "pwd", for "print working directory":

$ pwd
/home/pete

in that example, the "$" is the shell prompt, which is text printed by the shell telling you that it’s waiting for input

I typed "pwd" and pressed Enter

and the shell printed out "/home/pete" and then, because it finished doing what I asked of it, it printed a new prompt awaiting new input (not shown above)

digging into the output a bit more, the first thing to note is that "/" is the "path separator"

this means that each word between "/" characters is a path component

in this specific instance, it means that at the very top level of the filesystem, there is a directory (which we call the "root directory" and write as "/")

in this directory, there is at least one other directory, called "home"

and in the directory "home", there is another directory called "pete"

it is this directory called "pete" in which the shell is currently running

one can see what files are in the current directory with the command "ls" (short for "list")

$ ls

this command prints out a bunch of strings, each of which is the name of either a file or other directory within the current directory (ie, /home/pete)

(I am not including the output of this command in the notes because I’m paranoid)

ls is often configured to show different kinds of items in different colors

if you run this on weathertop, you may find that directories are printed in blue and normal files are printed in white or grey

there are other kinds of files, too, that will likely get printed in different colors, but not worth going into right now

this is all well and good, but we might want to list the files in another directory

there are two ways to accomplish this: change the shell’s current directory, or ask ls to list the files within a different directory

we’ll actually look at both approaches

but let’s work on changing directories first

the cd command changes the current working directory of the shell

it works kind of like a function, in that it takes as a parameter the name of the directory to change to

except there are no parentheses: it’s the name of the command, followed by a space, followed by the name of the target directory

(recall that there was a directory called "tmp" within /home/pete)

$ cd tmp
$ pwd
/home/pete/tmp

there is something subtle about the above command: note that it only works because the current directory just happens to contain another directory called "tmp"

if we happened to be in another directory (ie, our current directory wasn’t /home/pete) which doesn’t contain a directory called "tmp", then the cd command above would not work

another way to say this is that the parameter to cd is interpreted relative to the current working directory

if we want to change back to /home/pete, there are two ways to tell cd where we want to go: we can give a relative path, which will be interpreted relative to /home/pete/tmp, or we can give an absolute path, which will be interpreted the same way no matter what my current directory is

an absolute path always starts with a "/" and gives the complete… path to where we want to go:

$ cd /home/pete
$ pwd
/home/pete

to get the same effect using relative paths, we can use the special name ".." which always refers to the parent directory (that is, the directory that contains the current directory)

$ cd tmp
$ pwd
/home/pete/tmp
$ cd ..
$ pwd
/home/pete

when specifying relative paths, we can give multiple path components at once:

$ pwd
/home/pete
$ cd tmp/202-lecture-20
$ pwd
/home/pete/tmp/202-lecture-20
$ cd ../../..
$ pwd
/home

as I implied above, we can also ask ls to list files in a directory other than the current directory

for example, to see all the files and directories within /home, we can run

$ ls /home
pete

this isn’t a very interesting example, because there is only one thing in /home, but it gets the idea across: we are able to see the contents of another directory without cd’ing to it

furthermore, for any command that takes a path as parameter, that path can be either relative or absolute

(there are very, very, very few exceptions to this, which we will not encounter in this course, so it’s good enough for now)

$ pwd
/home/pete
$ ls tmp/202-lecture-20
variables.c
$ ls /home/pete/tmp/202-lecture-20
variables.c

we can also ask ls to tell us more information about the files: eg, "ls -l" will show "long" information about the files in the current directory

$ cd tmp/202-lecture-20
$ ls -l
total 8
-rw-r----- 1 pete pete 119 Oct 28 10:15 variables.c

the left-most character is the file type: "-" for a regular file, "d" for directory

the next nine characters (either "rw-r-----" or "rwx------") are the permissions, which we won’t get into here

the following integer is not important (it’s the number of links… see Systems Programming and then Operating Systems)

the first "pete" is the name of the file owner

the second "pete" is the name of the group that owns the file

the next integer is the size

followed by the modification time

and finally the name of the file itself

honestly, you don’t need to have much insight into this stuff for this course, I just didn’t want it all to go by entirely unmentioned

to review, the basic behavior of ls is to show the names of all the files and directories within the current directory

we modified that behavior in two ways:

the -l changed how ls printed the contents
and the argument changed what ls operated on

many UNIX programs support similar patterns

and the "change how it operates" is usually specified with a letter or word beginning with a "-"

it will probably not surprise you to learn that we can do both at once:

$ ls -l /home
total 4
drwx------ 38 pete pete 4096 Oct 30 22:54 pete

and the order doesn’t matter:

$ ls /home -l
total 4
drwx------ 38 pete pete 4096 Oct 30 22:54 pete

that said, the "option arguments" (the things starting with a "-") usually come first

and, for some programs, the order does matter

how can you know if the order matters for a program you’re using?

for that matter, how can you know all the possible option arguments for a given program?

or what it does?

or the meaning of life?

fortunately, there is a reference for commands

the "man" command will bring up an online manual page about a given program

so to figure out how "ls" works (eg, what parameters it takes, what options it supports), we run

$ man ls

because we want "man" to operate on the string "ls", which names the program whose manual we wish to read

inside the manpage, use the arrow keys/pgup/pgdn/etc to navigate, and 'q' to quit

there are many more Linux commands; I’ll introduce them as they become relevant

when you’re starting out, it can be difficult to know what command to use for a given purpose, so I made a list here

the list intentionally does not describe how to use the commands; the intention is for you to check the manpage

be patient with manpages, though: they have a very particular writing style that takes some getting used to

and it is probably not a great idea to read a manpage start to finish; instead, skim it for what you need and revisit when you need to (I still do it this way and I’ve been using UNIX since before you were born)

on to C!

here’s the very simple program we’re going to start with:

variables.c

it looks very much like Java

comments are enclosed within /* */ pairs and can’t be nested

the first thing we do is declare a function called "main", which (as in Java) is both required and where the program starts its work

the main function takes two parameters, an int called "argc" and an array of "char *" called "argv"

don’t worry about what a char * is, we’ll get there

argc and argv are how a program can access its command-line arguments

ie, the parameters I give to a program when I run it from the shell

then I declare three integer variables: x, y, and z, in a manner reminiscent of Java

note that in C, you need to declare variables before you use them

it’s not like Python where a variable magically springs into existence when you first refer to it

then I assign the values 42 and 19, respectively, to x and y

then I set z to the sum of x and y

like I said at the beginning: a stupid-simple program

the point is not to write a complicated program

the point is to demonstrate the basic trappings of a C program

and to show how operations on variables in a higher-level language like C are translated to assembly language

so, to compile, I give this command to the shell, which combines the forms we looked at earlier:

$ arm-none-eabi-gcc -S variables.c

we want the compiler ("arm-none-eabi-gcc") to operate on variables.c; we want it to produce assembly instead of machine code ("-S")

this will produce the output file "variables.s"

(possibly unsurprisingly, there is a command-line argument for arm-none-eabi-gcc that lets you specify the name of the output file; how would you find it? the manpage. don’t look though, it’s a horrifically complicated manpage)

variables.s

there’s a lot of boilerplate in this file

meaning information that is necessary for making things work, but isn’t specifically germane to the problem at hand

in this class, ignore all the lines that start with a ".": they’re directives to the assembler and not relevant to what you need to know

the instructions we really care about are buried within

specifically:

mov r3, #42
str r3, [fp, #-8]
mov r3, #19
str r3, [fp, #-12]
ldr r2, [fp, #-8]
ldr r3, [fp, #-12]
add r3, r2, r3
str r3, [fp, #-16]

this puts the integer 42 into r3, then stores it off in memory somewhere

then puts 19 into r3 and stores it off in memory somewhere different (but close by)

then it loads the 42 into r2 and the 19 into r3

adds them together, puts the result in r3

then stores the value in r3 off into memory in yet a third location

(fp and sp are just different names for normal registers: they happen to be r11 and r13, respectively)

we can infer from the above that the compiler decided to store the value of the variable x at memory address fp-8, y at address fp-12, and z at address fp-16

what this means is that fp (a register) contains some value that we’re going to use as a memory address

the load and store instructions use that value as a starting point and subtract to find the location of specific variables

this gets at the idea that all the local variables for the main function are stored near each other: specifically, they are all stored around the address specified by the register fp

(for the time being, we’ll assume that fp is given to the program when it starts, to be elaborated on later)

this is a pervasive pattern in RISC assembly code

remember that RISC does not have instructions that operate directly on values in memory

so you’ll repeatedly see the pattern "load operands, operate, store result"

we see this pattern three times here

though the first two instances feature immediate operands, so the "load operands" step isn’t a distinct instruction
