write
method to write to that file
Command line arguments are values that are passed to a program at runtime.
Rather than using the input function to get input from the user while a program is running, we can allow the user to specify necessary information as they invoke the program.
We will use sys_args.py as our working example:
The Python module sys (short for “system”) has a variable argv that is set to be a list of the command line arguments. The first element of this list is always the name of the program that is executing (including the path where the file resides if not in the same folder).
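sys_args.py itself is not reproduced in these notes. A minimal sketch that would produce output in the format shown in the runs below, assuming the program simply prints the whole list and then each element with its index (the helper name describe_args is hypothetical):

```python
import sys


def describe_args(argv):
    """Return the lines to print for a list of command line arguments"""
    lines = ["Arguments: " + str(argv)]
    for index, arg in enumerate(argv):
        lines.append(str(index) + ": " + arg)
    return lines


if __name__ == "__main__":
    # sys.argv[0] is the program name; any remaining elements are the arguments
    for line in describe_args(sys.argv):
        print(line)
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, makes it easy to try the function from the Thonny shell with any list of strings.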
>>> %Run sys_args.py
Arguments: ['sys_args.py']
0: sys_args.py
If we added command line arguments to the Thonny run command, they would be appended to the argv list.
>>> %Run sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments
While we can specify command line arguments in Thonny, that is not how this functionality is most useful. Instead, we typically use command line arguments at the command line in a terminal window.
We can invoke Python, specifically python3 (python on Windows), from the command line in a terminal window. We can open a terminal window from within Thonny via the “Tools -> Open System Shell” menu option. Once we have launched the shell we need to navigate to the folder with our Python program (see below for a brief list of terminal commands).
$ python3 sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments
python3 is the Python interpreter (python on Windows), the program that actually runs inside the Thonny shell. If we run python3 (python on Windows) without any arguments, we launch the familiar REPL (read-eval-print loop). To exit, invoke the quit() function or, on OSX, press Ctrl+d:
$ python3
Python 3.7.3 (default, Mar 27 2019, 09:23:15)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
When we supply a path to a Python script as the first argument, Python runs that script (just like the “green arrow” in Thonny). Any additional arguments after the script become the command line arguments to the script (available in argv).
Thonny largely insulates us from the notion of working directory, that is, where in the file system we are executing our program. When we invoke Python in the terminal, we will need to navigate within the terminal to the directory containing our program.
The key commands to navigate the terminal are:
Command | Description |
---|---|
ls | List files |
cd <dir> | Change directory to <dir> |
cd .. | Change to parent directory |
cd | Change to home directory |
pwd | Print current working directory |
more <file> | Show contents of file one screenful at a time (hit q to exit) |
The Windows equivalent of the terminal is cmd (type cmd into the search bar). The mapping between commands for navigating within the terminal/shell is:
Linux/OSX | Windows |
---|---|
ls | dir |
cd | cd |
cd /home/briggs/ | cd C:\Users\briggs |
With these commands we are navigating the same file system and directories you see with your graphical browser, but doing so in a text-based programmatic environment.
For example, you will likely need to navigate to the directory that contains your Python script. A protocol to do so:
First, determine the directory that contains your script. For a script at /Users/briggs/cs150/sys_args.py, the directory is everything up to the last /, i.e. /Users/briggs/cs150.
Then, in the terminal at the command prompt, e.g. at the $, type cd for “change directory” followed by the path. For example:
$ cd /Users/briggs/cs150/
cd only works on directories. If you have any spaces in your path, you will need to add quotes around the path so it is interpreted as a single string (you can use the left and right arrows to move within your command to edit it). For example:
$ cd "/Users/briggs/cs150/"
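The “everything up to the last /” rule can also be checked from Python. The sketch below uses the standard pathlib module. PurePosixPath is chosen here only so that the example treats / as the separator on any operating system, and containing_directory is a hypothetical helper name:

```python
from pathlib import PurePosixPath


def containing_directory(path):
    """Return the directory portion of a /-separated file path"""
    # .parent drops the final component, i.e. everything after the last /
    return str(PurePosixPath(path).parent)


print(containing_directory("/Users/briggs/cs150/sys_args.py"))
# prints /Users/briggs/cs150
```

The string it returns is what we would type after cd in the terminal.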
Imagine we are trying to study the structure of Middlebury syllabi, and particularly the use of e-mail. To further this study we want to scrape Middlebury course webpages for the e-mail addresses listed by the professor and write them to a file. We will start with our course webpage. Note that web scraping can raise legal issues, so you should always check a site’s terms and conditions and robots.txt file, if available, before doing any scraping.
How could we implement our scraper? Let’s check out the page source and see if we can come up with some ideas… What are some additional tools we might need?
Let’s check out a potential implementation: web_scraper0.py
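web_scraper0.py itself is not reproduced in these notes. As a rough sketch of what such a scraper might look like, the code below reads a page line by line, decodes each line, and collects anything that looks like an e-mail address. The function names get_data and write_list_to_file match those used later in these notes, but their bodies here are assumptions, as are the helper extract_emails and the deliberately simple regular expression (the actual file may find addresses another way):

```python
import re
import urllib.request

# Assumed, simplified pattern: real e-mail syntax allows more than this
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(lines):
    """Return e-mail-like strings found in an iterable of raw byte lines"""
    emails = []
    for raw in lines:
        # The response is bytes, so decode each line into a string first
        line = raw.decode("utf-8", "ignore")
        emails.extend(EMAIL_PATTERN.findall(line))
    return emails


def get_data(url):
    """Open url and return the e-mail-like strings on that page"""
    with urllib.request.urlopen(url) as webpage:
        return extract_emails(webpage)


def write_list_to_file(data, filename):
    """Write each item of data to filename, one item per line"""
    with open(filename, "w") as file:
        for item in data:
            # Unlike print, write does not add a newline for us
            file.write(item + "\n")
```

Separating extract_emails from get_data is a design choice made for this sketch: it lets us try the extraction on a list of byte strings without touching the network.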
What is new in this program?
urllib.request.urlopen: As the name suggests, this function opens a URL for reading in much the same way we open a file for reading. In fact, we can iterate line by line with a for loop in exactly the same way. One key difference is that the response is raw bytes, not a string. To obtain a string we need to use the decode method. Here we decode assuming the encoding is “utf-8” and we want to “ignore” errors. Those are reasonable settings for a webpage. This is also our first nested module, that is, we are importing a module (request) that is inside another module (urllib).
Writing to a file: we open the file for writing with open and then use the write method to write a string to the file. Note that unlike print, write doesn’t automatically append a newline and so we need to do so.
How could we adapt this code to take command line arguments instead of the fixed URL and output file? We would add the following code to the bottom of our program.
Inside the if __name__ == "__main__" conditional we check for the expected number of command-line arguments. Recall that the first element of sys.argv is always the name of the file, so if we expect two command-line arguments the length of sys.argv should be 3 (the number of arguments + 1). If we don’t have the correct number of arguments we print a help message for the user showing the expected command line. If we do have the correct number of arguments we proceed to invoke the program code using the command-line arguments we extracted from the sys.argv list.
See web_scraper.py for the addition of this code.
def print_usage():
    """Print usage of the program"""
    print("python3 web_scraper.py <URL> <OUT_FILE>")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print_usage()
    else:
        url = sys.argv[1]
        outfile = sys.argv[2]
        data = get_data(url)
        write_list_to_file(data, outfile)
        print("Wrote:", outfile)
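One way to experiment with the “number of arguments + 1” check without opening a terminal is to move it into a function that takes the argument list as a parameter. The helper parse_args below is hypothetical and not part of web_scraper.py, and the URL and file names are made up for illustration:

```python
def parse_args(argv):
    """Return (url, outfile) if argv holds exactly two arguments, else None"""
    # argv[0] is the program name, so two arguments means a length of 3
    if len(argv) != 3:
        return None
    return argv[1], argv[2]


print(parse_args(["web_scraper.py"]))
# prints None
print(parse_args(["web_scraper.py", "http://example.com", "emails.txt"]))
# prints ('http://example.com', 'emails.txt')
```

In the real program we would call such a function with sys.argv and print the usage message whenever it returns None.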
The why of the command line is a much larger question that we won’t fully experience in this course. Being able to efficiently use a command line environment (and write programs to be used in that environment) will make you much more productive and effective at data analysis and other computational tasks.
For example, suppose we are curious about how many lines of code are included in our lecture examples. The function below counts the non-empty lines in a file.
def count_lines(filename):
    """
    Count non-empty lines in file
    Args:
        filename: File to examine
    Return: Count of non-empty lines
    """
    with open(filename, "r") as file:
        count = 0
        for line in file:
            if line.strip() != "":
                count += 1
        return count
How could we run this on every .py file in our cs150 folder?
We could manually make a list of all the files, but that is slow and error prone. Instead we would like to solve this problem programmatically. The command line can help us do so. It provides a mechanism for programmatically interacting with your computer, e.g., programmatically accessing directories, files, other programs and more. Counting all the lines in all the example programs can be as simple as the following. Let’s learn how to make this work:
$ python3 line_counter.py *.py
Total lines: 1074
Here we use *.py as a wildcard that the shell expands into all files that end in “.py”, i.e., this is equivalent to
$ python3 line_counter.py my_module.py sys_args.py ...
To make this work, we can write Python code to make our line counter process any number of files provided on the command line. Here we use a for loop to iterate through all the files provided on the command line and thus in the sys.argv list. With that small amount of code we now have a very useful (and efficient) tool. Check out the complete implementation: line_counter.py
if __name__ == "__main__":
    # Check that at least one file is provided on the command line
    if len(sys.argv) == 1:
        print("Usage: python line_counter.py <1 or more files>")
    else:
        count = 0
        for filename in sys.argv[1:]:
            count += count_lines(filename)
        print("Total lines:", count)
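On Linux/OSX the shell performs the *.py expansion, so sys.argv already holds the individual file names. The Windows cmd shell typically passes *.py to the program unchanged, in which case the program can expand the pattern itself with the standard glob module. A minimal sketch, where expand_patterns is a hypothetical helper that is not part of line_counter.py:

```python
import glob


def expand_patterns(args):
    """Expand any wildcard patterns in args into the matching file names"""
    filenames = []
    for arg in args:
        # glob.glob returns the existing paths that match the pattern
        matches = sorted(glob.glob(arg))
        if matches:
            filenames.extend(matches)
        else:
            # Nothing matched: keep the argument exactly as it was given
            filenames.append(arg)
    return filenames
```

The loop in the main block could then iterate over expand_patterns(sys.argv[1:]) instead of sys.argv[1:]. Arguments that are already plain file names pass through unchanged.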
We can see there are times when it is easier to work at the command line and other times, most times, when it is easier to work within Thonny. In most situations it will be more efficient to develop within Thonny since it nicely integrates all the tools we need.
One tool we have not yet used is the debugger. The debugger allows us to step through our program one statement at a time and inspect the current state of variables. This is an alternative, and sometimes more powerful, approach to debugging compared to inserting print statements as we have been doing to date.
Our workflow is to use Thonny to step through our program one line at a time while investigating the current value of any variables. We “Step over” lines to get to the area of the program we suspect has a problem and then “Step into” to investigate further. If we want to run until a specific spot in the program we can click to the right of a line number to place a red dot. Then hitting the debug icon will run to that point. Another option is to first hit the debug icon, then click on the code line where we want to stop, and then use the pull-down “Run -> Run to cursor”. We will try it out on the following example.
More generally, the combination of introspection and control is very powerful for debugging our programs. There is quite a bit the debugger can do, and we hope you will start experimenting with it when you are trying to figure out why your code doesn’t work.
Let’s investigate a simple function with a bug: debug_example.py
We can place a red dot to the right of the line number on the line decibels = 10 * math.log10(ratio) and then click the debug icon. Using the variable display on the lower left, we see that ratio is 0 (because we used floor division) and 0 is an invalid argument to math.log10.
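debug_example.py is not reproduced in these notes. The sketch below shows the same kind of bug and one possible fix. The function names, parameter names and example values are assumptions made for illustration, and only the line decibels = 10 * math.log10(ratio) is taken from the example:

```python
import math


def decibels_buggy(signal_power, noise_power):
    """Buggy version: floor division turns any ratio below 1 into 0"""
    ratio = signal_power // noise_power
    decibels = 10 * math.log10(ratio)
    return decibels


def decibels_fixed(signal_power, noise_power):
    """Fixed version: true division keeps the fractional ratio"""
    ratio = signal_power / noise_power
    decibels = 10 * math.log10(ratio)
    return decibels


try:
    decibels_buggy(1, 10)
except ValueError as error:
    # math.log10(0) is undefined, so Python raises ValueError
    print("Buggy version failed:", error)

print("Fixed version:", decibels_fixed(1, 10))
```

Stepping through decibels_buggy in the debugger would show ratio set to 0 just before the call to math.log10, which is what points us to the floor division.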