Class 20: Optional arguments, Debugging, etc.

Objectives for today

• Define and use optional and keyword arguments
• Utilize the debugger to examine program state and execution
• Fetch and iterate through data from a URL
• Open a file for writing and use the write method to write to that file

Optional Parameters

We have used range extensively, and done so with different numbers of parameters.

>>> help(range)
Help on class range in module builtins:

class range(object)
|  range(stop) -> range object
|  range(start, stop[, step]) -> range object


This works because Python supports optional arguments, e.g. the optional “step”. How would we implement our own version of range? Consider the following (optional_parameters.py):

def my_range_with_step(start, stop, step):
    """
    Return a range

    Args:
        start: inclusive start index
        stop: exclusive stop index
        step: range increment

    Returns: A list of integers
    """
    i = start
    r = []

    while i < stop:
        r.append(i)
        i += step

    return r

def my_range_with_unitstep(start, stop):
    return my_range_with_step(start, stop, 1)


We could condense these two functions into one, if we could set a default value for step. Optional parameters are those with default values, e.g.

def my_range(start, stop, step=1):
    """
    Return a range

    Args:
        start: inclusive start index
        stop: exclusive stop index
        step: range increment

    Returns: A list of integers
    """
    i = start
    r = []

    while i < stop:
        r.append(i)
        i += step

    return r


Now we can use the same function for the two different use cases. More generally, optional parameters are useful when there is a sensible default value (i.e. stepping by one) but the caller might want/need to change that value sometimes.

Note that you can also specify parameters by name, which is helpful if there are many optional parameters and you only want to change one or two.

>>> from optional_parameters import my_range
>>> my_range(0, 5, step=2)
[0, 2, 4]
>>> my_range(start=1, stop=5)
[1, 2, 3, 4]
>>> my_range(5, start=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: my_range() got multiple values for argument 'start'
>>> my_range(start=0, 5)
File "<stdin>", line 1
SyntaxError: positional argument follows keyword argument


Note that there are some limits: keyword arguments must follow positional arguments, and you can’t specify the same argument more than once.

>>> help(print)
Help on built-in function print in module builtins:

print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.


A common place to use keyword arguments is with print, where you will likely only want to modify one of the many optional arguments, e.g. the separator (sep) or the end, but not all.
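As a small sketch of this idea, the following changes only sep and end and leaves flush at its default. The file argument is set to an in-memory buffer purely so that we can inspect the result; the variable names buffer and output are ours, chosen for illustration.

```python
import io

# An in-memory text stream standing in for sys.stdout (for illustration only)
buffer = io.StringIO()

# Override just two of print's optional arguments by name
print("a", "b", "c", sep=", ", end="!\n", file=buffer)

# The values are joined with ", " and the line ends with "!\n"
output = buffer.getvalue()
print(output, end="")
```

Because the optional arguments are given by name, their order does not matter and we never have to supply values for the ones we want to leave alone.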

PI Questions (optional arguments)

Debugging in Thonny (and assert)

We can see there are times when it is easier to work at the command line and other times, most times, when it is easier to work within Thonny. In most situations it will be more efficient to develop within Thonny since it nicely integrates all the tools we need.

One tool we have not yet used is the debugger. The debugger allows us to step through our program one statement at a time and inspect the current state of variables. This is an alternative, and sometimes more powerful, approach to debugging compared to inserting print statements as we have been doing to date.

Our workflow is to use Thonny to step through our program one line at a time while investigating the current value of any variables. We “Step over” lines to get to the area of the program we suspect has a problem and then “Step into” to investigate further. If we want to run until a specific spot in the program we can use “Run -> Run to cursor”. We will try it out on the following example.

More generally, the combination of introspection and control is very powerful for debugging our programs. There is quite a bit the debugger can do, and I hope you will start experimenting with it when you are trying to figure out why your code doesn’t work.

Let’s investigate a simple function with a bug: debug_example.py

We can place our cursor on the line decibels = 10 * math.log10(ratio) and use the “Run to cursor” option. Using the variable display on the lower left, we see that ratio is 0 (because the code used floor division), and 0 is an invalid argument to math.log10.

Recall that logarithms are only defined for positive values, i.e., ratio > 0. Thus the assumption of this function is that the ratio of signal to noise is positive. We can and should include that expectation in our docstring. Another tool we can use to verify assumptions/requirements like this is the assert statement. assert is followed by a boolean expression and an optional message after a comma, e.g.,

assert ratio > 0, "Signal to noise ratio must be a positive number"


If the boolean expression evaluates to True, execution proceeds normally. If it is False, execution is halted with an AssertionError that prints the optional message. This can be a very helpful tool for automatically verifying assumptions, especially when debugging, and providing more informative messages when one of those assumptions is violated.
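To see the assert in context, here is a minimal sketch of what a corrected version of the function might look like. debug_example.py is not reproduced in these notes, so the function name signal_to_decibels and its parameters signal and noise are hypothetical; only ratio, decibels, and the assert itself come from the discussion above.

```python
import math


def signal_to_decibels(signal, noise):
    """
    Convert a signal-to-noise ratio to decibels (hypothetical example).

    Assumes the ratio signal / noise is positive.
    """
    # True division; floor division (//) could truncate the ratio to 0
    ratio = signal / noise

    # Fail early, with a clear message, if the assumption is violated
    assert ratio > 0, "Signal to noise ratio must be a positive number"

    decibels = 10 * math.log10(ratio)
    return decibels


print(signal_to_decibels(100, 1))
```

Calling signal_to_decibels(-1, 2) would now stop with an AssertionError carrying our message instead of a less informative error from math.log10.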

Putting it all together: Reading from URLs

Imagine we are trying to study the structure of Middlebury syllabi, and particularly the use of e-mail. To further this study we want to scrape Middlebury course webpages for the e-mail addresses listed by the professor and write them to a file. We will start with our course webpage. Note that web scraping can raise legal issues, so you should always check a site’s terms and conditions and robots.txt file, if available, before doing any scraping.

How could we implement our scraper? Let’s check out the page source and see if we can come up with some ideas…

Let’s check out a potential implementation: email_extractor.py.

What is new in this program?

• urllib.request.urlopen: As the name suggests this function opens a URL for reading much the same way we read from a file. In fact, we can iterate line by line with a for loop in exactly the same way. One key difference is that response is raw bytes, not a string. To obtain a string we need to use the decode method. Here we decode assuming the encoding is “utf-8” and we want to “ignore” errors. Those are reasonable settings for a webpage. This is also our first nested module, that is we are importing a module nested in another module.
• We are writing a file. Specifically, we open the file with “w” as the second argument to open and then use the write method to write a string to the file. Note that unlike print, write doesn’t automatically append a newline, so we need to add one ourselves.
• As a note, for reasons that are unclear, Python is rejecting the certificate used by the CS department server to implement https. As a result we have to implement this workaround:
  # Python is rejecting the certificate used for the CS dept. server so we bypass some of those checks
  import ssl
  ssl._create_default_https_context = ssl._create_unverified_context
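The decoding and writing steps described above can be sketched without touching the network. This is not the code from email_extractor.py; raw_lines is a made-up stand-in for the bytes we would get from urlopen, and the output file name is arbitrary.

```python
import os
import tempfile

# Stand-in for lines read from a URL: each one is bytes, not a string.
# The second line contains a byte (\xff) that is not valid utf-8.
raw_lines = [b'<a href="mailto:someone@example.com">\n', b"caf\xff page\n"]

# Decode as utf-8, ignoring (dropping) any bytes that cannot be decoded
decoded = [line.decode("utf-8", "ignore") for line in raw_lines]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "lines.txt")

    # "w" opens the file for writing; write does not add a newline for us
    with open(out_path, "w") as out_file:
        for line in decoded:
            out_file.write(line.strip() + "\n")

    # Read the file back to check what was written
    with open(out_path, "r") as in_file:
        contents = in_file.read()

print(contents, end="")
```

With the “ignore” setting the undecodable byte simply disappears from the second line, which is usually acceptable when we only care about text such as e-mail addresses.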


Let’s use what we learned about debugging to understand how this code works. Try stepping through this code line by line…

How could we adapt this code to take command line arguments instead of the fixed URL and output file? We would add the following code to the bottom of our program.

Inside the if __name__ == "__main__" conditional we check for the expected number of command-line arguments. Recall that the first element of sys.argv is always the name of the file, so if we expect two command-line arguments the length of sys.argv should be 3 (the number of arguments + 1). If we don’t have the correct number of arguments we print a help message for the user showing the expected command line. If we do have the correct number of arguments we proceed to invoke the program code using the command-line arguments we extracted from the sys.argv list.

def print_usage():
    """Print usage"""
    print("python3 email_extractor.py <URL> <OUT_FILE>")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print_usage()
    else:
        url = sys.argv[1]
        outfile = sys.argv[2]
        emails = get_emails(url)
        write_list_to_file(emails, outfile)


Notice that we specified the beginning and ending search strings as optional arguments, that is, parameters with default values. Why do so? By using default values, our function “just works” for its original intended purpose, finding e-mails:

>>> get_emails(COURSE_PAGE)

>>> get_emails(COURSE_PAGE, search_string="go.middlebury.edu/")
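To illustrate why default values help here, the following is a hypothetical sketch and not the get_emails from email_extractor.py. The function extract_after, the parameter end_string, the default values, and the sample page lines are all assumptions made for this example; only the parameter name search_string and the "go.middlebury.edu/" search come from the calls above. It works on a list of strings rather than a URL so that it runs without network access.

```python
def extract_after(lines, search_string="mailto:", end_string='"'):
    """
    Return the text between search_string and end_string for each
    line that contains both (hypothetical stand-in for get_emails).
    """
    results = []
    for line in lines:
        start = line.find(search_string)
        if start != -1:
            # Skip past the search string itself
            begin = start + len(search_string)
            end = line.find(end_string, begin)
            if end != -1:
                results.append(line[begin:end])
    return results


# Made-up page contents for illustration
page = [
    '<a href="mailto:prof@example.edu">E-mail</a>',
    '<a href="http://go.middlebury.edu/example">link</a>',
]

emails = extract_after(page)  # defaults: find e-mail addresses
links = extract_after(page, search_string="go.middlebury.edu/")

print(emails)
print(links)
```

The first call relies entirely on the defaults, while the second reuses the same function for a different purpose by overriding a single keyword argument.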