write method to write to that file
We have used range extensively, and done so with different numbers of arguments:
    >>> help(range)
    Help on class range in module builtins:

    class range(object)
     |  range(stop) -> range object
     |  range(start, stop[, step]) -> range object
This works because Python supports optional arguments, e.g. the optional “step”. How would we implement our own version of range? Consider the following two functions:
    def my_range_with_step(start, stop, step):
        """
        Return a range

        Args:
            start: inclusive start index
            stop: exclusive stop index
            step: range increment

        Returns:
            A list of integers
        """
        i = start
        r = []  # accumulate the values in a list
        while i < stop:
            r.append(i)
            i += step
        return r

    def my_range_with_unitstep(start, stop):
        return my_range_with_step(start, stop, 1)
We could condense these two functions into one if we could set a default value for step. Optional parameters are those with default values, e.g.
    def my_range(start, stop, step=1):
        """
        Return a range

        Args:
            start: inclusive start index
            stop: exclusive stop index
            step: range increment

        Returns:
            A list of integers
        """
        i = start
        r = []  # accumulate the values in a list
        while i < stop:
            r.append(i)
            i += step
        return r
Now we can use the same function for the two different use cases. More generally, optional parameters are useful when there is a sensible default value (e.g., stepping by one) but the caller might sometimes want or need to change that value.
Note that you can also specify parameters by name, which is helpful if there are many optional parameters and you only want to change one or two.
    >>> from optional_parameters import my_range
    >>> my_range(0, 5, step=2)
    [0, 2, 4]
    >>> my_range(start=1, stop=5)
    [1, 2, 3, 4]
    >>> my_range(5, start=0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: my_range() got multiple values for argument 'start'
    >>> my_range(start=0, 5)
      File "<stdin>", line 1
    SyntaxError: positional argument follows keyword argument
Note there are some limits: keyword arguments must follow positional arguments, and you can’t specify the same argument more than once.
    >>> help(print)
    Help on built-in function print in module builtins:

    print(...)
        print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

        Prints the values to a stream, or to sys.stdout by default.
        Optional keyword arguments:
        file:  a file-like object (stream); defaults to the current sys.stdout.
        sep:   string inserted between values, default a space.
        end:   string appended after the last value, default a newline.
        flush: whether to forcibly flush the stream.
A common place to use keyword arguments is with print, when we want to change the separator (sep) or the end, but not all of the optional arguments.
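As a small illustration of overriding just one of print's optional arguments at a time, here is a minimal sketch. It sends the output to an in-memory io.StringIO stream (through the optional file argument) purely so that we can inspect exactly what was written; the variable names are only for illustration.

```python
import io

# Collect output in an in-memory stream (via the optional `file` argument)
# so we can inspect exactly what print wrote.
buffer = io.StringIO()

# Override only the separator; `end` keeps its default newline.
print("a", "b", "c", sep="-", file=buffer)

# Override only the ending; `sep` keeps its default single space.
print("no", "newline", end="", file=buffer)

output = buffer.getvalue()
print(repr(output))
```

Because each optional argument is named, we never have to supply values for the ones we are happy to leave at their defaults.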
PI Questions (optional arguments)
We can see there are times when it is easier to work at the command line and other times, most times, when it is easier to work within Thonny. In most situations it will be more efficient to develop within Thonny since it nicely integrates all the tools we need.
One tool we have not yet used is the debugger. The debugger allows us to step through our program one statement at a time and inspect the current state of variables. This is an alternative, and sometimes more powerful, approach to debugging compared to inserting print statements.
Our workflow is to use Thonny to step through our program one line at a time while investigating the current value of any variables. We “Step over” lines to get to the area of the program we suspect has a problem and then “Step into” to investigate further. If we want to run until a specific spot in the program we can use “Run -> Run to cursor”. We will try it out on the following example.
More generally, the combination of introspection and control is very powerful for debugging our programs. There is quite a bit the debugger can do, and I hope you will start experimenting with it when you are trying to figure out why your code doesn’t work.
Let’s investigate a simple function with a bug: debug_example.py
We can place our cursor on the line decibels = 10 * math.log10(ratio) and use the “Run to cursor” option. Using the variable display on the lower left, we see that ratio is 0 (because we used floor division), and 0 is an invalid argument for math.log10.
Recall that logarithms are only defined for positive values, i.e., ratio > 0. Thus the assumption of this function is that the ratio of signal to noise is positive. We can and should include that expectation in our docstring. Another tool we can use to verify assumptions/requirements like this is the assert statement. assert is followed by a boolean expression and an optional message after a comma, e.g.,
assert ratio > 0, "Signal to noise ratio must be a positive number"
If the boolean expression evaluates to True, execution proceeds normally. If it is False, execution is halted with an
AssertionError that prints the optional message. This can be a very helpful tool for automatically verifying assumptions, especially when debugging, and providing more informative messages when one of those assumptions is violated.
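To see the assertion in action, here is a minimal sketch. The decibels function below is a hypothetical stand-in written for this illustration; its name and parameters are assumptions, and the actual code in debug_example.py may differ. It uses true division so that the ratio keeps its fractional part, and it checks the positivity assumption before taking the logarithm.

```python
import math


def decibels(signal, noise):
    """Hypothetical stand-in for the function in debug_example.py.

    Args:
        signal: signal power, expected to be positive
        noise: noise power, expected to be positive

    Returns:
        Signal-to-noise ratio in decibels
    """
    # True division (/) keeps the fractional part; floor division (//)
    # would turn, e.g., 1 // 10 into 0, which math.log10 rejects.
    ratio = signal / noise
    assert ratio > 0, "Signal to noise ratio must be a positive number"
    return 10 * math.log10(ratio)


print(decibels(100, 10))

# Violating the assumption halts the call with an AssertionError
# that carries our message.
try:
    decibels(0, 10)
except AssertionError as error:
    message = str(error)
    print("AssertionError:", message)
```

Without the assert, the second call would instead fail inside math.log10 with a less informative error about the math domain.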
Imagine we are trying to study the structure of Middlebury syllabi, and particularly the use of e-mail. To further this study we want to scrape Middlebury course webpages for the e-mail addresses listed by the professor and write them to a file. We will start with our course webpage. Note that web scraping can raise legal issues, so you should always check a site’s terms and conditions and robots.txt file, if available, before doing any scraping.
How could we implement our scraper? Let’s check out the page source and see if we can come up with some ideas…
Let’s check out a potential implementation: email_extractor.py.
What is new in this program?
urllib.request.urlopen: As the name suggests, this function opens a URL for reading in much the same way we read from a file. In fact, we can iterate line by line with a for loop in exactly the same way. One key difference is that the response is raw bytes, not a string. To obtain a string we need to use the decode method. Here we decode assuming the encoding is “utf-8” and we want to “ignore” errors. Those are reasonable settings for a webpage. This is also our first nested module, that is, we are importing a module nested in another module.
The write method writes a string to the file. Note that unlike print, write doesn’t automatically append a newline, so we need to do so ourselves.
    # Python is rejecting the certificate used for the CS dept. server so we
    # bypass some of those checks
    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context
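The decode and write behaviors described above can be tried without a network connection. The following is a minimal sketch, not the code from email_extractor.py: the raw_lines list is a made-up stand-in for the bytes we would get by iterating over a urlopen response, and the output file name is arbitrary.

```python
import os
import tempfile

# Stand-in for the lines we would get by iterating over a urlopen response:
# each line is raw bytes, not a str. The last one contains an invalid byte.
raw_lines = [b"first line\n", b"caf\xc3\xa9\n", b"bad \xff byte\n"]

# decode turns bytes into a str; "ignore" silently drops any bytes that
# are not valid UTF-8 instead of raising an error.
decoded = [line.decode("utf-8", "ignore").strip() for line in raw_lines]

# write does not append a newline, so we add one ourselves.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as out_file:
    for line in decoded:
        out_file.write(line + "\n")

# Read the file back to confirm each string ended up on its own line.
with open(path, encoding="utf-8") as in_file:
    contents = in_file.read()
print(repr(contents))
```

If we left off the "\n", all of the strings would run together on a single line of the file.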
Let’s use what we learned about debugging to understand how this code works. Try stepping through this code line by line…
How could we adapt this code to take command line arguments instead of the fixed URL and output file? We would add the following code to the bottom of our program.
In the if __name__ == "__main__" conditional we check for the expected number of command-line arguments. Recall that the first element of sys.argv is always the name of the file, so if we expect two command-line arguments the length of sys.argv should be 3 (the number of arguments + 1). If we don’t have the correct number of arguments we print a help message for the user showing the expected command line. If we do have the correct number of arguments we proceed to invoke the program code using the command-line arguments we extracted from the sys.argv list.
    # Assumes `import sys` appears at the top of the program.
    def print_usage():
        """Print usage"""
        print("python3 email_extractor.py <URL> <OUT_FILE>")

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print_usage()
        else:
            url = sys.argv[1]      # first command-line argument
            outfile = sys.argv[2]  # second command-line argument
            emails = get_emails(url)
            write_list_to_file(emails, outfile)
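To experiment with the length check without actually running from the command line, we can pass in a list shaped like sys.argv. This is a minimal sketch: parse_args is a hypothetical helper introduced only for this illustration (it is not part of email_extractor.py), and the URL and file name are placeholders.

```python
def parse_args(argv):
    """Hypothetical helper: return (url, outfile), or None if the count is wrong."""
    # argv[0] is the name of the file, so two arguments means a length of 3.
    if len(argv) != 3:
        return None
    return argv[1], argv[2]


# Simulated contents of sys.argv for two different command lines.
good = parse_args(["email_extractor.py", "https://example.com", "emails.txt"])
bad = parse_args(["email_extractor.py"])

print(good)
print(bad)
```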
Notice that we specified the beginning and ending search strings as optional arguments, that is, parameters with default values. Why do so? By using default values our function “just works” for its original intended purpose, finding e-mails:
    >>> get_emails(COURSE_PAGE)
    ['email@example.com', 'firstname.lastname@example.org']
but we can also easily adapt it for other purposes, such as finding go links, without needing to change the code. That change makes our function that much more useful!
    >>> get_emails(COURSE_PAGE, search_string="go.middlebury.edu/")
    ['linderman', 'cstutors', 'cs150ab-campuswire']
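To make the role of the default search strings concrete, here is a minimal sketch of the idea that works on a list of strings rather than a live webpage. The function extract_between, its default values ("mailto:" and a closing double quote), and the sample page lines are all assumptions made for this illustration; the actual get_emails in email_extractor.py may be implemented differently.

```python
def extract_between(lines, search_string="mailto:", end_string='"'):
    """Hypothetical sketch: collect the text between search_string and end_string.

    Args:
        lines: iterable of strings to search
        search_string: text that marks the start of a match
        end_string: text that marks the end of a match

    Returns:
        A list of matched strings (at most one per line)
    """
    results = []
    for line in lines:
        start = line.find(search_string)
        if start == -1:
            continue  # this line has no match
        start += len(search_string)
        end = line.find(end_string, start)
        if end != -1:
            results.append(line[start:end])
    return results


# Made-up page source for illustration.
page = [
    '<a href="mailto:prof@example.com">E-mail</a>',
    '<a href="http://go.middlebury.edu/cstutors">Tutors</a>',
]

emails = extract_between(page)  # relies on the defaults
links = extract_between(page, search_string="go.middlebury.edu/")
print(emails)
print(links)
```

The caller who only wants e-mails never needs to know the search strings exist, while the caller with a different purpose changes just the one argument that matters.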