We have used range extensively, and done so with different numbers of parameters.
>>> help(range)
Help on class range in module builtins:
class range(object)
| range(stop) -> range object
| range(start, stop[, step]) -> range object
This works because Python supports optional arguments, e.g. the optional “step”. How would we implement our own version of range? Consider the following (optional_parameters.py):
def my_range_with_step(start, stop, step):
    """
    Return a range
    Args:
        start: inclusive start index
        stop: exclusive stop index
        step: range increment
    Returns: A list of integers
    """
    i = start
    r = []
    while i < stop:
        r.append(i)
        i += step
    return r

def my_range_with_unitstep(start, stop):
    return my_range_with_step(start, stop, 1)
We could condense these two functions into one if we could set a default value for step. Optional parameters are those with default values, e.g.,
def my_range(start, stop, step=1):
    """
    Return a range
    Args:
        start: inclusive start index
        stop: exclusive stop index
        step: range increment
    Returns: A list of integers
    """
    i = start
    r = []
    while i < stop:
        r.append(i)
        i += step
    return r
Now we can use the same function for both use cases. More generally, optional parameters are useful when there is a sensible default value (e.g., stepping by one) but the caller might want or need to change that value sometimes.
Note that you can also pass arguments by name (as keyword arguments), which is helpful if there are many optional parameters and you only want to change one or two.
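To see both use cases side by side, here is a small sketch that repeats my_range from above (so the block runs on its own) and compares its results with the built-in range. The comparison with list(range(...)) is only for illustration and only holds for positive steps.

```python
def my_range(start, stop, step=1):
    """Return a list of integers from start (inclusive) to stop (exclusive)."""
    i = start
    r = []
    while i < stop:
        r.append(i)
        i += step
    return r

# Default step of 1: same call shape as my_range_with_unitstep
print(my_range(2, 6))      # [2, 3, 4, 5]
# Explicit step: same call shape as my_range_with_step
print(my_range(2, 10, 3))  # [2, 5, 8]

# For positive steps, the results match the built-in range
assert my_range(2, 6) == list(range(2, 6))
assert my_range(2, 10, 3) == list(range(2, 10, 3))
```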
>>> from optional_parameters import my_range
>>> my_range(0, 5, step=2)
[0, 2, 4]
>>> my_range(start=1, stop=5)
[1, 2, 3, 4]
>>> my_range(5, start=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: my_range() got multiple values for argument 'start'
>>> my_range(start=0, 5)
File "<stdin>", line 1
SyntaxError: positional argument follows keyword argument
Note there are some limits: keyword arguments must follow positional arguments, and you can’t specify the same argument more than once (in my_range(5, start=0), the 5 is already bound to start by position).
>>> help(print)
Help on built-in function print in module builtins:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
A common place to use keyword arguments is with print, where you will likely only want to modify one of the many optional arguments, e.g., the separator (sep) or the end, but not all.
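As a quick illustration, the sketch below changes only sep or only end and leaves the other optional arguments at their defaults. It also passes an io.StringIO object as file (another of print's keyword arguments) so the output can be inspected as a string; that capture step is just for demonstration.

```python
import io

buf = io.StringIO()
# Only sep is changed; end keeps its default of "\n"
print("a", "b", "c", sep="-", file=buf)
# Only end is changed; sep keeps its default of " "
print("no", "newline", end="", file=buf)
print("!", file=buf)

output = buf.getvalue()
print(repr(output))  # 'a-b-c\nno newline!\n'
```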
PI Questions (optional arguments)
We can see there are times when it is easier to work at the command line and other times, most times, when it is easier to work within Thonny. In most situations it will be more efficient to develop within Thonny since it nicely integrates all the tools we need.
One tool we have not yet used is the debugger. The debugger allows us to step through our program one statement at a time and inspect the current state of variables. This is an alternative, and sometimes more powerful, approach to debugging compared to inserting print statements as we have been doing to date.
Our workflow is to use Thonny to step through our program one line at a time while investigating the current value of any variables. We “Step over” lines to get to the area of the program we suspect has a problem and then “Step into” to investigate further. If we want to run until a specific spot in the program, we can use “Run -> Run to cursor” or set a breakpoint by clicking on the line number. We will try it out on the following example.
More generally, the combination of introspection and control is very powerful for debugging our programs. There is quite a bit the debugger can do, and I hope you will start experimenting with it when you are trying to figure out why your code doesn’t work. Note that I unselected the “Show function call (frames) in separate windows” option (in Options -> Run & Debug) to improve the workflow.
Let’s investigate a simple function with a bug: debug_example.py
We can place our cursor on the line decibels = 10 * math.log10(ratio) and use the “Run to cursor” option. Using the variable display on the lower left, we see that ratio is 0 (because we used floor division) and 0 is an invalid argument to math.log10.
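The source of debug_example.py isn't reproduced here, so the following is only a hypothetical reconstruction consistent with the description: the function name snr_decibels and its parameters are assumptions, while the floor division and the math.log10 call follow the text. With a signal smaller than the noise, floor division makes ratio 0, and math.log10 raises a ValueError.

```python
import math

def snr_decibels(signal, noise):
    """Hypothetical: signal-to-noise ratio in decibels (contains the bug)."""
    ratio = signal // noise  # BUG: floor division, e.g. 1 // 2 == 0
    decibels = 10 * math.log10(ratio)
    return decibels

try:
    snr_decibels(1, 2)
except ValueError as err:
    # log10 is undefined at 0, so Python raises ValueError
    print("ValueError:", err)
```

Running to the decibels line in the debugger would show ratio as 0 just before the error.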
Recall that logarithms are only defined for positive values, i.e., ratio > 0. Thus the assumption of this function is that the ratio of signal to noise is positive. We can and should include that expectation in our docstring. Another tool we can use to verify assumptions or requirements like this is the assert statement. assert is followed by a boolean expression and an optional message after a comma, e.g.,
assert ratio > 0, "Signal to noise ratio must be a positive number"
If the boolean expression evaluates to True, execution proceeds normally. If it is False, execution is halted with an AssertionError that includes the optional message. This can be a very helpful tool for automatically verifying assumptions, especially when debugging, and for providing more informative messages when one of those assumptions is violated.
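Here is one possible corrected version of such a function, as a sketch: the name snr_decibels and its parameters are hypothetical, since debug_example.py isn't reproduced here. It switches to true division and states the requirement both in the docstring and with assert. A signal of 1 over a noise of 2 now gives a ratio of 0.5 and works, while a zero signal still violates the assumption and triggers the AssertionError with our message. Note that Python skips assert statements entirely when run with the -O flag.

```python
import math

def snr_decibels(signal, noise):
    """Hypothetical: signal-to-noise ratio in decibels.

    Assumes signal / noise > 0 (logarithms are only defined for positive values).
    """
    ratio = signal / noise  # true division, so 1 / 2 == 0.5 rather than 0
    assert ratio > 0, "Signal to noise ratio must be a positive number"
    return 10 * math.log10(ratio)

print(snr_decibels(100, 1))  # 20.0

try:
    snr_decibels(0, 2)
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: Signal to noise ratio must be a positive number
```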
Imagine we are trying to study the structure of Middlebury syllabi, and particularly the use of e-mail. To further this study we want to scrape Middlebury course webpages for the e-mail addresses listed by the professor and write them to a file. We will start with our course webpage. Note that web scraping can raise legal issues, so you should always check a site’s terms and conditions and robots.txt file, if available, before doing any scraping.
How could we implement our scraper? Let’s check out the page source and see if we can come up with some ideas…
Let’s check out a potential implementation: email_extractor.py.
What is new in this program?
urllib.request.urlopen: As the name suggests, this function opens a URL for reading much the same way we read from a file. In fact, we can iterate line by line with a for loop in exactly the same way. One key difference is that the response is raw bytes, not a string. To obtain a string we need to use the decode method. Here we decode assuming the encoding is “utf-8” and we want to “ignore” errors (i.e., drop any bytes that aren’t valid UTF-8). Those are reasonable settings for a webpage. This is also our first nested module, that is, we are importing a module (request) nested in another module (urllib).

write: We use the file write method to write a string to the file. Note that unlike print, write doesn’t automatically append a newline, so we need to add one ourselves.

The program also includes the following workaround, since Python rejects the certificate used by the CS department server:

# Python is rejecting the certificate used for the CS dept. server so we bypass some of those checks
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
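Two of these ideas can be tried without a network connection. The sketch below decodes a bytes value (as we would each line read from urlopen) using "utf-8" with errors ignored, and writes a list of strings to a file, adding the newline that write doesn't supply. write_list_to_file borrows the name used later in email_extractor.py, but its body here is an assumption, not the program's actual code; the temporary file is only for demonstration.

```python
import os
import tempfile

# Bytes as they might arrive from a urlopen response (\xff is not valid UTF-8)
raw_line = b"Caf\xc3\xa9 \xffmenu\n"
line = raw_line.decode("utf-8", "ignore")  # invalid bytes are dropped
print(repr(line))  # 'Café menu\n'


def write_list_to_file(items, filename):
    """Hypothetical: write each string in items to filename, one per line."""
    with open(filename, "w", encoding="utf-8") as file:
        for item in items:
            file.write(item)
            file.write("\n")  # unlike print, write does not add a newline


path = os.path.join(tempfile.mkdtemp(), "emails.txt")
write_list_to_file(["a@example.com", "b@example.com"], path)
with open(path, encoding="utf-8") as file:
    print(repr(file.read()))  # 'a@example.com\nb@example.com\n'
```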
Let’s use what we learned about debugging to understand how this code works. Try stepping through this code line by line…
How could we adapt this code to take command line arguments instead of the fixed URL and output file? We would add the following code to the bottom of our program.
Inside the if __name__ == "__main__" conditional we check for the expected number of command-line arguments. Recall that the first element of sys.argv is always the name of the file, so if we expect two command-line arguments the length of sys.argv should be 3 (the number of arguments + 1). If we don’t have the correct number of arguments, we print a help message for the user showing the expected command line. If we do have the correct number of arguments, we proceed to invoke the program code using the command-line arguments we extracted from the sys.argv list.
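import sys  # normally placed with the other imports at the top of email_extractor.py

def print_usage():
    """Print usage"""
    print("python3 email_extractor.py <URL> <OUT_FILE>")

if __name__ == "__main__":
    # sys.argv[0] is the file name, so two arguments means a length of 3
    if len(sys.argv) != 3:
        print_usage()
    else:
        url = sys.argv[1]
        outfile = sys.argv[2]
        emails = get_emails(url)
        write_list_to_file(emails, outfile)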
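One way to exercise the argument check without a terminal is to move it into a function that takes the argument list as a parameter. parse_args is a hypothetical helper, not part of email_extractor.py; in that design the main block would call parse_args(sys.argv) and print the usage message when it returns None.

```python
def parse_args(argv):
    """Hypothetical: return (url, outfile) if argv holds exactly two arguments, else None."""
    if len(argv) != 3:  # argv[0] is the file name, so 2 arguments -> length 3
        return None
    return argv[1], argv[2]


print(parse_args(["email_extractor.py", "https://example.com", "out.txt"]))
# ('https://example.com', 'out.txt')
print(parse_args(["email_extractor.py"]))  # None
```

Because parse_args never reads sys.argv itself, we can call it with any list we like, which also makes it easy to step through in the debugger.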
Notice that we specified the beginning and ending search strings as optional arguments, that is, parameters with default values. Why do so? By using default values our function “just works” for its original intended purpose, finding e-mails:
>>> get_emails(COURSE_PAGE)
['mlinderman@middlebury.edu', 'ada@middlebury.edu']
but we can also easily adapt it for other purposes, such as finding go links, without needing to change the code. That change makes our function that much more useful!
>>> get_emails(COURSE_PAGE, search_string="go.middlebury.edu/")
['linderman', 'cstutors', 'cs150ab-campuswire']
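Since the source of email_extractor.py isn't shown, here is a minimal, hypothetical sketch of the kind of search get_emails might perform. It works on already-decoded lines rather than a URL so it runs offline. The function name extract_between, the end_string parameter, and the defaults "mailto:" and '"' are all assumptions; only search_string comes from the example call above. Changing just the search string reuses the same logic for go links.

```python
def extract_between(lines, search_string="mailto:", end_string='"'):
    """Hypothetical: collect the text between search_string and end_string on each line."""
    results = []
    for line in lines:
        start = line.find(search_string)
        while start != -1:
            start += len(search_string)  # skip past the search string itself
            end = line.find(end_string, start)
            if end == -1:  # no closing string on this line
                break
            results.append(line[start:end])
            start = line.find(search_string, end)  # look for another match
    return results


page = [
    '<p>Instructor: <a href="mailto:prof@example.edu">prof@example.edu</a></p>',
    '<p>Tutors: <a href="https://go.middlebury.edu/cstutors">go/cstutors</a></p>',
]
print(extract_between(page))  # ['prof@example.edu']
print(extract_between(page, search_string="go.middlebury.edu/"))  # ['cstutors']
```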