CSCI 150 Spring 2020

Lecture 14: Modules

Objectives for today

Create a correctly formatted and documented module
Define and use optional and keyword arguments

Modules

What is a module? A collection of related functions and variables. Why do we have modules? To organize and distribute code in a way that minimizes naming conflicts.

We have all already written modules. Every .py file is a module.

Let’s consider the linked my_module as an example. my_module includes a constant and several functions. After importing my_module, we can use its functions just like those in math or the other modules we have used.

>>> import my_module
>>> my_module.a()
10
>>> my_module.b(10, 15)
25
>>> my_module.c("this is a test")
'tt'
>>> my_module.SOME_CONSTANT
10

What about help?

>>> help(my_module)
Help on module my_module:

NAME
    my_module - Some basic functions to illustrate how modules work

DESCRIPTION
    A more detailed description of the module.

FUNCTIONS
    a()
        Prints out the number 10
    
    b(x, y)
        Returns x plus y
    
    c(some_string)
        Returns the first and last character of some_string

DATA
    SOME_CONSTANT = 10

That multi-line string at the top of the file is the module’s docstring, and help builds its output from it:

The NAME and brief description come from the filename and the first line of that docstring
DESCRIPTION comes from any subsequent lines in that docstring
FUNCTIONS lists each function along with its docstring
DATA lists module-level variables, such as constants
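The linked my_module file isn’t reproduced in these notes. Working backward from the help output, a minimal sketch of my_module.py could look like the following; the function bodies are assumptions chosen to match the examples and docstrings above, so the actual linked file may differ.

```python
"""
Some basic functions to illustrate how modules work

A more detailed description of the module.
"""

# Module-level variable: help lists it under DATA
SOME_CONSTANT = 10


def a():
    """Prints out the number 10"""
    print(10)


def b(x, y):
    """Returns x plus y"""
    return x + y


def c(some_string):
    """Returns the first and last character of some_string"""
    return some_string[0] + some_string[-1]
```

The first line of the module docstring becomes the brief description next to NAME, and the remaining lines become DESCRIPTION.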

Importing

What happens when I import a module? Python executes the Python file.

So if I add a print statement, e.g., print("Loaded my_module"), to my Python file, I should expect that message to print at import.

>>> import my_module
>>>

Why didn’t it print? Python doesn’t re-import modules that are already imported. Why does that behavior make sense? What if multiple modules import that same module, e.g., math? What if two modules import each other?

As a practical matter, this means that if we change our module, we will need to restart the Python console (with the stop sign button) or use the explicit reload function:

>>> import importlib
>>> importlib.reload(my_module)
Loaded my_module
<module 'my_module' from 'my_module.py'>
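To see the caching behavior end to end, here is a small self-contained sketch. It writes a throwaway module (the name demo_module is hypothetical) to a temporary folder, imports it twice, and then reloads it. Python records every imported module in the sys.modules dictionary, which is why the second import does not run the file again.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module file (hypothetical name: demo_module) that prints when executed
module_dir = tempfile.mkdtemp()
Path(module_dir, "demo_module.py").write_text('print("Loaded demo_module")\n')
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the import system notices the new file

import demo_module  # first import: runs the file, printing "Loaded demo_module"
import demo_module  # already in sys.modules, so the file is NOT run again

print("demo_module" in sys.modules)  # True

importlib.reload(demo_module)  # explicitly re-runs the file, printing the message again
```

The message appears once for the first import, not at all for the second, and once more for the reload.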

Run vs. Import

When we click the green arrow in Thonny we are “running” our Python programs. We could also have been importing them. When would you want to do one or the other?

Think about our Cryptography lab. We could imagine using our functions in a program that helps people communicate securely with each other, or other programmers might want to use our functions in their own communication systems. For the former, we would want to be able to encrypt/decrypt when our module is run; for the latter, we would want to make our code “importable” without actually invoking any of the functions.

Python has a special variable __name__ that can be used to determine whether our module is being run or being imported. When a file is run, Python automatically sets that variable to the string “__main__”. If a file is imported, Python sets that variable to the module’s name, i.e., the filename as a string without the “.py” extension.

We typically use this variable in a conditional at the end of the file that changes the behavior depending on the context. For example:

if __name__ == "__main__":
    print("Running the module")
else:
    print("Importing the module")

In most cases, you will only have the “if” branch; that is, you will only do something when the program is run.

For example, in our past labs, when we prompted users for input (say, for a file to read data from), we did so only if the program was being run (not imported). Gradescope imports your files so that it can test functions without necessarily simulating all of the user interactions. In the upcoming “Weather Report” lab, you will write Python code that can be used either as a standalone program to obtain the current weather or as part of a more complex application.
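Here is a minimal sketch of that pattern in the spirit of the Cryptography lab. The functions caesar_encrypt, caesar_decrypt, and main are hypothetical stand-ins, not the lab’s actual code. Importing this file only defines the functions; running it also calls main(). To keep the sketch non-interactive, main() uses a fixed message where a lab program might prompt with input().

```python
def caesar_encrypt(message, shift):
    """Shift each lowercase letter in message forward by shift places, wrapping z to a."""
    result = ""
    for ch in message:
        if "a" <= ch <= "z":
            result += chr((ord(ch) - ord("a") + shift) % 26 + ord("a"))
        else:
            result += ch  # leave spaces, punctuation, and uppercase letters unchanged
    return result


def caesar_decrypt(message, shift):
    """Undo caesar_encrypt by shifting backward."""
    return caesar_encrypt(message, -shift)


def main():
    # A lab program might prompt the user here, e.g., with input()
    secret = caesar_encrypt("hello world", 3)
    print(secret)                     # khoor zruog
    print(caesar_decrypt(secret, 3))  # hello world


# Only runs when this file is run directly, not when it is imported (e.g., by Gradescope)
if __name__ == "__main__":
    main()
```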

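Aside: Where did the `__pycache__` folder come from?

When we import a module, Python compiles it to bytecode and saves the result in a “.pyc” file inside the `__pycache__` folder. Bytecode is the lower-level representation the interpreter actually executes, and caching it lets Python skip recompiling the module on later imports if the source hasn’t changed. These files aren’t important for this course, but we want you to be aware of where they come from.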
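If you are curious where a particular module’s cached bytecode would go, the standard library can compute the path for you. This short sketch uses importlib.util.cache_from_source; the file does not need to exist, and the exact name includes an interpreter tag that depends on your Python version (something like my_module.cpython-312.pyc on CPython 3.12).

```python
import importlib.util

# Path where CPython would cache the bytecode for my_module.py in the current folder
cached_path = importlib.util.cache_from_source("my_module.py")
print(cached_path)
```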

Peer instruction questions (import vs. run) [1] (Section A, Section B)

Optional Parameters

We have used range extensively, and with different numbers of arguments.

>>> help(range)
Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object

This works because Python supports optional arguments, e.g., the optional “step”. How would we implement our own version of range? Consider the following (optional_parameters.py):

def my_range_with_step(start, stop, step):
    """
    Return a range
    
    Args:
        start: inclusive start index
        stop: exclusive stop index
        step: range increment

    Returns: A list of integers
    """
    i = start
    r = []
    
    while i < stop:
        r.append(i)
        i += step
    
    return r

def my_range_with_unitstep(start, stop):
    return my_range_with_step(start, stop, 1)

We could condense these two functions into one if we could set a default value for step. Optional parameters are those with default values, e.g.,

def my_range(start, stop, step=1):
    """
    Return a range
    
    Args:
        start: inclusive start index
        stop: exclusive stop index
        step: range increment

    Returns: A list of integers
    """
    i = start
    r = []
    
    while i < stop:
        r.append(i)
        i += step
    
    return r

Now we can use the same function for the two different use cases. More generally, optional parameters are useful when there is a sensible default value (e.g., stepping by one), but the caller might want/need to change that value sometimes.
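One caveat: as written, my_range (like my_range_with_step) never terminates if step is 0, or if step is negative while start < stop. The built-in range instead raises a ValueError for a zero step and counts down for a negative one. A sketch that keeps the same optional step=1 but mirrors range more closely might look like this:

```python
def my_range(start, stop, step=1):
    """
    Return a list of integers from start (inclusive) toward stop (exclusive).

    Sketch: like the built-in range, allows a negative step and rejects step == 0.
    """
    if step == 0:
        raise ValueError("step must not be zero")
    i = start
    r = []
    if step > 0:
        while i < stop:  # count up
            r.append(i)
            i += step
    else:
        while i > stop:  # count down
            r.append(i)
            i += step
    return r


print(my_range(0, 5))       # [0, 1, 2, 3, 4]  (default step of 1)
print(my_range(0, 10, 3))   # [0, 3, 6, 9]
print(my_range(5, 0, -1))   # [5, 4, 3, 2, 1]
```

The default value still covers the common case, and the caller only supplies step when it differs from 1.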

Note that you can also pass arguments by name (as keyword arguments), which is helpful when there are many optional parameters and you only want to change one or two.

>>> from optional_parameters import my_range
>>> my_range(0, 5, step=2)
[0, 2, 4]
>>> my_range(start=1, stop=5)
[1, 2, 3, 4]
>>> my_range(5, start=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: my_range() got multiple values for argument 'start'
>>> my_range(start=0, 5)
  File "<stdin>", line 1
SyntaxError: positional argument follows keyword argument

There are some limits: keyword arguments must follow positional arguments, and you can’t specify the same argument more than once.
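Keyword arguments pay off most when a function has several optional parameters. In this sketch, format_list is a hypothetical helper with three optional parameters; each call names only the ones it wants to change, in any order, and the rest keep their defaults.

```python
def format_list(items, sep=", ", prefix="[", suffix="]"):
    """Return items joined by sep and wrapped in prefix and suffix (hypothetical helper)."""
    return prefix + sep.join(str(item) for item in items) + suffix


print(format_list([1, 2, 3]))                          # [1, 2, 3]
print(format_list([1, 2, 3], sep="-"))                 # [1-2-3]
print(format_list([1, 2, 3], suffix=")", prefix="("))  # (1, 2, 3)
```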

>>> help(print)
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.

A common place to use keyword arguments is with print, where you will likely only want to modify one of its many optional arguments, e.g., sep.

>>> print("a", "b", "c")
a b c
>>> print("a", "b", "c", sep=",")
a,b,c

The first call separates the values with spaces (the default), the second with commas.
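The other optional arguments listed by help(print) work the same way. This sketch sends output to an in-memory io.StringIO stream via file= so we can inspect exactly what was written, and uses end= to suppress the trailing newline.

```python
import io

buffer = io.StringIO()
# Each call sets only the keyword arguments it needs; the rest keep their defaults
print("a", "b", "c", sep=",", file=buffer)  # writes "a,b,c\n"
print("no newline", end="", file=buffer)    # writes "no newline" with no trailing newline
print("!", file=buffer)                     # writes "!\n"

text = buffer.getvalue()
print(repr(text))  # 'a,b,c\nno newline!\n'
```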

Peer instruction questions (optional arguments) [1] (Section A, Section B)

Problem Solving with Dictionaries: Amino Acid translation

For those interested, this optional section provides a more in-depth example of using dictionaries in problem solving.

Sets of three DNA/mRNA nucleotides, termed codons, code for the different amino acids that make up proteins. An important step in many genomic analyses is simulating the synthesis of proteins, i.e., amino acid translation, from different DNA sequences. (For more detail on transcription and translation, check out this video.) For our purposes, translation starts at the start codon “ATG” and stops at any of the three stop codons: “TAA”, “TGA”, or “TAG”.

Let’s write a short function, translate (aminoacid.py), that scans a fragment of DNA, provided as a string, returning a list of all possible translated proteins. We need a list because there could be multiple proteins within a DNA sequence, that is, there could be multiple start codons within a DNA sequence, each of which would generate a different protein sequence.

For example:

>>> translate(['ATG', 'ATGCCATGTGAA', 'ATGGCATT'])
['M', 'MPCE', 'MA']

What are the major functional elements we need to solve this problem?

Nested loops to iterate through the sequence
Dictionary to translate codons to amino acids

We’ll define a CODONS dictionary in which the keys are codons and the values are the corresponding amino acids. What will happen if we try to use an invalid codon as a key to our CODONS dictionary? We will get a KeyError. This is a good place to use the get method: if we have an invalid codon, we can just add the empty string to our protein sequence.
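A quick illustration of the difference, using a deliberately tiny, partial CODONS dictionary (just three entries, for illustration only): indexing with an invalid key raises a KeyError, while get with a default of "" quietly contributes nothing.

```python
# Tiny, partial codon table for illustration only (the real table has 64 codons)
CODONS = {"ATG": "M", "GCA": "A", "CCA": "P"}

try:
    CODONS["XYZ"]  # invalid codon used as a key
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'XYZ'

protein = ""
for codon in ["ATG", "XYZ", "GCA"]:
    protein += CODONS.get(codon, "")  # unknown codons add the empty string
print(protein)  # MA
```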

What are some potential problem inputs we should test?

Lack of a stop codon, that is, translation reaches the end of the string
len(dna) < 3, i.e., less than a single codon
len(dna) % 3 != 0, i.e., dna has incomplete codons
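Putting the pieces together, here is one possible sketch of translate; the version in aminoacid.py may differ. It follows the prose above and takes a single DNA string (the example call earlier passes a list), reporting one protein for every “ATG” it finds. CODONS is the standard genetic code, built compactly from the usual T, C, A, G ordering of codons; the names BASES, AMINO_ACIDS, and STOP_CODONS are assumptions of this sketch.

```python
# Standard genetic code: codons in TCAG order, "*" marks the three stop codons
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS))

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TGA", "TAG"}


def translate(dna):
    """Return a list of proteins, one for each start codon in dna (sketch)."""
    proteins = []
    # Outer loop: check every position for a start codon (no iterations if len(dna) < 3)
    for i in range(len(dna) - 2):
        if dna[i:i + 3] == START_CODON:
            protein = ""
            # Inner loop: read complete codons in this frame until a stop codon
            # or the end of dna; a trailing incomplete codon is ignored
            for j in range(i, len(dna) - 2, 3):
                codon = dna[j:j + 3]
                if codon in STOP_CODONS:
                    break
                protein += CODONS.get(codon, "")  # invalid codons add ""
            proteins.append(protein)
    return proteins


print(translate("ATGCCATGTGAA"))  # ['MPCE', 'M']
print(translate("ATGGCATT"))      # ['MA']
print(translate("AT"))            # []
```

Because the sketch reports every start codon, "ATGCCATGTGAA" yields a second protein, "M", from the “ATG” at index 5, which lies in a different reading frame and hits “TGA” right away. The last two calls exercise the incomplete-codon and too-short cases from the list above.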

Summary

Modules
Optional arguments


Supplemental Reading

Practical Programming 6.1-6.2
Think Python 14.9