Class 19: Modules

Objectives for today

Modules

What is a module? A collection of related functions and variables. Why do we have modules? To organize and distribute code in a way that minimizes naming conflicts.

How many of you have written a module? A trick question… Everyone. Every .py file is a module.

Let’s consider the linked my_module as an example. my_module includes a constant and several functions. After importing my_module we can use those functions like any of those in math or the other modules we have used.

>>> import my_module
>>> my_module.a()
10
>>> my_module.b(10, 15)
25
>>> my_module.c("this is a test")
'tt'
>>> my_module.SOME_CONSTANT
10

What about help?

>>> help(my_module)
Help on module my_module:

NAME
    my_module - Some basic functions to illustrate how modules work

DESCRIPTION
    A more detailed description of the module.

FUNCTIONS
    a()
        Prints out the number 10
    
    b(x, y)
        Returns x plus y
    
    c(some_string)
        Returns the first and last character of some_string

DATA
    SOME_CONSTANT = 10

The NAME and DESCRIPTION text comes from the multi-line comment at the top of the file, which is also a docstring.
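The linked file is not reproduced here. As a minimal sketch, reconstructed only from the console session and help output above (the function bodies are assumptions), my_module.py might look something like this:

```python
"""Some basic functions to illustrate how modules work

A more detailed description of the module.
"""

SOME_CONSTANT = 10


def a():
    """Prints out the number 10"""
    # Assumed body: the console session shows 10 being printed
    print(SOME_CONSTANT)


def b(x, y):
    """Returns x plus y"""
    return x + y


def c(some_string):
    """Returns the first and last character of some_string"""
    return some_string[0] + some_string[-1]
```

The first line of the string at the top of the file becomes the NAME entry in help, the rest becomes the DESCRIPTION, and the string directly under each def becomes that function's entry.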

Importing

What happens when I import a module? Python executes the Python file.

So if I add a print statement, e.g. print("Loaded my_module"), to my Python file I should expect that message to print at import.

>>> import my_module
>>> 

Why didn’t it print? Python doesn’t re-import modules that are already imported. Why does that behavior make sense? What if multiple modules import that same module, e.g. math? What if two modules import each other?
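One way to see this caching directly is through sys.modules, the dictionary in which Python records every module it has already loaded. A small sketch using the standard math module (any module would behave the same way):

```python
import sys

import math            # Python loads the module (if it has not already) and records it
import math as again   # a repeated import reuses the recorded module

# Both names refer to the very same module object
print(again is math)                 # prints True
print(sys.modules["math"] is math)   # prints True
```

Because the second import just looks up the existing module, the code in the file is not executed again.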

As a practical matter that means if we change our module we will need to restart the Python console (with the stop sign) or use the explicit reload function:

>>> import importlib
>>> importlib.reload(my_module)
Loaded my_module
<module 'my_module' from 'my_module.py'>

Run vs. Import

When we click the green arrow in Thonny we are “running” our Python programs. We could also have been importing them. When would you want to do one or the other?

Think about our Cryptography lab. We could imagine using our functions in a program that helps people securely communicate with each other, or that other programmers might want to use our functions in their own communication systems. For the former we would want to be able to encrypt/decrypt when our module is run; for the latter we would want to make our code “importable” without actually invoking any of the functions.

Python has a special variable __name__ that can be used to determine whether our module is being run or being imported. When a file is run, Python automatically sets that variable to “__main__”. If a file is imported, Python sets that variable to the filename as a string (without the “.py” extension).
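A quick way to check this is to look at __name__ for a module we import. The sketch below uses random from the standard library, chosen only because it lives in a file named random.py:

```python
import random

# Inside an imported module, __name__ is the module's name (no ".py")
print(random.__name__)   # prints random

# In the file being executed, __name__ describes this file itself
print(__name__)
```

When the file containing this code is run directly, the second print shows __main__.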

We typically use this variable in a conditional at the end of the file that changes the behavior depending on the context. For example:

if __name__ == "__main__":
    print("Running the module")
else:
    print("Importing the module")

In most cases, you will only have the “if” branch; that is, you will only do something when the program is run.

For example, in our past labs, when we prompted users for input (say for a file to read data from), we would do so only if the program is being run (not imported). Gradescope imports your files so that it can test functions without necessarily simulating all of the user interactions. In the upcoming “Weather Report” lab, you will write Python code that can be either used as a standalone program to obtain the current weather, or as part of a more complex application.
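As a sketch of this pattern, consider a hypothetical module with one function and a guarded block at the end. The function caesar_encrypt and its shift cipher are stand-ins chosen for illustration, not the actual Cryptography lab code:

```python
def caesar_encrypt(message, shift):
    """Shift each lowercase letter in message forward by shift places"""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    result = ""
    for char in message:
        if char in alphabet:
            # Wrap around the end of the alphabet with %
            result += alphabet[(alphabet.index(char) + shift) % 26]
        else:
            # Leave spaces, punctuation, etc. unchanged
            result += char
    return result


if __name__ == "__main__":
    # Only executes when the file is run, not when it is imported
    print(caesar_encrypt("hello world", 3))
```

Running the file prints khoor zruog, while importing it prints nothing and simply makes caesar_encrypt available to the importing program.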

Aside: Where did the __pycache__ folder come from?

When we import a module, Python compiles it to bytecode and saves the result in a “.pyc” file inside the __pycache__ folder. Reusing this lower-level representation lets later imports skip the compilation step. These files aren’t important for this class, but I want you to be aware of where those files are coming from…

PI Questions (import vs. run)

Command lines (and the Terminal)

In many programming assignments we have solicited input from the user (via the input function) to control the execution of the program (i.e. what file to read, etc.). I suspect that during the testing process typing those inputs in each time gets a little tedious… And that you get the sense that controlling our programs that way is not really compatible with automation, that is, running a program in an automated way on different inputs. There must be a different way…

What does the “green arrow” actually do? Notice that Thonny shows %Run sys_args.py in the shell. That command names the script to run, followed by any command line arguments.

What are command line arguments? Like function arguments/parameters, command line arguments are values passed to a Python program that will affect its execution. We use function parameters to change the inputs for our function. Could we conceivably want to do the same thing for a program as a whole? So far we have used the input function to solicit input from the user to control the execution of our programs, say to pick the input data file in our data analysis lab. We could alternatively specify those “inputs” on the command line as command line arguments. Doing so would facilitate controlling our programs in an automated way.

The why of the command line is a much larger question that we won’t fully experience in class. Speaking from my own experience, being able to efficiently use a command line environment (and write programs to be used in that environment) will make you much more productive and effective at data analysis and other computational tasks.

For example, I was curious about how many lines of code are included in your lecture examples. I wrote the function below to count the non-empty lines in a file, but how can I run this on every file?

def count_lines(filename):
    """
    Count non-empty lines in file

    Args:
        filename: File to examine
    
    Return: Count of non-empty lines
    """
    with open(filename, "r") as file:
        count = 0
        for line in file:
            if line.strip() != "":
                count += 1
        return count

I could manually make a list of all the files, but that is slow and error prone. Instead I would like to solve this problem programmatically. The command line can help us do so. It provides a mechanism for programmatically interacting with your computer, e.g. programmatically accessing directories, files, other programs and more. Specifically, counting all the lines in all the example programs can be as simple as the following. Let’s learn how to make that work.

$ python3 line_counter.py *.py
Total lines: 1074

A Command Line Example

We will use sys_args.py as our working example:

Within the Python module sys (short for “system”) there is a variable argv that is set to be a list of the command line arguments. The first element of this list is always the path of the program that is executing.
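The linked file is not reproduced here. A minimal sketch that would produce output in the format shown in this section might look like the following (the helper describe_arguments is a hypothetical name introduced for illustration):

```python
import sys


def describe_arguments(argv):
    """Return the lines to print for a list of command line arguments"""
    lines = ["Arguments: " + str(argv)]
    # enumerate pairs each argument with its index in the list
    for index, arg in enumerate(argv):
        lines.append(str(index) + ": " + arg)
    return lines


if __name__ == "__main__":
    # sys.argv[0] is the program itself; the rest are the arguments
    for line in describe_arguments(sys.argv):
        print(line)
```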

>>> %Run sys_args.py
Arguments: ['sys_args.py']
0: sys_args.py

If we added command line arguments to the Thonny run command, they would be appended to the argv list.

>>> %Run sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments

While we can specify command line arguments in Thonny, that is not where this functionality is most useful. Instead, we typically use command line arguments at the command line.

Python at the command line

We can invoke Python, specifically python3 (python on Windows), from the command line. We can open the terminal from within Thonny via the “Tools -> Open System Shell” menu option. Once we have launched the shell we need to navigate to the folder with our Python program (we will learn how shortly…).

$ python3 sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments

python3 (python on Windows) is the Python interpreter, the program that actually runs inside the Thonny shell. If we run python3 without any arguments we launch the familiar REPL (invoke the quit() function to exit, or press Ctrl+d on OSX):

$ python3
Python 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

When we supply a path to a Python script as the first argument, Python runs that script (just like the “green arrow” in Thonny). Any additional arguments after the script become the command line arguments to the script (available in argv).

Thonny largely insulates us from the notion of a working directory, that is, where in the file system we are executing our program. When we invoke Python in the terminal, we will need to navigate within the terminal to the directory containing our program.

The key commands we will use to navigate the terminal are:

Command        Description
ls             List files
cd dir         Change directory to dir
cd ..          Change to parent directory (i.e., go up one level of hierarchy)
cd             Change to home directory
pwd            Print the path of the current working directory
more <file>    Show contents of file one screenful at a time (hit q to exit)

The Windows equivalent of the terminal is cmd (type cmd into the search bar). The mapping between commands for navigating within the terminal/shell is:

Linux/OSX                 Windows
ls                        dir
cd                        cd
cd /home/mlinderman/      cd C:\Users\mlinderman

With these commands we are navigating the same file system and directories you see with your graphical browser, but doing so in a text-based programmatic environment.
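To connect these commands to something familiar, the os module in Python offers rough counterparts to pwd, ls and cd. The sketch below is only an illustration of that correspondence (the variable names are arbitrary), and it returns to the starting directory at the end:

```python
import os

# Like pwd: the path of the current working directory
start = os.getcwd()
print(start)

# Like ls: the names of the files and folders in the current directory
print(os.listdir())

# Like cd ..: move up to the parent directory
os.chdir("..")
parent = os.getcwd()
print(parent)

# Like cd dir: go back to where we started
os.chdir(start)
```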

For example, you will likely need to navigate to the directory that contains your Python script. A protocol to do so:

  1. Find the directory containing your Python program (it is displayed in the title bar of the Thonny window). For the file /Users/mlinderman/cs150/sys_args.py, the directory is everything up to the last /, i.e. /Users/mlinderman/cs150.
  2. In the terminal at the command prompt, e.g. at the $, type cd for “change directory” then enter the path. For example:

     $ cd /Users/mlinderman/cs150/
    

    cd only works on directories. If you have any spaces in your path, you will need to add quotes around the path so it is interpreted as a single string (you can use the left and right arrows to move within your command to edit it). For example:

     $ cd "/Users/mlinderman/cs150/"
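The “everything up to the last /” rule in step 1 is the same split that the os.path functions perform. A small sketch using the example path from above:

```python
import os

path = "/Users/mlinderman/cs150/sys_args.py"

# Everything up to the last / is the directory to give to cd
print(os.path.dirname(path))    # prints /Users/mlinderman/cs150

# Everything after the last / is the file name to give to python3
print(os.path.basename(path))   # prints sys_args.py
```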
    

PI Questions (command line)

Returning to our line counter

In our earlier example usage we used *.py as a wildcard (also called a glob pattern) that the terminal expands into all files that end in “.py”, i.e. that command was equivalent to

$ python3 line_counter.py my_module.py sys_args.py ...
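The expansion happens in the terminal before Python starts, so the program only ever sees the resulting list of file names. As a rough illustration of what that expansion does, the standard glob module performs a similar match from within Python. The sketch below builds a throwaway directory with made-up file names so that it can run anywhere:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create three empty files; the names are just examples
    for name in ["my_module.py", "sys_args.py", "notes.md"]:
        with open(os.path.join(folder, name), "w") as file:
            pass

    # Find the files matching the pattern, much like the terminal does for *.py
    matches = sorted(glob.glob(os.path.join(folder, "*.py")))
    names = [os.path.basename(match) for match in matches]

print(names)   # prints ['my_module.py', 'sys_args.py']
```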

We can write our Python code to process any number of files provided on the command line. Here we use a for loop to iterate through all the files provided on the command line and thus in the sys.argv list (recall that the first element, at index 0, is always the name of the program that is executing). With that small amount of code we now have a very useful (and efficient) tool. Check out the complete implementation.

import sys

if __name__ == "__main__":
    # Require at least one file on the command line (index 0 is always the program itself)
    if len(sys.argv) == 1:
        print("Usage: python line_counter.py <1 or more files>")
    else:
        count = 0
        # Process all of the command line arguments after the program name
        for filename in sys.argv[1:]:
            count += count_lines(filename)
        print("Total lines:", count)

Could we have accomplished the same task purely within Python, without using the command line environment? Yes, although the resulting approach would be less flexible. For example, we could use the listdir function on the os module to return a list of all the files in the current directory and then filter that list for just those files with names ending in “.py”:

import os
filenames = os.listdir()

count = 0
for filename in filenames:
    if filename.endswith(".py"):
        count += count_lines(filename)
print("Total lines:", count)

While this code may seem simpler than the approach above, I would argue the opposite. This approach has several assumptions built in: that we are only interested in files in the current directory, and only in files ending in “.py”. If we want to look at files in a different directory or with different/multiple file endings we will need to modify our program. In contrast, our approach using the command line works for all those scenarios without any modification. For example,

$ python line_counter.py *.py *.md

counts the lines in both Python files and Markdown files (the format I use to write the lecture notes) by expanding both wildcards. In this respect, the command line environment “augments” the capabilities of our Python programs. The combination (in my opinion) is more than the sum of the parts!