What is a module? A collection of related functions and variables. Why do we have modules? To organize and distribute code in a way that minimizes naming conflicts.
How many of you have written a module? A trick question… Everyone. Every
.py file is a module.
Let’s consider the linked my_module as an example. It includes a constant and several functions. After importing my_module we can use those functions like any of those in math or the other modules we have used:
    >>> import my_module
    >>> my_module.a()
    10
    >>> my_module.b(10, 15)
    25
    >>> my_module.c("this is a test")
    'tt'
    >>> my_module.SOME_CONSTANT
    10
    >>> help(my_module)
    Help on module my_module:

    NAME
        my_module - Some basic functions to illustrate how modules work

    DESCRIPTION
        A more detailed description of the module.

    FUNCTIONS
        a()
            Prints out the number 10

        b(x, y)
            Returns x plus y

        c(some_string)
            Returns the first and last character of some_string

    DATA
        SOME_CONSTANT = 10
The multi-line string at the top of the file is also a docstring; help uses it to fill in the NAME and DESCRIPTION sections shown above.
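As a rough sketch, a my_module.py consistent with the help output above could look like the following. This is reconstructed from the function descriptions only, so the linked file may well differ in its details.

```python
"""
Some basic functions to illustrate how modules work

A more detailed description of the module.
"""

# Sketch only: reconstructed from the help() output, not the actual linked file

SOME_CONSTANT = 10


def a():
    """Prints out the number 10"""
    print(10)


def b(x, y):
    """Returns x plus y"""
    return x + y


def c(some_string):
    """Returns the first and last character of some_string"""
    return some_string[0] + some_string[-1]
```

The first string in the file documents the module as a whole, and the first string inside each function documents that function.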
What happens when I
import a module? Python executes the Python file.
So if I add a print statement, e.g.
print("Loaded my_module"), to my Python
file I should expect that message to print at import.
    >>> import my_module
    >>>
Why didn’t it print? Python doesn’t re-import modules that are already imported. Why does that behavior make sense? What if multiple modules import that same module, e.g. math? What if two modules import each other?
As a practical matter, that means if we change our module we will need to restart the Python console (with the stop sign) or use the explicit reload function:
    >>> import importlib
    >>> importlib.reload(my_module)
    Loaded my_module
    <module 'my_module' from 'my_module.py'>
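To see this behavior in one place, here is a small self-contained sketch. It writes a throwaway module (the name demo_module is made up for this illustration) into a temporary folder, imports it twice, and then reloads it. Python keeps already-imported modules in the sys.modules dictionary, which is why the second import does nothing.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module (hypothetical name: demo_module) in a temporary folder
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_module.py"), "w") as file:
    file.write('print("Loaded demo_module")\n')

# Tell Python to also look for modules in that folder
sys.path.insert(0, module_dir)

import demo_module  # first import: executes the file, so the message prints
import demo_module  # second import: already loaded, so nothing prints

# Imported modules are remembered in the sys.modules dictionary
print("demo_module" in sys.modules)

# reload explicitly re-executes the file, so the message prints again
reloaded = importlib.reload(demo_module)
```

The message appears only twice: once for the first import and once for the reload.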
When we click the green arrow in Thonny we are “running” our Python programs. We could also have been importing them. When would you want to do one or the other?
Think about our Cryptography lab. We could imagine using our functions in a program that helps people securely communicate with each other, or that other programmers might want to use our functions in their own communication systems. For the former we would want to be able to encrypt/decrypt when our module is run; for the latter we would want to make our code “importable” without actually invoking any of the functions.
Python has a special variable __name__ that can be used to determine whether our module is being run or being imported. When a file is run, Python automatically sets that variable to “__main__”. If a file is imported, Python sets that variable to the filename as a string (without the “.py” extension).
We typically use this variable in a conditional at the end of the file that changes the behavior depending on the context. For example:
    if __name__ == "__main__":
        print("Running the module")
    else:
        print("Importing the module")
In most cases, you will only have the “if” branch, that is you will only be doing something if the program is run.
For example, in our past labs, when we prompted users for input (say for a file to read data from), we would do so only if the program is being run (not imported). Gradescope imports your files so that it can test functions without necessarily simulating all of the user interactions. In the upcoming “Weather Report” lab, you will write Python code that can be either used as a standalone program to obtain the current weather, or as part of a more complex application.
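A minimal sketch of this pattern follows. The caesar_shift function and the main function are made up for illustration (they are not the functions from our Cryptography lab). The point is only the shape: functions defined at the top level are available to anyone who imports the file, while the code under the conditional runs only when the file is run directly.

```python
def caesar_shift(text, shift):
    """
    Shift each lowercase letter in text forward by shift positions,
    wrapping around the alphabet. Other characters are left unchanged.
    (Hypothetical example function, not the lab's implementation.)
    """
    result = ""
    for char in text:
        if "a" <= char <= "z":
            position = (ord(char) - ord("a") + shift) % 26
            result += chr(position + ord("a"))
        else:
            result += char
    return result


def main():
    # In a real program this is where we might prompt the user with input()
    print(caesar_shift("hello", 3))


if __name__ == "__main__":
    # Only executes when the file is run, not when it is imported
    main()
```

Running the file prints the shifted message, while importing it prints nothing and simply makes caesar_shift available, which is what a testing tool like Gradescope relies on.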
Where does the __pycache__ folder come from?
When we import a module, Python compiles it to bytecode and saves the result in a “.pyc” file inside the __pycache__ folder. Saving this lower-level representation lets Python skip the compilation step the next time the module is imported. These files aren’t important for this class, but I want you to be aware of where those files are coming from…
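If you are curious, the importlib.util module can report where Python would store the compiled file for a given source file. A short sketch (the exact name depends on your Python version, e.g. something like __pycache__/my_module.cpython-37.pyc under Python 3.7):

```python
import importlib.util

# Ask Python where the bytecode for my_module.py would be cached.
# The file does not need to exist for this call; it only computes the path.
pyc_path = importlib.util.cache_from_source("my_module.py")
print(pyc_path)
```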
PI Questions (import vs. run)
In many programming assignments we have solicited input from the user (via the
input function) to control the execution of the program (e.g. what file to read).
I suspect that during the testing process typing those inputs in each
time gets a little tedious… And that you get the sense that controlling our
programs that way is not really compatible with automation, that is, running a
program in an automated way on different inputs. There must be a different way.
What does the “green arrow” actually do? Notice that in Thonny, clicking it issues a %Run command in the shell, e.g. %Run sys_args.py.
This is the script plus any command line arguments.
What are command line arguments? Like function arguments/parameters, command line arguments are values passed to a Python program that will affect its execution. We use function parameters to change the inputs for our function. Could we conceivably want to do the same thing for a program as a whole? So far we have used the input function to solicit input from the user to control the execution of our programs, say to pick the input data file in our data analysis lab. We could alternatively specify those “inputs” on the command line as command line arguments. Doing so would facilitate controlling our programs in an automated way.
The why of the command line is a much larger question that we won’t fully experience in class. Speaking from my own experience, being able to efficiently use a command line environment (and write programs to be used in that environment) will make you much more productive and effective at data analysis and other computational tasks.
For example, I was curious about how many lines of code are included in your lecture examples. I wrote the function below to count the non-empty lines in a file, but how can I run this on every file?
    def count_lines(filename):
        """
        Count non-empty lines in file

        Args:
            filename: File to examine

        Return:
            Count of non-empty lines
        """
        with open(filename, "r") as file:
            count = 0
            for line in file:
                if line.strip() != "":
                    count += 1
            return count
I could manually make a list of all the files, but that is slow and error prone. Instead I would like to solve this problem programmatically. The command line can help us do so. It provides a mechanism for programmatically interacting with your computer, e.g. programmatically accessing directories, files, other programs and more. Specifically, counting all the lines in all the example programs can be as simple as the following. Let’s learn how to make that work.
    $ python3 line_counter.py *.py
    Total lines: 1074
We will use sys_args.py as our working example:
The Python module sys (short for “system”) has a variable, argv, that is set to
a list of the command line arguments. The first element of
this list is always the path of the program that is executing.
    >>> %Run sys_args.py
    Arguments: ['sys_args.py']
    0: sys_args.py
If we added command line arguments to the Thonny run command, they would be
appended to the sys.argv list:
    >>> %Run sys_args.py these are some arguments
    Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
    0: sys_args.py
    1: these
    2: are
    3: some
    4: arguments
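For reference, a script that produces output in the format shown above could be as short as the following sketch. This is a guess based on the output alone, so the actual sys_args.py may be written differently; the helper name format_arguments is made up for this illustration.

```python
import sys


def format_arguments(arguments):
    """Return a list of "index: value" strings, one per argument"""
    lines = []
    for index in range(len(arguments)):
        lines.append(str(index) + ": " + arguments[index])
    return lines


if __name__ == "__main__":
    # sys.argv is a list of strings; index 0 is the program itself
    print("Arguments:", sys.argv)
    for line in format_arguments(sys.argv):
        print(line)
```

Note that every element of sys.argv is a string, so an argument like 10 would need to be converted with int before being used as a number.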
While we can specify command line arguments in Thonny, that is not how this functionality is most useful. Instead, we typically use command line arguments at the command line.
We can invoke Python, specifically python3, from the command line (python on
Windows). We can open the terminal from within Thonny via the “Tools -> Open System
Shell” menu option. Once we have launched the shell we need to navigate to the
folder with our Python program (we will learn how shortly…).
    $ python3 sys_args.py these are some arguments
    Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
    0: sys_args.py
    1: these
    2: are
    3: some
    4: arguments
python3 is the Python interpreter (python on Windows), the program that
actually runs inside the Thonny shell. If we run python3 (python on
Windows) without any arguments we launch the familiar REPL (invoke the exit()
function to exit, or on OSX Ctrl+d):
    $ python3
    Python 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08)
    [Clang 6.0 (clang-600.0.57)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
When we supply a path to a Python script as the first argument, Python runs
that script (just like the “green arrow” in Thonny). Any additional arguments
after the script become the command line arguments to the script (available in
sys.argv).
Thonny largely insulates us from the notion of a working directory, that is, where in the file system we are executing our program. When we invoke Python in the terminal, we will need to navigate within the terminal to the directory containing our program.
The key commands we will use to navigate the terminal are:
||Change directory to
||Change to parent directory (i.e., go up one level of hierarchy)|
||Change to home directory|
||Print the path of the current working directory|
||Show contents of file one screen full at a time (hit q to exit)|
The Windows equivalent of the terminal is cmd (type
cmd into the search bar). The
mapping between commands for navigating within the terminal/shell is:
With these commands we are navigating the same file system and directories you see with your graphical browser, but doing so in a text-based programmatic environment.
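The working directory is not unique to the terminal; a running Python program has one too. As a small illustration (not something we need for the labs), the os module lets us inspect and change it from within Python, mirroring the idea of printing the working directory and moving up one level:

```python
import os

# Where are we right now? (the current working directory)
start = os.getcwd()
print("Working directory:", start)

# Move up one level to the parent directory
os.chdir("..")
parent = os.getcwd()
print("Parent directory:", parent)

# Return to where we started so the rest of the program is unaffected
os.chdir(start)
```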
For example you will likely need to navigate to the directory that contains your Python script. A protocol to do so:
For a file with the path /Users/mlinderman/cs150/sys_args.py, the directory is everything up to the last /, i.e. /Users/mlinderman/cs150/.
In the terminal at the command prompt, e.g. at the $, type cd for
“change directory” and then enter the path. For example:
$ cd /Users/mlinderman/cs150/
cd only works on directories. If you have any spaces in your path, you
will need to add quotes around the path so it is interpreted as a single
string (you can use the left and right arrows to move within your command to edit
it). For example:
$ cd "/Users/mlinderman/cs150/"
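The “everything up to the last /” rule from the steps above can also be checked in Python. A short sketch using os.path.split, which separates a path into its directory and file name:

```python
import os

# The example path from above
script_path = "/Users/mlinderman/cs150/sys_args.py"

# split separates the path at the last "/"
directory, filename = os.path.split(script_path)
print(directory)  # /Users/mlinderman/cs150
print(filename)   # sys_args.py
```

The directory part is what we would pass to cd.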
PI Questions (command line)
In our earlier example usage we used *.py as a wildcard (or “glob”) that
the terminal expands into all files that end in “.py”, i.e. that was equivalent to
$ python3 line_counter.py my_module.py sys_args.py ...
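To get a feel for what that expansion does, here is a sketch that performs the same kind of pattern matching inside Python with the standard glob module. It creates a few placeholder files in a temporary folder so it can run anywhere; the helper name expand_pattern is made up for this illustration.

```python
import glob
import os
import tempfile


def expand_pattern(directory, pattern):
    """Return the sorted names of the files in directory that match pattern"""
    names = []
    for match in glob.glob(os.path.join(directory, pattern)):
        names.append(os.path.basename(match))
    return sorted(names)


# Create a temporary folder with a few placeholder files
demo_dir = tempfile.mkdtemp()
for name in ["my_module.py", "sys_args.py", "notes.md"]:
    with open(os.path.join(demo_dir, name), "w") as file:
        file.write("# placeholder\n")

print(expand_pattern(demo_dir, "*.py"))  # ['my_module.py', 'sys_args.py']
```

In the terminal the shell does this expansion before Python even starts, so our program just sees the resulting list of file names in sys.argv.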
We can write our Python code to process any number of
files provided on the command line. Here we use a
for loop to iterate through all
the files provided on the command line and thus in the
sys.argv list (recall that the first element, at index 0, is always the name of the program that is executing). With
that small amount of code we now have a very useful (and efficient) tool. Check
out the complete implementation.
    if __name__ == "__main__":
        # Check that at least one file is provided on the command line
        if len(sys.argv) == 1:
            print("Usage: python line_counter.py <1 or more files>")
        else:
            count = 0
            # Process all of the command line arguments (after the name of the program that is always at index 0)
            for filename in sys.argv[1:]:
                count += count_lines(filename)
            print("Total lines:", count)
Could we have accomplished the same task purely within Python, without using the command line environment? Yes, although the resulting approach would be less flexible. For example, we could use the
listdir function in the
os module to return a list of all the files in the current directory and then filter that list for just those files with names ending in “.py”:
    import os

    filenames = os.listdir()
    count = 0
    for filename in filenames:
        if filename.endswith(".py"):
            count += count_lines(filename)
    print("Total lines:", count)
While this code may seem simpler than the approach above, I would argue the opposite. This approach has several assumptions built-in, that we are only interested in files in the current directory and only files ending in “.py”. If we want to look at files in a different directory or with different/multiple file endings we will need to modify our program. In contrast, our approach using the command line works for all those scenarios without any modification. For example,
$ python line_counter.py *.py *.md
counts the lines in both Python files and Markdown files (the format I use to write the lecture notes) by expanding both wildcards. In this respect, the command line environment “augments” the capabilities of our Python programs. The combination (in my opinion) is more than the sum of the parts!