What is a module? A collection of related functions and variables. Why do we have modules? To organize and distribute code in a way that minimizes naming conflicts.
How many of you have written a module? A trick question… Everyone. Every `.py` file is a module.
Let’s consider the linked `my_module` as an example. `my_module` includes a constant and several functions. After importing `my_module` we can use those functions like any of those in `math` or the other modules we have used.
>>> import my_module
>>> my_module.a()
10
>>> my_module.b(10, 15)
25
>>> my_module.c("this is a test")
'tt'
>>> my_module.SOME_CONSTANT
10
What about `help`?
>>> help(my_module)
Help on module my_module:

NAME
    my_module - Some basic functions to illustrate how modules work

DESCRIPTION
    A more detailed description of the module.

FUNCTIONS
    a()
        Prints out the number 10

    b(x, y)
        Returns x plus y

    c(some_string)
        Returns the first and last character of some_string

DATA
    SOME_CONSTANT = 10
That multi-line comment at the top of the file is also a docstring; `help` uses it for the NAME and DESCRIPTION sections shown above.
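The linked file is not reproduced in these notes. As a rough sketch only, a `my_module.py` consistent with the session and `help` output above might look like the following. The function bodies are assumptions inferred from the docstrings and the printed results, not the actual file.

```python
"""
Some basic functions to illustrate how modules work

A more detailed description of the module.
"""

SOME_CONSTANT = 10


def a():
    """Prints out the number 10"""
    # Assumed body: the help text says this function prints rather than returns
    print(SOME_CONSTANT)


def b(x, y):
    """Returns x plus y"""
    return x + y


def c(some_string):
    """Returns the first and last character of some_string"""
    # Assumed body: index 0 is the first character, index -1 is the last
    return some_string[0] + some_string[-1]
```

The first line of the module docstring becomes the summary next to NAME, and the remaining text becomes the DESCRIPTION.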
What happens when I `import` a module? Python executes the Python file. So if I add a print statement, e.g. `print("Loaded my_module")`, to my Python file I should expect that message to print at import.
>>> import my_module
>>>
Why didn’t it print? Python doesn’t re-import modules that are already imported. Why does that behavior make sense? What if multiple modules import that same module, e.g. math? What if two modules import each other?
As a practical matter that means if we change our module we will need to restart the Python console (with the stop sign) or use the explicit reload function:
>>> import importlib
>>> importlib.reload(my_module)
Loaded my_module
<module 'my_module' from 'my_module.py'>
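To see the caching behavior in one place, here is a small self-contained sketch. It writes a throwaway module to a temporary folder (the name `demo_module` is hypothetical and simply stands in for `my_module`), imports it twice, and then reloads it. It relies on `sys.modules`, the dictionary in which Python keeps already-imported modules.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temporary folder. The name demo_module is
# hypothetical; it stands in for my_module.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "demo_module.py"), "w") as file:
        file.write('print("Loaded demo_module")\n')

    sys.path.insert(0, folder)  # Let Python find the new module
    importlib.invalidate_caches()

    import demo_module  # First import: Python runs the file, so the message prints
    import demo_module  # Second import: the cached module is reused, nothing prints

    # Imported modules are remembered in the sys.modules dictionary
    print("demo_module" in sys.modules)

    importlib.reload(demo_module)  # Explicit reload: the file runs again

    sys.path.remove(folder)  # Undo our change to the search path
```

Running this prints the "Loaded demo_module" message only twice (once for the first import and once for the reload), even though the module is imported twice and reloaded once.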
When we click the green arrow in Thonny we are “running” our Python programs. We could also have been importing them. When would you want to do one or the other?
Think about our Cryptography lab. We could imagine using our functions in a program that helps people securely communicate with each other, or that other programmers might want to use our functions in their own communication systems. For the former we would want to be able to encrypt/decrypt when our module is run, for the latter we would want to make our code “importable” without actually invoking any of the functions.
Python has a special variable `__name__` that can be used to determine whether our module is being run or being imported. When a file is run, Python automatically sets that variable to `"__main__"`. If a file is imported, Python sets that variable to the filename as a string (without the “.py” extension).
We typically use this variable in a conditional at the end of the file that changes the behavior depending on the context. For example:
if __name__ == "__main__":
    print("Running the module")
else:
    print("Importing the module")
In most cases, you will only have the “if” branch, that is you will only be doing something if the program is run.
For example, in our past labs, when we prompted users for input (say for a file to read data from), we would do so only if the program is being run (not imported). Gradescope imports your files so that it can test functions without necessarily simulating all of the user interactions. In the upcoming “Weather Report” lab, you will write Python code that can be either used as a standalone program to obtain the current weather, or as part of a more complex application.
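As an illustration of the “only the if branch” pattern, here is a minimal sketch loosely inspired by the Cryptography lab. The function `caesar_shift` and the `main` function are hypothetical names chosen for this example, not the lab's actual code. In a real program `main` would typically prompt the user with `input`; a fixed message is used here so the sketch runs without any interaction.

```python
def caesar_shift(text, shift):
    """
    Hypothetical helper: shift each lowercase letter in text by shift
    positions in the alphabet, leaving all other characters unchanged.
    """
    result = ""
    for char in text:
        if "a" <= char <= "z":
            # Wrap around the 26 letters using the remainder operator
            result += chr((ord(char) - ord("a") + shift) % 26 + ord("a"))
        else:
            result += char
    return result


def main():
    # Only reached when the file is run, not when it is imported
    message = "hello world"
    print(caesar_shift(message, 3))


if __name__ == "__main__":
    main()
```

Running this file prints the shifted message, while `import`ing it just defines `caesar_shift` and `main` so that another program (or an autograder) can call them directly.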
Where does the `__pycache__` folder come from? When we import a module, Python compiles it to bytecode in a “.pyc” file. This lower-level representation is what Python actually executes, and saving it makes later imports faster. These files aren’t important for this class, but I want you to be aware of where those files are coming from…
PI Questions (import vs. run)
In many programming assignments we have solicited input from the user (via the `input` function) to control the execution of the program (i.e. what file to read, etc.). I suspect that during the testing process typing those inputs in each time gets a little tedious… And that you get the sense that controlling our programs that way is not really compatible with automation, that is running a program in an automated way on different inputs. There must be a different way…
What does the “green arrow” actually do? Notice in Thonny, `%Run sys_args.py`. This is the script plus any command line arguments.
What are command line arguments? Like function arguments/parameters, command line arguments are values passed to a Python program that will affect its execution. We use function parameters to change the inputs for our function. Could we conceivably want to do the same thing for a program as a whole? So far we have used the `input` function to solicit input from the user to control the execution of our programs, say to pick the input data file in our data analysis lab. We could alternatively specify those “inputs” on the command line as command line arguments. Doing so would facilitate controlling our programs in an automated way.
The why of the command line is a much larger question that we won’t fully experience in class. Speaking from my own experience, being able to efficiently use a command line environment (and write programs to be used in that environment) will make you much more productive and effective at data analysis and other computational tasks.
For example, I was curious about how many lines of code are included in your lecture examples. I wrote the function below to count the non-empty lines in a file, but how can I run this on every file?
def count_lines(filename):
    """
    Count non-empty lines in file

    Args:
        filename: File to examine

    Return: Count of non-empty lines
    """
    with open(filename, "r") as file:
        count = 0
        for line in file:
            # Lines containing only whitespace count as empty
            if line.strip() != "":
                count += 1
        return count
I could manually make a list of all the files, but that is slow and error prone. Instead I would like to solve this problem programmatically. The command line can help us do so. It provides a mechanism for programmatically interacting with your computer, e.g. programmatically accessing directories, files, other programs and more. Specifically, counting all the lines in all the example programs can be as simple as the following. Let’s learn how to make that work.
$ python3 line_counter.py *.py
Total lines: 1074
We will use sys_args.py as our working example:
The Python module `sys` (short for “system”) has a variable `argv` that is set to a list of the command line arguments. The first element of this list is always the path of the program that is executing.
>>> %Run sys_args.py
Arguments: ['sys_args.py']
0: sys_args.py
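The linked `sys_args.py` is not reproduced in these notes. A minimal sketch that would produce output in the format shown above might look like the following. The helper name `describe_arguments` is hypothetical, and the actual file may well be organized differently.

```python
import sys


def describe_arguments(argv):
    """
    Hypothetical helper: build the lines of output describing a list of
    command line arguments, one line for the whole list and one per item.
    """
    lines = ["Arguments: " + str(argv)]
    for index, argument in enumerate(argv):
        lines.append(str(index) + ": " + argument)
    return lines


if __name__ == "__main__":
    # sys.argv[0] is the program itself; any later items are its arguments
    for line in describe_arguments(sys.argv):
        print(line)
```

Keeping the formatting in a function that takes the list as a parameter means it can be tried out with any list, without having to run the program from the command line.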
If we added command line arguments to the Thonny run command, they would be appended to the `argv` list.
>>> %Run sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments
While we can specify command line arguments in Thonny, that is not how this functionality is most useful. Instead, we typically use command line arguments at the command line.
We can invoke Python, specifically `python3`, from the command line (`python` on Windows). We can open the terminal from within Thonny via the “Tools -> Open System Shell” menu option. Once we have launched the shell we need to navigate to the folder with our Python program (we will learn how shortly…).
$ python3 sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments
`python3` is the Python interpreter (`python` on Windows), the program that actually runs inside the Thonny shell. If we run `python3` (`python` on Windows) without any arguments we launch the familiar REPL (invoke the `quit()` function to exit, or on OSX Ctrl+d):
$ python3
Python 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
When we supply a path to a Python script as the first argument, Python runs that script (just like the “green arrow” in Thonny). Any additional arguments after the script become the command line arguments to the script (available in `argv`).
Thonny largely insulates us from the notion of the working directory, that is where in the file system we are executing our program. When we invoke Python in the terminal, we will need to navigate within the terminal to the directory containing our program.
The key commands we will use to navigate in the terminal are:
Command | Description |
---|---|
`ls` | List files |
`cd dir` | Change directory to dir |
`cd ..` | Change to parent directory (i.e., go up one level of hierarchy) |
`cd` | Change to home directory |
`pwd` | Print the path of the current working directory |
`more <file>` | Show contents of file one screenful at a time (hit q to exit) |
The Windows equivalent to the terminal is cmd (type `cmd` into the search bar). The mapping between commands for navigating within the terminal/shell is:
Linux/OSX | Windows |
---|---|
`ls` | `dir` |
`cd` | `cd` |
`cd /home/mlinderman/` | `cd C:\Users\mlinderman` |
With these commands we are navigating the same file system and directories you see with your graphical browser, but doing so in a text-based programmatic environment.
For example you will likely need to navigate to the directory that contains your Python script. A protocol to do so:
First, find the directory that contains your script. For a script at `/Users/mlinderman/cs150/sys_args.py`, the directory is everything up to the last /, i.e. `/Users/mlinderman/cs150`.
Then, in the terminal at the command prompt, e.g. at the `$`, type `cd` for “change directory” then enter the path. For example:
$ cd /Users/mlinderman/cs150/
`cd` only works on directories. If you have any spaces in your path, you will need to add quotes around the path so it is interpreted as a single string (you can use the left and right arrows to move in your command to edit it). For example:
$ cd "/Users/mlinderman/cs150/"
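The same ideas are available from inside Python through the standard `os` module, which can help connect the terminal commands to code we already know. The sketch below is only an illustration: it uses the example path from these notes, which will probably not exist on your computer, so the call that would change directory is left as a comment.

```python
import os

# Example path from the notes; it probably does not exist on your computer
script_path = "/Users/mlinderman/cs150/sys_args.py"

# The directory is everything up to the last /
directory = os.path.dirname(script_path)
print(directory)  # /Users/mlinderman/cs150

# The file name is everything after the last /
print(os.path.basename(script_path))  # sys_args.py

# Rough Python counterparts of the terminal commands
print(os.getcwd())           # like pwd: the current working directory
print(sorted(os.listdir()))  # like ls: the files in the working directory
# os.chdir(directory)        # like cd: change the working directory
```

Because the path is passed to these functions as a Python string, spaces in it need no extra quoting here, unlike at the command prompt.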
PI Questions (command line)
In our earlier example usage we used `*.py` as a wildcard (also called a glob) that the terminal expands into all files that end in “.py”, i.e. that was equivalent to
$ python3 line_counter.py my_module.py sys_args.py ...
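To get a feel for what that expansion does, Python's standard `glob` module performs the same kind of pattern matching. This is a sketch for illustration only (the helper name `expand_pattern` and the file names are made up for the example). It can also be handy because not every shell expands wildcards for you: the Windows cmd prompt generally passes `*.py` through to the program unchanged.

```python
import glob
import os
import tempfile


def expand_pattern(pattern, folder):
    """
    Hypothetical helper: return the sorted names of the files in folder
    that match a wildcard pattern such as "*.py".
    """
    matches = glob.glob(os.path.join(folder, pattern))
    return sorted(os.path.basename(match) for match in matches)


# Try it out in a temporary folder holding three empty example files
with tempfile.TemporaryDirectory() as folder:
    for name in ["my_module.py", "sys_args.py", "notes.md"]:
        open(os.path.join(folder, name), "w").close()

    print(expand_pattern("*.py", folder))  # ['my_module.py', 'sys_args.py']
```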
We can write our Python code to process any number of files provided on the command line. Here we use a `for` loop to iterate through all the files provided on the command line and thus in the `sys.argv` list (recall that the first element, at index 0, is always the name of the program that is executing). With that small amount of code we now have a very useful (and efficient) tool. Check out the complete implementation.
if __name__ == "__main__":
    if len(sys.argv) == 1:
        # Check that at least one file is provided on the command line
        print("Usage: python line_counter.py <1 or more files>")
    else:
        count = 0
        # Process all of the command line arguments (after the name of the program that is always at index 0)
        for filename in sys.argv[1:]:
            count += count_lines(filename)
        print("Total lines:", count)
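Putting the pieces together, one possible arrangement of a complete `line_counter.py` is sketched below. This is not necessarily the linked implementation: the helper `count_total` is a hypothetical addition that separates the counting from the printing, so the totals can also be computed when the file is imported rather than run.

```python
import sys


def count_lines(filename):
    """Count non-empty lines in file"""
    count = 0
    with open(filename, "r") as file:
        for line in file:
            if line.strip() != "":
                count += 1
    return count


def count_total(filenames):
    """Hypothetical helper: total non-empty lines across a list of files"""
    total = 0
    for filename in filenames:
        total += count_lines(filename)
    return total


if __name__ == "__main__":
    if len(sys.argv) == 1:
        # No files were provided on the command line
        print("Usage: python3 line_counter.py <1 or more files>")
    else:
        # Skip index 0, which is always the name of the program itself
        print("Total lines:", count_total(sys.argv[1:]))
```

With this arrangement, another program could `import line_counter` and call `line_counter.count_total` on its own list of files without triggering the usage message.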
Could we have accomplished the same task purely within Python, without using the command line environment? Yes, although the resulting approach would be less flexible. For example, we could use the `listdir` function in the `os` module to return a list of all the files in the current directory and then filter that list for just those files with names ending in “.py”:
import os
filenames = os.listdir()
count = 0
for filename in filenames:
    if filename.endswith(".py"):
        count += count_lines(filename)
print("Total lines:", count)
While this code may seem simpler than the approach above, I would argue the opposite. This approach has several assumptions built in: that we are only interested in files in the current directory, and only in files ending in “.py”. If we want to look at files in a different directory or with different/multiple file endings we will need to modify our program. In contrast, our approach using the command line works for all those scenarios without any modification. For example,
$ python line_counter.py *.py *.md
counts the lines in both Python files and Markdown files (the format I use to write the lecture notes) by expanding both wildcards. In this respect, the command line environment “augments” the capabilities of our Python programs. The combination (in my opinion) is more than the sum of the parts!