Implement nested loops to perform “2-D” computations
Utilize existing helper class for building a program
Consider the impacts of data choice as a developer
Images as a 2-D structure: Red shift
We can think of images as a “2-D” structure, i.e., the image has a width and a height, and each pixel has a position described by a specific row and column. For our purposes, let’s assume that (0, 0) is the top-left corner of the image. We could imagine a 3×3 image as having the following structure. The pixel in the exact middle would have row and column indices of 1 and, in this example, a value of (190, 177, 168). The values are 3-tuples representing the red, green and blue (RGB) color components of that pixel (each component is in the range [0, 255]). An all-black pixel would have the value (0, 0, 0), an all-white pixel (255, 255, 255), a red pixel (255, 0, 0), etc.
Indices   Col 0             Col 1             Col 2
Row 0     (108, 85, 71)     (149, 131, 121)   (210, 201, 196)
Row 1     (106, 87, 73)     (190, 177, 168)   (220, 215, 211)
Row 2     (103, 87, 74)     (208, 199, 190)   (223, 219, 216)
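To make the row and column indexing concrete, the 3×3 example can be sketched as a plain Python list of rows. This is only an illustration and not how the Image class stores pixels; the helper name pixel_at is hypothetical.

```python
# The 3x3 example image as a list of rows; each pixel is an (R, G, B) tuple.
image = [
    [(108, 85, 71), (149, 131, 121), (210, 201, 196)],
    [(106, 87, 73), (190, 177, 168), (220, 215, 211)],
    [(103, 87, 74), (208, 199, 190), (223, 219, 216)],
]


def pixel_at(grid, row, col):
    """Return the (R, G, B) tuple at the given row and column.

    Hypothetical helper for this sketch; (0, 0) is the top-left corner.
    """
    return grid[row][col]


# The middle pixel has row and column indices of 1
red, green, blue = pixel_at(image, 1, 1)
print("Middle pixel:", (red, green, blue))
print("Top-right pixel:", pixel_at(image, 0, 2))
```

Note that the row index comes first, so pixel_at(image, 0, 2) is the top-right pixel, not the bottom-left one.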
Today we will use the linked starter code and its Image class for storing and manipulating images as 2-D structures. This class provides a simplified wrapper around the Pillow Python Imaging Library. To run today’s code you will need to install the Pillow library (as we did for NumPy, etc.). In Thonny select the “Tools -> Manage Packages” menu command. Enter pillow into the search, click “Find package from PyPI” then “Install”.
Image loads an image from a URL and provides methods for getting and setting pixels at specific rows and columns. For example, the following code loads an image from a URL and “red shifts” one pixel by adding 100 to its red component. The modified image is then displayed.
    img = Image("https://www.cs.middlebury.edu/~mlinderman/courses/cs146/f24/classes/linderman.jpg")

    # Retrieve Pixel at row 100, column 120
    pix = img.get_pixel(100, 120)

    # "red shift" pixel by adding 100 to red component
    pix.red += 100

    # Modify image by overwriting pixel at row 100, column 120 with new value
    img.set_pixel(100, 120, pix)
    img.show()
The result does not look that much different (we just changed one pixel), but if we applied the same transformation to all the pixels, the whole image would take on a red tint.
Transforming 2-D structures with nested loops
A common task is to perform an operation with or on each pixel. We will need a loop, and since the number of pixels is known at the start of the loop, a for loop is a natural choice. Performing an operation on each pixel can be implemented by performing the operation for all possible combinations of row and column positions, i.e., (0, 0), (0, 1) … (2, 2) in the 3×3 example above. Generating all combinations of multiple sequences, in this case the row and column indices, is readily implemented with nested loops. For example, nested loops over 3 rows and 4 columns that print each row, column and linear index produce:
Row: 0 Col: 0 Linear index: 0
Row: 0 Col: 1 Linear index: 1
Row: 0 Col: 2 Linear index: 2
Row: 0 Col: 3 Linear index: 3
Row: 1 Col: 0 Linear index: 4
Row: 1 Col: 1 Linear index: 5
Row: 1 Col: 2 Linear index: 6
Row: 1 Col: 3 Linear index: 7
Row: 2 Col: 0 Linear index: 8
Row: 2 Col: 1 Linear index: 9
Row: 2 Col: 2 Linear index: 10
Row: 2 Col: 3 Linear index: 11
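A loop nest that produces the output above could be sketched as follows. This is a reconstruction, assuming the same ROWS and COLS constants used in the later example; the function name index_table is hypothetical and only there to make the result easy to inspect.

```python
ROWS = 3
COLS = 4


def index_table(rows, cols):
    """Return (row, col, linear index) for every position, in row order."""
    table = []
    for row in range(rows):        # outer loop: one iteration per row
        for col in range(cols):    # inner loop: every column within that row
            table.append((row, col, row * cols + col))
    return table


for row, col, linear in index_table(ROWS, COLS):
    print("Row:", row, "Col:", col, "Linear index:", linear)
```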
Here the outer loop iterates over all rows, while the inner loop iterates over all columns. Since the loops are nested, we only advance to the next row (the next iteration of the outer loop) after we have iterated through all columns for that row. In the output we should observe all combinations of row and column indices. We also see a common pattern for computing a “linear” index, that is, the index each element would have if we traversed all the elements in row order: row * COLS + col, where COLS is the number of columns. This is a common way of iterating through a “2-D” structure stored in a list or other “1-D” structure. The loop below shows the reverse mapping, i.e., translating from a linear index to the associated row and column, and prints the same values as above!
    ROWS = 3
    COLS = 4

    for i in range(ROWS * COLS):
        row = i // COLS
        col = i % COLS
        print("Row:", row, "Col:", col, "Linear index:", i)
Row: 0 Col: 0 Linear index: 0
Row: 0 Col: 1 Linear index: 1
Row: 0 Col: 2 Linear index: 2
Row: 0 Col: 3 Linear index: 3
Row: 1 Col: 0 Linear index: 4
Row: 1 Col: 1 Linear index: 5
Row: 1 Col: 2 Linear index: 6
Row: 1 Col: 3 Linear index: 7
Row: 2 Col: 0 Linear index: 8
Row: 2 Col: 1 Linear index: 9
Row: 2 Col: 2 Linear index: 10
Row: 2 Col: 3 Linear index: 11
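To see why the linear index is useful, here is a small sketch that stores a “2-D” structure in a flat “1-D” list and converts between the two kinds of index. The data are the red components of the 3×3 example image; the function names to_linear and to_row_col are hypothetical.

```python
COLS = 3

# Red components of the 3x3 example image, stored row by row in one flat list
flat_red = [108, 149, 210, 106, 190, 220, 103, 208, 223]


def to_linear(row, col, cols):
    """Map a (row, col) position to its index in the flat list."""
    return row * cols + col


def to_row_col(index, cols):
    """Map a flat-list index back to its (row, col) position."""
    return index // cols, index % cols


# Red component of the middle pixel (row 1, column 1)
print(flat_red[to_linear(1, 1, COLS)])

# Position of the element stored at linear index 5
print(to_row_col(5, COLS))
```

The two functions are inverses of each other: converting a position to a linear index and back returns the original row and column.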
Finishing Red shift
Using the nested loop structure above, we can readily “red-shift” the entire image by applying the transformation above to every pixel, not just one. We will use similar nested loops, but the ranges are determined by the height and width of the image (not by constants).
    def red_shift(in_file, out_file):
        """Red shift image, saving result to local file

        Args:
            in_file: String with URL to original image
            out_file: String filename with image extension to save modified image
        """
        # Load image from URL
        img = Image(in_file)

        # Iterate over all pixels, shifting red component by 100
        for row in range(img.get_height()):
            for col in range(img.get_width()):
                pix = img.get_pixel(row, col)
                img.set_pixel(row, col, Pixel(pix.red + 100, pix.green, pix.blue))

        # Save modified image to local file
        img.save_image(out_file)
This seems like a natural task for a vectorized approach. Could we use NumPy?
Yes! The Pillow library doesn’t directly expose the image as a NumPy array, but does enable us to easily construct such an array, e.g.,
    import numpy as np
    import PIL.Image

    image_array = np.array(PIL.Image.open("linderman.jpg"))
    image_array.shape, image_array.dtype
((170, 130, 3), dtype('uint8'))
Here we have loaded the image as a 170×130×3 array of 8-bit unsigned integers (i.e., each color component is represented as 8 bits, or a single byte). The 170×130 represents the size of the image (170 rows, 130 columns). The “3” dimension represents the individual color components, i.e., red, green, and blue. In this context we could think of red shift as image_array[i, j, k] += 100, where k is 0 and i and j range over all valid row and column indices. NumPy can apply this update to the entire red channel at once, and the resulting image is just as we expect.
But there is a subtlety, which is why the NumPy version needs the np.clip function. An 8-bit unsigned integer can only represent the values 0-255 (the maximum is \(2^8 - 1\)). If adding 100 would result in a value larger than 255, the result “wraps around” (equivalent to (val + 100) % 256). To prevent that “wrap around” we convert to a 32-bit integer (which has a larger range), add 100, and then clamp the resulting values to be within 0-255 (via np.clip). PIL handled this for us previously, but when working with “raw” arrays, we are responsible for these details.
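The widen, add and clamp steps described above can be sketched as follows. This is a minimal illustration rather than the original course code: the function name red_shift_array is hypothetical, and a tiny hand-made array stands in for the loaded image so the values are easy to check.

```python
import numpy as np


def red_shift_array(image_array, amount=100):
    """Return a copy of an H x W x 3 uint8 array with the red channel shifted.

    Hypothetical helper for this sketch.
    """
    # Widen to 32-bit integers so the addition cannot wrap around
    shifted = image_array.astype(np.int32)
    # Index 0 of the last dimension is the red component of every pixel
    shifted[:, :, 0] += amount
    # Clamp to the valid range, then return to 8-bit unsigned integers
    return np.clip(shifted, 0, 255).astype(np.uint8)


# A 1x2 "image" standing in for the real one (two pixels from the 3x3 example)
demo = np.array([[[108, 85, 71], [210, 201, 196]]], dtype=np.uint8)

print(red_shift_array(demo))

# Without widening, 210 + 100 wraps around to (210 + 100) % 256
wrapped = demo.copy()
wrapped[:, :, 0] += 100
print(wrapped[0, 1, 0])
```

With a real image, the shifted array can be turned back into a Pillow image with PIL.Image.fromarray.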
Nesting more loops!
In the “red-shift” example above, we perform a “fixed” transformation for each pixel. But we could imagine transformations that might be variable, i.e., require a loop in some way. For example, consider a horizontal “blur”, where each pixel is the average of itself and the WINDOW-1 pixels to its right. Specifically, if WINDOW was 4, each pixel would be the average of itself and the 3 pixels immediately to its right. Since WINDOW could change, we would want to implement this with a loop. The result is the original image smeared slightly in the horizontal direction.
To apply that transformation to every pixel, we would nest the loop over WINDOW with the loops shown above, i.e., the loop nest would now have 3 “levels”. For simplicity, we will ignore the rightmost WINDOW-1 pixels in each row of the image.
    WINDOW = 4

    def blur(in_file, out_file):
        """Apply horizontal blur filter to image, saving result to local file

        Args:
            in_file: String with URL to original image
            out_file: String filename with image extension to save modified image
        """
        # Load image from URL
        img = Image(in_file)

        for row in range(img.get_height()):
            # We will ignore right-most pixels in the image (where blur filter would extend
            # beyond the image boundary)
            for col in range(img.get_width() - WINDOW + 1):
                pix = Pixel(0, 0, 0)
                for idx in range(WINDOW):
                    blur_pix = img.get_pixel(row, col + idx)
                    pix.red += blur_pix.red
                    pix.green += blur_pix.green
                    pix.blue += blur_pix.blue

                # Pixel values have to be integers so we use floor division
                img.set_pixel(row, col, Pixel(pix.red // WINDOW, pix.green // WINDOW, pix.blue // WINDOW))

        # Save modified image to local file
        img.save_image(out_file)
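To check the window arithmetic by hand, the same idea can be sketched on a single row of plain integers (think of one color component of one image row). The function name blur_row is hypothetical, and unlike the code above it returns a new list rather than modifying the input.

```python
def blur_row(values, window):
    """Average each value with the window-1 values to its right.

    The right-most window-1 values are copied unchanged, mirroring the
    simplification used for the image version.
    """
    blurred = list(values)
    for col in range(len(values) - window + 1):
        total = 0
        for idx in range(window):
            total += values[col + idx]
        # Floor division keeps the result an integer
        blurred[col] = total // window
    return blurred


print(blur_row([10, 20, 30, 40, 50, 60], 4))
# prints [25, 35, 45, 40, 50, 60]
```

The first entry is (10 + 20 + 30 + 40) // 4 = 25, and the last three entries are untouched because a full window would extend past the end of the row.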
Wrapping up
The linked file has the full implementation, including the average function for face averaging.
To experiment with training machine learning systems, check out this tutorial for using Teachable Machine, a tool for quickly training machine learning models. Try training a model to differentiate your face and hands. What happens when you only train with one class (e.g., only your face or hands)? What happens when you create more training images? What happens if you only train with images of one of your hands, but then test with your other hand?