Class 21

Applications: Image analysis

Objectives for today

  • Implement nested loops to perform “2-D” computations
  • Utilize existing helper class for building a program
  • Consider the impacts of data choice as a developer

Images as a 2-D structure: Red shift

We can think of an image as a “2-D” structure, i.e., the image has a width and a height and each pixel has a position described by a specific row and column. For our purposes, let’s assume that (0, 0) is the top-left corner of the image. We could imagine a 3×3 image as having the following structure. The pixel in the exact middle would have row and column indices of 1 and, in this example, a value of (190, 177, 168). The values are 3-tuples representing the red, green and blue (RGB) color components of that pixel (each component is in the range [0-255]). An all-black pixel would have the value (0, 0, 0), an all-white pixel (255, 255, 255), a red pixel (255, 0, 0), etc.

Indices    Col 0            Col 1             Col 2
Row 0      (108, 85, 71)    (149, 131, 121)   (210, 201, 196)
Row 1      (106, 87, 73)    (190, 177, 168)   (220, 215, 211)
Row 2      (103, 87, 74)    (208, 199, 190)   (223, 219, 216)
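If we were not using the provided Image class, we could represent this same 3×3 example directly with built-in Python types, e.g., as a nested list of (red, green, blue) tuples indexed first by row and then by column. This is just a sketch for intuition, not how we will store images today:

# 3×3 image as a list of rows, each row a list of (red, green, blue) tuples
image = [
    [(108, 85, 71), (149, 131, 121), (210, 201, 196)],
    [(106, 87, 73), (190, 177, 168), (220, 215, 211)],
    [(103, 87, 74), (208, 199, 190), (223, 219, 216)],
]

# The middle pixel has row and column indices of 1
print(image[1][1])  # (190, 177, 168)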

Today we will use the linked starter code and its Image class for storing and manipulating images as 2-D structures. This class provides a simplified wrapper around the Pillow Python Imaging Library. To run today’s code you will need to install the Pillow library (as we did for NumPy, etc.). In Thonny, select the “Tools -> Manage Packages” menu command, enter pillow into the search box, click “Find package from PyPI” and then “Install”.

Image loads an image from a URL and provides methods for getting and setting pixels at specific rows and columns. For example, the following code loads an image from a URL, “red shifts” one pixel by adding 100 to its red component, displays the result, and saves the modified image to a local file named linderman_red.jpg.

img = Image("https://www.cs.middlebury.edu/~mlinderman/courses/cs146/f24/classes/linderman.jpg")
# Retrieve Pixel at row 100, column 120
pix = img.get_pixel(100, 120)
# "red shift" pixel by adding 100 to red component
pix.red += 100  
# Modify image by overwrite pixel at row 100, column 120 with new value
img.set_pixel(100, 120, pix)
img.show()

The result does not look that much different (we just changed one pixel), but if we applied the same transformation to all the pixels, we should see something like:

Original image (left); “red-shifted” image (right)

Transforming 2-D structures with nested loops

A common task is to perform an operation with or to each pixel. We will need a loop, and since the number of pixels is known at the start of the loop, a for loop is a natural choice. Performing an operation on each pixel can be implemented by performing the operation for all possible combinations of row and column positions, i.e., (0, 0), (0, 1) … (2, 2) in the 3×3 example above. Generating all combinations of multiple sequences, in this case the row and column indices, is readily implemented with nested loops. For example, the following code (with its output shown below):

ROWS = 3
COLS = 4

for row in range(ROWS):
    for col in range(COLS):
        print("Row:", row, "Col:", col, "Linear index:", row*COLS+col)
Row: 0 Col: 0 Linear index: 0
Row: 0 Col: 1 Linear index: 1
Row: 0 Col: 2 Linear index: 2
Row: 0 Col: 3 Linear index: 3
Row: 1 Col: 0 Linear index: 4
Row: 1 Col: 1 Linear index: 5
Row: 1 Col: 2 Linear index: 6
Row: 1 Col: 3 Linear index: 7
Row: 2 Col: 0 Linear index: 8
Row: 2 Col: 1 Linear index: 9
Row: 2 Col: 2 Linear index: 10
Row: 2 Col: 3 Linear index: 11

Here the outer loop iterates over all rows, while the inner loop iterates over all columns. Since the loops are nested, we only advance to the next row, i.e., the next iteration of the outer loop, after we have iterated through all columns (for that row). In the output we should observe all combinations of row and column indices. We also see a common pattern for computing a “linear” index, that is, the index we would use if we traversed all the elements in row order. The latter is a common way of iterating through a “2-D” structure stored in a list or other “1-D” structure. The loop below shows the reverse mapping, i.e., translating from a linear index to the associated row and column, and will print the same values as above!

ROWS = 3
COLS = 4

for i in range(ROWS*COLS):
    row = i // COLS
    col = i % COLS
    print("Row:", row, "Col:", col, "Linear index:", i)
Row: 0 Col: 0 Linear index: 0
Row: 0 Col: 1 Linear index: 1
Row: 0 Col: 2 Linear index: 2
Row: 0 Col: 3 Linear index: 3
Row: 1 Col: 0 Linear index: 4
Row: 1 Col: 1 Linear index: 5
Row: 1 Col: 2 Linear index: 6
Row: 1 Col: 3 Linear index: 7
Row: 2 Col: 0 Linear index: 8
Row: 2 Col: 1 Linear index: 9
Row: 2 Col: 2 Linear index: 10
Row: 2 Col: 3 Linear index: 11
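To make the connection to “1-D” storage concrete, here is a small sketch (with made-up values, using the same 3×4 dimensions) that stores a “2-D” grid in a single flat Python list and converts between (row, column) positions and linear indices:

ROWS = 3
COLS = 4

# A 3×4 grid stored "flattened" in row order in a single list
grid = list(range(ROWS * COLS))  # [0, 1, 2, ..., 11]

# Convert a (row, col) position to a linear index...
row, col = 1, 2
print(grid[row * COLS + col])  # 6

# ...and a linear index back to a (row, col) position
i = 6
print(i // COLS, i % COLS)  # 1 2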

Finishing Red shift

Using the nested loop structure above, we can readily “red-shift” the entire image by applying the transformation above to every pixel, not just one. We will use similar nested loops, but the ranges are determined by the height and width of the image (not by constants).

def red_shift(in_file, out_file):
    """Red shift image, saving result to local file

    Args:
        in_file: String with URL to original image
        out_file: String filename with image extension to save modified image
    """
    # Load image from URL
    img = Image(in_file)

    # Iterate over all pixels, shifting red component by 100
    for row in range(img.get_height()):
        for col in range(img.get_width()):
            pix = img.get_pixel(row, col)
            img.set_pixel(row, col, Pixel(pix.red + 100, pix.green, pix.blue))
    
    # Save modified image to local file
    img.save_image(out_file)
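For example, we might call the function like this (the URL is the same as above; the output filename is just an example):

red_shift("https://www.cs.middlebury.edu/~mlinderman/courses/cs146/f24/classes/linderman.jpg", "linderman_red.jpg")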

Could we have worked with the image as a NumPy array instead? Yes! The Pillow library doesn’t directly expose the image as a NumPy array, but it does enable us to easily construct such an array, e.g.,

import numpy as np
import PIL.Image

image_array = np.array(PIL.Image.open("linderman.jpg"))
image_array.shape, image_array.dtype
((170, 130, 3), dtype('uint8'))

Here we have loaded the image as a 170×130×3 array of 8-bit unsigned integers (i.e., each color component is represented as 8 bits, or a single byte). The 170×130 represents the size of the image (170 rows, 130 columns). The “3” dimension represents the individual color components, i.e., red, green, and blue. In this context we could think of the red shift as image_array[i, j, k] += 100 where k is 0 and i and j range over all valid row and column indices. We can perform this with NumPy as

image_array[:,:,0] = np.clip(image_array[:,:,0].astype(np.uint32) + 100, 0, 255)

and the resulting image is just as we expect. But there is a subtlety, indicated by the use of the clip function. An 8-bit unsigned integer can only represent the numbers 0-255 (i.e., up to \(2^8 - 1\)). If adding 100 would result in a value larger than 255, the result “wraps around” (equivalent to (val + 100) % 256). To prevent that “wrap around” we convert to a 32-bit integer (which has a larger range) and then clamp the resulting values to be within 0-255 (via np.clip). PIL handled this for us previously, but when working with “raw” arrays, we are responsible for these details.
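As a small, standalone demonstration of that wrap-around behavior (independent of the image above):

import numpy as np

a = np.array([200, 50], dtype=np.uint8)

# Adding directly wraps around: (200 + 100) % 256 = 44
print(a + 100)                                     # [ 44 150]

# Converting to a wider type and clipping clamps at 255 instead
print(np.clip(a.astype(np.uint32) + 100, 0, 255))  # [255 150]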

Nesting more loops!

In the “red-shift” example above, we perform a “fixed” transformation for each pixel. But we could imagine transformations that are variable, i.e., require a loop in some way. For example, consider a horizontal “blur”, where each pixel is the average of itself and the WINDOW-1 pixels to its right. Specifically, if WINDOW was 4, each pixel would be the average of itself and the 3 pixels immediately to its right. Since WINDOW could change, we would want to implement this with a loop. The result would look something like:

Original image (left); image with horizontal blur (right)

To apply that transformation to every pixel, we would nest the loop over WINDOW with the loops shown above, i.e., the loop nest would now have 3 “levels”. For simplicity, we will ignore the right-most WINDOW-1 pixels in the image.

WINDOW = 4

def blur(in_file, out_file):
    """Apply horizontal blur filter to image, saving result to local file

    Args:
        in_file: String with URL to original image
        out_file: String filename with image extension to save modified image
    """
    # Load image from URL
    img = Image(in_file)

    for row in range(img.get_height()):
        # We will ignore right-most pixels in the image (where blur filter would extend
        # beyond the image boundary)
        for col in range(img.get_width()-WINDOW+1):
            
            pix = Pixel(0, 0, 0)
            for idx in range(WINDOW):
                blur_pix = img.get_pixel(row, col+idx)
                pix.red += blur_pix.red
                pix.green += blur_pix.green
                pix.blue += blur_pix.blue
            # Pixel values have to be integers so we use floor division
            img.set_pixel(row, col, Pixel(pix.red // WINDOW, pix.green // WINDOW, pix.blue // WINDOW))
    
    # Save modified image to local file
    img.save_image(out_file)
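As with the red shift, we could also express this horizontal blur with NumPy slicing instead of explicit Python loops. The following is only a sketch under the same assumptions as before (the image loaded into image_array, the right-most WINDOW-1 columns ignored, the output filename just an example), not the implementation in the starter code:

import numpy as np
import PIL.Image

WINDOW = 4

image_array = np.array(PIL.Image.open("linderman.jpg"))
height, width, _ = image_array.shape

# Accumulate each pixel and the WINDOW-1 pixels to its right in a wider
# integer type (to avoid uint8 wrap around), then divide to average
total = image_array[:, :width - WINDOW + 1].astype(np.uint32)
for idx in range(1, WINDOW):
    total += image_array[:, idx:width - WINDOW + 1 + idx]
blurred = (total // WINDOW).astype(np.uint8)

# Save the blurred image to a local file
PIL.Image.fromarray(blurred).save("linderman_blur.jpg")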

Wrapping up

The linked file has the full implementation, including the average function for face averaging.
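The actual implementation is in the linked file, but as a rough sketch only, pixel-wise averaging of several same-sized images (using the same Image and Pixel helpers as above, and ignoring details like alignment) might look something like:

def average(in_files, out_file):
    """Average several same-sized images pixel by pixel, saving the result

    Args:
        in_files: List of strings with URLs to original images
        out_file: String filename with image extension to save averaged image
    """
    images = [Image(in_file) for in_file in in_files]
    result = images[0]

    for row in range(result.get_height()):
        for col in range(result.get_width()):
            # Sum each color component across all images at this position
            red = green = blue = 0
            for img in images:
                pix = img.get_pixel(row, col)
                red += pix.red
                green += pix.green
                blue += pix.blue
            # Pixel values have to be integers so we use floor division
            result.set_pixel(row, col, Pixel(red // len(images), green // len(images), blue // len(images)))

    # Save averaged image to local file
    result.save_image(out_file)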

To experiment with training machine learning systems, check out this tutorial for using Teachable Machine, a tool for quickly training machine learning models. Try training a model to differentiate your face and hands¹. What happens when you only train with one class (e.g., only your face or hands)? What happens when you create more training images? What happens if you only train with images of one of your hands, but then test with your other hand?

Some additional supplementary reading:

¹ Adapted from Peck E. et al., Ethical Reflection Modules for CS1.