Lecture 1 Problems

  1. Rank the following computer vision tasks from “low-level” to “high-level”. There is not necessarily a single right answer, but there are many orderings we should all be able to agree on.

    1. Smoothing out graininess in an image without blurring the edges of objects

    2. For each pixel in a video frame, estimate the location of that pixel’s content in the following frame (i.e., estimate per-pixel motion vectors, AKA optical flow)

    3. Labeling all the cats in a photo

    4. Generating an English language explanation of why an image is funny

    5. Brightening an image

    6. Reconstructing the 3D geometry of an object given photos from multiple perspectives

    7. Segmenting the foreground to create a background blur effect for videoconferencing

  2. A (physical) pinhole camera is simply a box with a hole in it. Describe how the image would change if you made the distance from the pinhole to the back of the box longer or shorter. Assume the other box dimensions stay the same.

  3. Given a 3-channel color image stored in a 3-dimensional array F, whose width and height are given by the variables width and height, write pseudocode to give the image a reddish tint. Assume that F[r, c, i] is the syntax to access the value of the ith color channel (where 0 is red, 1 is green, 2 is blue) of the pixel at the rth row and cth column. Your answer can, but does not need to, involve color space transformations.

  4. Suppose you want to make a color image represented in RGB more saturated, but without allowing any pixel values to go outside the range from 0 to 1. Write pseudocode (or Python code) to implement this.

  5. Given a grayscale image \(f(x, y)\), how could you increase the contrast? In other words, how could you make the bright stuff brighter and dark stuff darker? As above, your approach should not allow values to go outside their original range from 0 to 1.

  6. In terms of an input image \(f(x, y)\), write a mathematical expression for a new image \(g\) that is shifted four pixels to the left.

  7. In terms of an input image \(f(x, y)\), write a mathematical expression for a new image \(g\) that is twice as big (i.e., larger by a factor of two in both \(x\) and \(y\)).

  8. For each of Problems 3 through 7, determine whether the transformation described is geometric or photometric.

  9. This problem will give you some hands-on practice with manipulating NumPy arrays. Grab the Jupyter notebook from this numpy lab from another class. Drop the .ipynb file into your Jupyter server and open it up. Read the introduction for Part III, then complete Parts 3.0 and 3.1. You may also complete Part 3.2 if you wish, but it is not required (see footnote 1). In your Markdown homework submission, include the code in a code block followed by the image(s) you produced. You may find the np.transpose function helpful for rearranging dimensions into different orders.
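
As a warm-up for the array-based problems above, here is a minimal sketch of the F[r, c, i] indexing convention from Problem 3 and of np.transpose from Problem 9. It does not solve any of the problems. The tiny 2-by-3 image and the names red and channels_first are made up for illustration, and the sketch assumes the image is stored as a NumPy array of shape (height, width, 3) with values between 0 and 1.

```python
import numpy as np

# Hypothetical tiny image: 2 rows (height), 3 columns (width), 3 color channels.
height, width = 2, 3
F = np.zeros((height, width, 3))

# F[r, c, i]: set the pixel at row 0, column 2 to pure red (channel 0 is red).
F[0, 2, 0] = 1.0

# Slicing with ":" selects a whole channel at once; this is the red channel.
red = F[:, :, 0]
print(red.shape)             # (2, 3)

# np.transpose reorders axes: (2, 0, 1) moves the channel axis to the front.
channels_first = np.transpose(F, (2, 0, 1))
print(channels_first.shape)  # (3, 2, 3)
```

Note that the row index comes first, so a pixel at horizontal position x and vertical position y lives at F[y, x, i] under this convention.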


  Footnote 1: If you’re doing 3.2, my advice is: don’t try to do it with a single slicing operation; this is possible, but not particularly natural. My preferred approach involves a loop over columns in the video cube.