Rank the following computer vision tasks from “low-level” to “high-level”. There is not necessarily a single right answer, but there are many orderings we should all be able to agree on.
Smoothing out graininess in an image without blurring the edges of objects
Estimating, for each pixel in a video frame, the location of that pixel’s content in the following frame (i.e., per-pixel motion vectors, also known as optical flow)
Labeling all the cats in a photo
Generating an English-language explanation of why an image is funny
Brightening an image
Reconstructing the 3D geometry of an object given photos from multiple perspectives
Segmenting the foreground to create a background blur effect for videoconferencing
A (physical) pinhole camera is simply a box with a hole in it. Describe how the image would change if you made the distance from the pinhole to the back of the box longer or shorter. Assume the other box dimensions stay the same.
Given a 3-channel color image stored in a 3-dimensional array F, with its width and height given by the variables width and height, write pseudocode to give the image a reddish tint. Assume that F[r, c, i] is the syntax to access the value of the ith color channel (where 0 is red, 1 is green, and 2 is blue) of the pixel at the rth row and cth column. Your answer can, but does not need to, involve color space transformations.
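To make the indexing convention concrete, here is a minimal sketch of how F[r, c, i] could look in Python. It assumes the image is a NumPy array of shape (height, width, 3) holding floats in the range 0 to 1; the problem itself only fixes the F[r, c, i] syntax, so the array type and value range are assumptions. The sketch only reads and writes individual values and does not tint anything.

```python
import numpy as np

# Assumption: the image is a NumPy array of shape (height, width, 3)
# with float values in [0, 1]. The problem only specifies F[r, c, i].
height, width = 2, 3
F = np.zeros((height, width, 3))

# Set the pixel at row 0, column 2 to pure green (channel 1).
F[0, 2, 1] = 1.0

# Read back individual channel values of that pixel.
red_value = F[0, 2, 0]
green_value = F[0, 2, 1]

# Visit every pixel and channel using the same indexing convention.
total = 0.0
for r in range(height):
    for c in range(width):
        for i in range(3):
            total += F[r, c, i]

print(F.shape, red_value, green_value, total)
```

Note that the row index comes first, so the first dimension has length height and the second has length width.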
Suppose you want to make a color image represented in RGB more saturated, but without allowing any pixel values to go outside the range from 0 to 1. Write pseudocode (or Python code) to implement this.
Given a grayscale image \(f(x, y)\), how could you increase the contrast? In other words, how could you make the bright stuff brighter and dark stuff darker? As above, your approach should not allow values to go outside their original range from 0 to 1.
In terms of an input image \(f(x, y)\), write a mathematical expression for a new image \(g\) that is shifted four pixels to the left.
In terms of an input image \(f(x, y)\), write a mathematical expression for a new image \(g\) that is twice as big (i.e., larger by a factor of two in both \(x\) and \(y\)).
For each of Problems 3 through 7, determine whether the transformation described is geometric or photometric.
This problem will give you some hands-on practice with manipulating NumPy arrays. Grab the Jupyter notebook from this NumPy lab from another class. Drop the .ipynb file into your Jupyter server and open it up. Read the introduction for Part III, then complete Parts 3.0 and 3.1. You may also complete Part 3.2 if you wish, but it is not required.[1] In your Markdown homework submission, include the code in a code block followed by the image(s) you produced. You may find that the np.transpose function is helpful for rearranging dimensions into different orders.
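As a rough illustration of what np.transpose does, the sketch below reorders the axes of a small 3-dimensional array. The array is a hypothetical stand-in for a video, and its (frame, row, column) axis order is an assumption made for this example, not something taken from the lab notebook.

```python
import numpy as np

# Hypothetical stand-in for a video: 4 frames, each 2 rows by 3 columns.
# The (frame, row, column) axis order is an assumption for illustration.
video = np.arange(4 * 2 * 3).reshape(4, 2, 3)

# Reorder the axes to (row, column, frame). The second argument lists,
# for each new axis, which old axis it is taken from.
reordered = np.transpose(video, (1, 2, 0))

print(video.shape)      # (4, 2, 3)
print(reordered.shape)  # (2, 3, 4)

# The same element is reachable under both index orders.
assert video[3, 1, 2] == reordered[1, 2, 3]
```

Only the order of the indices changes; the values themselves are untouched, and NumPy returns a view of the original data whenever it can rather than a copy.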
[1] If you’re doing 3.2, my advice is: don’t try to do it with a single slicing operation; this is possible, but not particularly natural. My preferred approach involves a loop over columns in the video cube.