CS 1053 Project 3: Stereo

Scott Wehrwein

Winter 2026

In this assignment, you’ll implement the two-view plane sweep stereo algorithm. Given two calibrated images of the same scene taken from different viewpoints, your task is to recover a rough depth map.

Dates

Assigned: Wednesday, January 21st, 2026

Deadline: Monday, January 26th, 2026

Overview

Purpose

In this project, you’ll get experience working in 3D, recovering real-world scene geometry from 2D images taken by cameras. The project will allow you to demonstrate achievement of the second and third course outcomes:

Basic understanding single-view projective geometry and image formation under the pinhole camera model
Basic understanding of two-view epipolar geometry and its use in recovering 3D structure of a scene.

Setup

As usual, skeleton code is provided in a repository created by Github Classroom; the invitation link is found on the Course Hub. Also as usual, the software environment is managed by uv.

To view the results of turning your depth maps into 3D meshes, you can use a mesh viewer such as Meshlab.

Data

Middlebury has the proud distinction of being world-famous in the computer vision community for the Middlebury Stereo datasets, which were created by Professor Emeritus Daniel Scharstein. The datasets we’ll use here were created by Daniel and his students between 2011-2014.

The input data is large enough that it was inadvisable to include it in the github repository. Go into the data directory and run download.sh to download the required datasets (tentacle is hosted by me, while the remaining datasets are downloaded from the Middlebury Stereo page). You can also uncomment other lines to download and try out additional datasets if you’d like.

Alternatively, you can also download these datasets in a web browser and extract them into the input directory (for tentacle) or data (for all others). Here’s the direct link to the listing of Middlebury dataset zip files: https://vision.middlebury.edu/stereo/data/scenes2014/zip/.

Preview

When finished, you’ll be able to run

uv run plane_sweep_stereo.py <dataset>

where dataset is one of ('tentacle', 'Adirondack', 'Backpack', 'Bicycle1', 'Cable', 'Classroom1', 'Couch', 'Flowers', 'Jadeplant', 'Mask', 'Motorcycle', 'Piano', 'Pipes', 'Playroom', 'Playtable', 'Recycle', 'Shelves', 'Shopvac', 'Sticks', 'Storage', 'Sword1', 'Sword2', 'Umbrella', 'Vintage'). Keep in mind except for tentacle and Flowers, you’ll need to modify data/download.sh to download any other datasets before running your code on them.

For example, if you use the tentacle dataset

uv run plane_sweep_stereo.py tentacle

the output will be in output/tentacle_{ncc.png,ncc.gif,depth.npy,projected.gif}.

The following illustrates the two input views for the tentacle dataset:

The outputs of the plane-sweep stereo for the tentacle dataset should look like this:

The first animated gif is tentacle_projected.gif, which shows each rendering of the scene as a planar proxy is swept away from the camera.

For this project, we use Normalized Cross Correlation (NCC) measure for matching scores. The second animated gif is tentacle_ncc.gif, which shows slices of the NCC cost volume where each frame corresponds to a single depth. White is high NCC and black is low NCC.

The last image shows the correct depth output tentacle_ncc.png for the tentacle dataset, which is computed from the argmax depth according to the NCC cost volume. White is near and black is far.

Tasks

Most of the code you will implement is in student.py, with the exception of the last task, which is to complete the main body of the planesweep loop in plane_sweep_stereo.py. It’s recommeded that you start by taking a look through the well-commented plane_sweep_stereo.py to get an idea of where these functions fit in. The functions to be implemented have detailed specifications - see those for details of what you need to do.

Implement project_impl. This projects 3D points into a camera given its extrinsic and intrinsic calibration matrices.
Implement unproject_corners_impl. This un-projects the corners of an image out into the scene to a distance depth from the camera and returns their world coordinates.
Complete the implementation of preprocess_ncc_impl. This prepares an image for NCC by building an array of size h x w x c * ncc_size * ncc_size, where the final dimension contains the normalized RGB values of all pixels in a c * ncc_size * ncc_size patch, unrolled into a vector. In other words, if the input is I and the output is A, then A[i,j,:] contains the normalized pixel values of the patch centered at I[i,j]. See the method spec for more details.

You have been given vectorized code that extracts the raw pixel values and builds an array of size (h, w, c, ncc_size, ncc_size). This uses a similar approach to what you did in Project 1 to accelerate cross-correlation, where it loops over the patch dimensions and fills in, e.g., the top-left pixel in all patches in one sliced assignment. Your job is to subtract the per-channel patch mean and divide each patch by the whole patch’s (not-per-channel) vector norm.

Potentially helpful features:
- np.mean, and particularly its axis argument
- The reshape method of array objects
- np.linalg.norm (can take an axis)
- Boolean array indexing, as in A[A>4] = A[A>4] + 1
Implement compute_ncc_impl. This takes two images that have been preprocessed as above and returns an h x w image of NCC scores at each pixel.
Fill in each (TODO) line in plane_sweep_stereo.py to complete the overall plane sweep stereo algorithm. This is mostly a matter of making calls to the functions you’ve already implemented.

Testing

You are provided with some test cases in tests.py. Feel free to run these with uv run tests.py to help you with debugging. There are unit tests for all the functions you write, but not for the main program. You can, however, check that your output on tentacle matches the results shown above.

If the code is running slowly while you’re debugging, you can speed things up by downsampling the datasets further, or computing fewer depth layers. In dataset.py, modify:

 self.depth_layers = 128

to change the number of depth hypotheses, or

 self.stereo_downscale_factor = 4

to change the downsampling factor applied to the Middlebury datasets. The output image will be of dimensions (height / 2^stereo_downscale_factor, width / 2^stereo_downscale_factor).

Efficiency

We’ve configured the tentacle dataset such that it takes about 0.5-100 seconds to compute depending on your implementation. Because we’re using opencv to warp compute homographies and warp images, the main bottleneck will likely be preprocess_ncc. Some tips:

The tricky part - collecting patches into a single array - is done for you; the approach is similar to that used in Project 1.
Avoid loops: do the mean subtraction and normalization using array operations.
Try to make your life as easy as possible by strategically reshaping your array in ways that make the current step as easy as possible to write and comprehend.
Make use of the axis keyword argument of numpy functions. For example, you can sum an array along only one axis, or along more than one axis if you provide a tuple.
You can introduce singleton dimensions by slicing and adding np.newaxis: if A.shape == (3, 4) then A[np.newaxis, :, :].shape = (1, 3, 4).

If two array operands have dimensions that match except for a singleton, the singleton dimension will be broadcast to match the non-singleton:

A = np.zeros((5,4,3))
B = np.sum(A, axis=2) # has shape (5, 4)
C = A + B # error, dimension mismatch
C = A + B[:,:,np.newaxis] # B is auto-replicated across the channel dimension

Mesh Reconstruction

There are no tasks for you to complete for this part, but the machinery is there and you’re welcome to try it out. Once you’ve computed depth for a scene, you can create a 3D model out of the depth map as follows:

uv run combine.py <dataset> depth

You can open up output/<dataset>_depth.ply in Meshlab to see your own result for any of the datasets. Here’s the mesh from my tentacle result:

You may need to fiddle with settings to get the colors to show - try un-toggling the triangley-cylinder button two buttons right from the wireframe cube at the top of the screen.

Extra Credit Extensions

The following extensions can be completed for modest (up to 5 points) extra credit. Each extra credit point is exponentially harder to earn.

Direct and incremental homography construction: We’re using corner correspondences to fit homographies, but you can also build them analytically from the camera parameters. On top of this, you can find one initial homography for the first depth and then augment it with a sequence of incremental homographies to sweep through depths; these are called “dilation” homographies. Read up on this in the original planesweep paper and/or Section 12.1.2 of Szeliski and implement this approach, eliminating the need for OpenCV’s computeHomography.
Stereo evaluation: the Middlebury datasets come with ground truth disparity maps - see the webpage for details on the file formats, etc (pfm images should be readable with imageio). You’ll need to handle the conversion between depth and disparity, handle the different image sizes, and decide what metrics to use to measure accuracy. See Section 5 of Scharstein and Szeliski’s 2001 IJCV paper for some ideas on metrics.
Rectified stereo: Implement stereo rectification and compute the cost volume in the traditional order (for each pixel, for each desparity). The datasets come with calibration information that is loaded in by the code in datasets.py - feel free to use this.
Better stereo: find ways to make your stereo algorithm perform better (combining this with (1), you can measure quantiative improvement; otherwise, you can evaluate qualitatively by looking at the results side-by-side). Some ideas include processing the cost volume in some way before doing the argmax, or using a different similarity metric from NCC. Try out some ideas and see if you can get better results. Feel free to look in the literature for inspiration - once again, A Taxonomy and Evaluation of Dense Two-FrameStereo Correspondence Algorithms is a great place to start for a review of what people have done in the (now somewhat distant) past. Trying out ideas is the goal here - you don’t need to show a big improvement to get full credit.
8-point algorithm: Implement the 8-point algorithm to estimate the fundamental matrix (see wikipedia). You don’t need to get into estimating \(R\) and \(t\) (though if you’re feeling ambitious, go ahead!). Find some way to validate your algorithm (careful - this may be the hard part!). Compare this matrix to the “true” fundamental matrix computed from the camera calibration parameters. You may need to give some thought to the best way to compare these matrices - something like SSD on the matrix elements is likely not linearly related to any meaningful notion of geometric similarity. You can use OpenCV’s feature matching pipeline as we did in Project 2.

You can also propose your own extensions - feel free to run your ideas by me.

To get credit for your extension(s), you must:

Write your code in a separate file(s) (but feel free to import functions from the base assignment). The base assignment code should run as specified without modification.
Include a readme.txt, readme.pdf, or readme.html file in your repository’s base directory containing:
a brief description of what enhancement(s) you completed
a brief description of your code layout/architecture any design decisions or assumptions you made
outputs and/or analysis resulting from your extensions. For example, these might include:
- a table with at least a handful of performance metrics calculated for at least a handful of datasets
- depth maps produced by rectified stereo (or your improved stereo algorithm) compared side-by-side with planesweep results
- analysis comparing your estimated fundamental matrices to the “ground truth” ones
instructions for running your code to generate the results in your writeup

Submission

Generate results for the tentacle dataset and the Flowers dataset, and commit them to your repository. You don’t need to submit .ply files.
Hours On the first line of hours.txt, include a single integer estimate of the number of hours you spent working on this assignment. Below that, you may optionally include a reflection on how it went, anything you found particularly confusing or helpful, and/or suggestions for improving the assignment.
Code Submit your code by committing and pushing your changes to Github before the deadline. If you did any of the extensions, describe what you did in a readme.txt.

Rubric

Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.

Correctness (45 points)
Unit tests (35 points)	Correctness as determined by `tests.py` (`score = ceil(n_passed*1.5`))
Stereo output (10 points)	Output on `tentacle` and `Flowers`
Efficiency (5 points)
5 points	`python plane_sweep_stereo.py tentacle` runs in under 30 seconds

Clarity Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Points may be deducted for any of the following:

Methods should be written as concisely and clearly as possible
Methods should not be too long - use helper methods to break code into sensible subroutines
Code should not be cryptic and terse - explain nontrivial blocks with comments
Methods you introduce should be accompanied by a precise specification
Variable and function names should be informative but not overly verbose

Acknowledgements

This assignment is based on versions developed and refined by Kavita Bala, Noah Snavely, and numerous underappreciated TAs.