
In this assignment, you’ll write a system that automatically stitches a photo sequence into a seamless panoramic image like the one above.
Assigned: Tuesday, January 13th
Code Due: Tuesday, January 20th at 10pm (via Github)
Artifact Due: Wednesday, January 21st at 10pm (via Github)
This assignment will be done individually.
In this project, you will implement a system to combine a series of horizontally overlapping photographs into a single panoramic image. Completion of this project touches on several aspects of the course learning objectives, in particular the ones highlighted in bold below:
To keep things streamlined, we’ll use the built-in ORB feature
detector and descriptor from the opencv library. Given
feature correspondences, you will automatically align the photographs
(determine their overlap and relative positions) using RANSAC to find an
outlier-robust motion model and then blend the resulting images into a
single seamless panorama.
You will implement panorama stitching using translation and homography motion models. The high-level steps required to create a panorama are listed below.
There are two optional extensions you may wish to complete - nominal extra credit will be available for solid implementations of either or both of these:
As in Project 1, skeleton code is provided in a repository created by Github Classroom; the invitation link is found on the Course Hub.
The software environment is again managed using
uv. To run a Python file (e.g., the GUI), simply run:
uv run gui.py
The first time you run this, uv will create a virtual environment and install dependencies; startup should be quick thereafter.
(360 Extension only) Warp each image to spherical coordinates.
warp.py: computeSphericalWarpMappings
[TODO 1 - 360 Extension only] Compute the inverse map to warp the image
by filling in the skeleton code in the computeSphericalWarpMappings
routine to:
Convert the given spherical image coordinates into the corresponding planar image coordinates
Apply radial distortion using the radial distortion model described in lecture
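As a concrete illustration of this mapping, here is a minimal sketch assuming the centered-pinhole convention and the \(k_1, k_2\) radial distortion model from lecture; the function name and arguments below are illustrative, not the skeleton's exact signature.

    import numpy as np

    def spherical_inverse_map(yy, xx, f, k1, k2, y_c, x_c):
        # Interpret output (spherical) pixel coordinates as angles about the center.
        theta = (xx - x_c) / f            # azimuth
        phi = (yy - y_c) / f              # elevation

        # Point on the unit sphere, then project onto the z = 1 plane.
        xhat = np.sin(theta) * np.cos(phi)
        yhat = np.sin(phi)
        zhat = np.cos(theta) * np.cos(phi)
        xt, yt = xhat / zhat, yhat / zhat

        # Apply the radial distortion model (k1, k2 terms).
        r2 = xt ** 2 + yt ** 2
        scale = 1 + k1 * r2 + k2 * r2 ** 2
        xd, yd = xt * scale, yt * scale

        # Convert back to planar pixel coordinates in the input image.
        return f * yd + y_c, f * xd + x_c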
Align neighboring pairs.
alignment.py: alignPair, getInliers, computeHomography, leastSquaresFit
The computeHomography function takes two feature sets from image 1 and
image 2 (f1 and f2) and a list of feature matches (containing pairs of
indices into f1 and f2) and estimates a homography from image 1 to
image 2.
[TODO 2] Set up the \(A\) matrix that defines the system \(Ax\) computing the residuals for a given homography unrolled into a vector \(h\).
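For reference, one common way to set up \(A\) is the direct linear transform (DLT), where each match contributes two rows. The sketch below assumes the homography is unrolled row-major and takes hypothetical (n, 2) coordinate arrays points1 and points2; double-check the convention against what was presented in lecture.

    import numpy as np

    def build_A(points1, points2):
        # points1[i] and points2[i] are matched (x, y) coordinates in images 1 and 2.
        A = np.zeros((2 * len(points1), 9))
        for i, ((x, y), (xp, yp)) in enumerate(zip(points1, points2)):
            A[2 * i]     = [x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp]
            A[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp]
        return A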
[TODO 3a] Implement minimizeAx to find
the unit-length vector \(\mathbf{x}\)
that minimizes \(||A\mathbf{x}||\) for
a given \(A\).
[TODO 3b] Call minimizeAx on the matrix
you set up in TODO 2 and use its result to fill in the 3x3 homography
matrix \(H\). Don’t forget to return
the homography in its normalized form, with a 1 as the bottom right
entry.
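A standard way to carry out this minimization is via the SVD: the unit-length minimizer is the right singular vector associated with the smallest singular value. A sketch, assuming the row-major unrolling above:

    import numpy as np

    def minimizeAx(A):
        # The unit-length x minimizing ||Ax|| is the last row of V^T from the SVD of A.
        _, _, Vt = np.linalg.svd(A)
        return Vt[-1]

    # Assembling H from the result (TODO 3b):
    # H = minimizeAx(A).reshape(3, 3)
    # H = H / H[2, 2]    # normalized form: bottom-right entry is 1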
[TODO 4] alignPair is where you will
implement RANSAC. It takes two feature sets, f1 and
f2, the list of feature matches, and a motion
model m (described below) as parameters. For this
project, we support two motion models, represented by the two possible
values of the enum MotionModel: eTranslate and
eHomography. alignPair estimates and returns
the inter-image transform matrix \(M\)
as follows:
In each trial, it fits a candidate motion to a minimal sample of matches and calls getInliers to get the indices of inlier feature matches (i.e., indices into matches) that agree with the current motion estimate. After repeated trials, the entire inlier set from the \(M\) with the largest number of inliers is used to compute a final least squares estimate for the motion, which is returned as the matrix M.
[TODO 5] getInliers computes the indices of the matches whose image-2
feature lies within Euclidean distance RANSACthresh of the transformed
image-1 feature, given features f1 and f2 from image 1 and image 2 and
an inter-image transformation matrix from image 1 to image 2.
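One possible shape for this, assuming the matches are OpenCV DMatch objects with queryIdx/trainIdx indexing into f1/f2:

    import numpy as np

    def getInliers_sketch(f1, f2, matches, M, RANSACthresh):
        inlier_indices = []
        for i, match in enumerate(matches):
            x1, y1 = f1[match.queryIdx].pt      # keypoint in image 1
            x2, y2 = f2[match.trainIdx].pt      # keypoint in image 2
            p = M @ np.array([x1, y1, 1.0])
            p = p / p[2]                        # back to Cartesian coordinates
            if np.hypot(p[0] - x2, p[1] - y2) < RANSACthresh:
                inlier_indices.append(i)
        return inlier_indices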
[TODO 6, 7] leastSquaresFit computes a
least squares estimate for the translation or homography using all of
the matches previously estimated as inliers. It returns the resulting
translation or homography output transform M. For translation
estimation, I recommend simply averaging the translations rather than
taking the heavy-handed linear algebra approach. For homographies,
you’ve already implemented computeHomography to do the
heavy lifting.
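For the translation case, the averaging approach might look like the sketch below (again assuming DMatch-style matches); for the homography case, call computeHomography on the inlier matches instead.

    import numpy as np

    def translation_fit_sketch(f1, f2, matches, inlier_indices):
        # Average the per-match translations over the inliers.
        dx = dy = 0.0
        for i in inlier_indices:
            x1, y1 = f1[matches[i].queryIdx].pt
            x2, y2 = f2[matches[i].trainIdx].pt
            dx += x2 - x1
            dy += y2 - y1
        n = len(inlier_indices)
        return np.array([[1, 0, dx / n],
                         [0, 1, dy / n],
                         [0, 0, 1.0]])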
Warp and blend the aligned image pairs into a single output image to create the final panorama.
blend.py: imageBoundingBox, blendImages, accumulateBlend, normalizeBlend
[TODO 8] imageBoundingBox: Given an image and a homography, figure out
the box bounding the image after applying the homography.
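A minimal sketch of the corner-based approach (transform the four corners, then take the min/max); the function name and return convention here are illustrative:

    import numpy as np

    def imageBoundingBox_sketch(img, M):
        h, w = img.shape[:2]
        corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                            [0, h - 1, 1], [w - 1, h - 1, 1]], dtype=float).T
        warped = M @ corners
        warped = warped[:2] / warped[2]      # normalize homogeneous coordinates
        min_x, min_y = warped.min(axis=1)
        max_x, max_y = warped.max(axis=1)
        return min_x, min_y, max_x, max_y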
[TODO 9] getAccSize: Given the warped
images and their relative displacements, figure out how large the final
stitched image needs to be in order to fit all of the warped images. This
method also augments each per-image transformation with a translation
that moves the output image coordinate system into a
numpy-array-friendly world where (0, 0) is at the top left.
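The translation part might look like this sketch, where min_x and min_y are the global bounding-box minimum over all warped images:

    import numpy as np

    def shift_to_origin_sketch(M, min_x, min_y):
        # Prepend a translation so the accumulator origin lands at (0, 0).
        T = np.array([[1, 0, -min_x],
                      [0, 1, -min_y],
                      [0, 0, 1.0]])
        return T @ M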
[TODO 10] blendImages: Warp each image
into the output image’s coordinate system and add its pixel content into
the accumulator. You will need to use inverse warping to calculate
values at integer output pixel coordinates. To allow the images to blend
smoothly, use the fourth channel to represent the weight of the
contribution of a pixel. Using the linear blending scheme described in
lecture, the weight varies linearly from 0 to 1 from the left side of
the image over a distance of blendWidth pixels, then ramps
down correspondingly on the right side of the image. Other, fancier
blending schemes are possible - you may experiment with some for extra
credit.
TODO 10 implementation notes:
When working with homogeneous coordinates, don’t forget to normalize when converting them back to Cartesian coordinates.
Watch out for black pixels in the source image when inverse warping, especially when dealing with spherically warped images. You don’t want to include these in the accumulation.
When doing inverse warping, use bilinear interpolation for the
source image pixels. First try to work out the code by looping over each
pixel. Later you can optimize your code using array instructions and
numpy tricks. My approach does vectorized bilinear interpolation using
array operations; another approach uses cv2.remap to warp
the image. In either case, you may find numpy.meshgrid
useful. Optimizing this function is worth only a couple of points, so
give it the lowest priority.
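As a concrete illustration of the linear ramp described in TODO 10, here is one way to precompute a per-column weight for a source image before warping it (a sketch; the skeleton may organize this differently):

    import numpy as np

    def feather_weights_sketch(width, blendWidth):
        # Alpha ramps 0 -> 1 over blendWidth pixels at the left edge and
        # back down to 0 over blendWidth pixels at the right edge.
        x = np.arange(width, dtype=float)
        ramp_up = np.clip((x + 1) / blendWidth, 0, 1)
        ramp_down = np.clip((width - x) / blendWidth, 0, 1)
        return np.minimum(ramp_up, ramp_down)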
[TODO 11] normalizeBlend: Having
accumulated weighted pixels from all the source images, this function
normalizes the image so each pixel has unit weight by dividing by the
weight at each pixel. Be careful not to divide by zero. Remember to make
sure the alpha (fourth) channel of the resulting panorama is opaque
(1)!
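A sketch of the division with a zero-weight guard, assuming the accumulator is an H x W x 4 array whose last channel holds the accumulated weight:

    import numpy as np

    def normalizeBlend_sketch(acc):
        weight = acc[:, :, 3]
        safe = np.where(weight > 0, weight, 1.0)          # avoid divide-by-zero
        rgb = acc[:, :, :3] / safe[:, :, None]
        rgb = np.where(weight[:, :, None] > 0, rgb, 0.0)  # zero-weight pixels stay black
        return np.dstack([rgb, np.ones_like(weight)])     # opaque alpha channel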
[TODO 12 - 360 Extension only]
blendImages: To make a 360 panorama, you need to do a
couple extra things. First, you’ll want to include the first image again
at the end so you can put the seam in the middle of that image. Second,
you’ll need to correct for vertical drift to make the left and right
edges line up perfectly. The getDriftParams function
computes the position of the top left and top right corners of the
un-corrected panorama, accounting for cutting out the left half of the
left image and the right half of the right image. Given these two
points, build a shearing transformation that maps these top two corners
to the same \(y\) value.
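Given corners \((x_l, y_l)\) and \((x_r, y_r)\), one such shear subtracts a \(y\) offset proportional to \(x\); a sketch:

    import numpy as np

    def drift_shear_sketch(x_l, y_l, x_r, y_r):
        # Vertical shear mapping both top corners to the same y value.
        slope = (y_r - y_l) / (x_r - x_l)
        return np.array([[1.0,    0.0, 0.0],
                         [-slope, 1.0, 0.0],
                         [0.0,    0.0, 1.0]])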
[TODO 13 - MOPS Extension only] The base project
uses built-in ORB feature detection and description functionality from
OpenCV.
The GUI (gui.py) accepts a --MOPS flag; if
this is set, the program should use your own custom-written feature
matching pipeline. Implement functionality to detect, describe, and match
features using Harris, MOPS, and SSD+ratio (methods for this likely fit
best in alignment.py, but I haven’t given you any skeleton
for this). Your pipeline should follow the code we wrote in class, but
should be generalized to multiple scales by running on a Gaussian
pyramid. Feel free to use OpenCV’s pyrDown or import
relevant code from Project 1.
Make appropriate calls to your own feature matching functionality in
gui.py in the computeMapping function to
replace ORB if the --MOPS flag is set.
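For the multi-scale part, one simple option is a Gaussian pyramid built with OpenCV's pyrDown (remember to map keypoint coordinates from each level back to the original image's scale); a sketch:

    import cv2

    def gaussian_pyramid_sketch(img, num_levels=4):
        # Each level is half the resolution of the previous one.
        levels = [img]
        for _ in range(num_levels - 1):
            levels.append(cv2.pyrDown(levels[-1]))
        return levels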
Some automated unit tests are provided in test.py. These
tests are not comprehensive, but may help you with catching and
debugging math or implementation errors. The test_360.py
file includes some additional tests for the 360 panorama extension.
The skeleton code that we provide comes with a graphical interface,
implemented in the module gui.py, which makes it easy for you to do
the following:
You can use the GUI visualizations to check whether your program is running correctly.
Testing the warping routines:
In the campus test set, the camera parameters used for these examples are:
f = 595 k1 = -0.15 k2 = 0.00
In the yosemite test set, a few example warped images are provided for testing purposes. The camera parameters used for these examples are:
f = 678 k1 = -0.21 k2 = 0.26
See if your program produces the same output. Note that if you use Yosemite with the translation motion model, you might get slightly blurry panoramas in the blending region (as you can also see from the example results). This is because the translation model isn’t flexible enough to describe the true transformation.
Testing the alignment routines:
Note that the campus images are only suitable for the translational
motion model! The yosemite images are suitable for both motion models.
To test alignPair, load two images in the alignment tab of the GUI.
Clicking ‘Align Images’ displays the left and right images, with the
right image transformed according to the inter-image transformation
matrix and overlaid on the left image. This lets you visually assess
the accuracy of the transformation matrix. Note that blending is not
performed at this stage.
Testing the blending routines:
When debugging your blending routines, you may find it helpful for
the sake of efficiency to use the melbourne_small dataset, which is
simply a downsampled version of the Melbourne dataset. Example panoramas
are included in the yosemite and the campus directories. Compare the
resulting panorama with these images. Note that it’s important to use
the specified f, k1, k2
parameters to get the same image. If you’re doing the 360 Extension, use
the 360 degree checkbox to ensure you get the same result for the campus
dataset.
Additional notes: If you use very high-resolution images when creating your own panorama on a laptop, you might run into memory problems. Try running on a machine with more memory. 16GB of RAM should be enough for panos captured by most consumer-oriented cameras.
Take a series of images from the same position, and stitch a panorama using your code. Because our motion models assume the camera center is in the same place, you’ll want to be careful to only rotate the camera, minimizing any changes in the camera’s position; this is especially important if there is content close to the camera in the frame. This panorama can be either translation-aligned (360 or not, if you implemented 360 features), or aligned with homographies (your choice). For best results, overlap each image by 50% with the previous one, and keep the camera level. In order to use your own camera for a spherically warped translation-aligned panorama, you have to estimate the focal length. The simplest way to do this is through the EXIF tags of the images, as described here. You may also be able to find the focal length (in mm) and sensor width by searching for your camera or phone model. Alternatively, you can use a camera calibration toolkit to get more precise focal length and radial distortion coefficients.
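For reference, if the EXIF data (or a spec sheet) gives you the focal length and sensor width in millimeters, the focal length in pixels follows from a simple proportion; the numbers below are hypothetical placeholders, not values for any particular camera:

    # f (in pixels) = focal length (mm) / sensor width (mm) * image width (pixels)
    focal_length_mm = 4.25       # hypothetical phone lens
    sensor_width_mm = 6.17       # hypothetical sensor
    image_width_px = 4032
    f_pixels = focal_length_mm / sensor_width_mm * image_width_px
    print(f_pixels)              # pass this as f to the warping code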
For inspiration, check out some of the following links:
Hours On the first line of
hours.txt, include a single integer estimate of the number
of hours you spent working on this assignment. Below that, you may
optionally include a reflection on how it went, anything you found
particularly confusing or helpful, and/or suggestions for improving the
assignment.
Code Submit your code by committing and pushing
your changes to Github before the deadline. If you did any of the
extensions, describe what you did in a readme.txt.
Artifact Add your panorama artifact, named
artifact.jpg, to the root of your github repository and
push by the artifact deadline. Note that the artifact deadline is one
day later than the code deadline.
Here is a list of ways you might extend the program. Significant
extensions may receive nominal extra credit. You are encouraged to come
up with your own extensions - it’s always fun to see new, unanticipated
ways to use this program! Please put any such extensions behind the
--extra-credit flag in gui.py. You will need to use the args parsed in the “main
method” portion of gui.py and modify the rest of the code
as necessary. If I run your program without the flag, it must implement
the base project.
Your project will be graded based on the quality of the panoramas generated. An approximate point breakdown is given below. Keep in mind that later code depends on earlier code, so partial credit may be hard to assign if something early on is broken. If you’re short on time, optimize for having working code for image alignment with homographies.
Correctness:
Efficiency:
Hours:
hours.txt is completed with an estimate of hours spent on the project.
Artifact:
Clarity: Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:
Many thanks are due to those who developed and refined prior versions of this assignment, including Steve Seitz, Kavita Bala, Noah Snavely, and many underappreciated TAs.