Lecture 12¶

Announcements¶

  • New pairs:

    • Layla and Grayson
    • Finn and Charlie
    • Camiel and Ned
    • Shingo and Jacob
  • Project 2 due tonight! How's it going?

  • Project 3 out tomorrow; we'll cover plane sweep stereo in tomorrow's class

  • You can still respond to my Week 2 feedback email if you haven't already!

  • I think I fixed the github classroom uv issue for projects 2 and 3

    • If you created your repo before the fix, your repo may not have it. You can edit .github/workflows/classroom.yml to add these lines before the - name: Checkout code line:
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v7
      
  • Week 2 Checkins today

  • Today's tea: Nepal Imperial Black or Four Seasons Oolong?

Goals¶

  • Be able to implement a basic rectified stereo depth estimation routine.
  • Understand why matching is the hard part of stereo vision.
  • Know the definition and formation of the stereo cost volume.
  • Know why and how to use the normalized cross-correlation cost function for stereo matching.
  • Understand the construction of the intrinsic and extrinsic camera matrices.
In [31]:
# boilerplate setup
%load_ext autoreload
%autoreload 2

%matplotlib inline

import os
import sys

src_path = os.path.abspath("../src")
if src_path not in sys.path:
    sys.path.insert(0, src_path)

# Library imports
import numpy as np
import imageio.v3 as imageio
import matplotlib.pyplot as plt
import skimage as skim
import cv2

# codebase imports
import util
import filtering
import features
import geometry
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

Outline¶

  • Rectified stereo:

    • depth from disparity reduces stereo vision to the correspondence problem

    • assumed a simple case: the rectified case, where the two cameras differ only by a horizontal translation (the baseline), the image planes are coplanar, and corresponding points therefore lie on the same scanline (row)

    • correspondence - sounds familiar, but now it's dense. some metrics:

      • SSD - sum of squared differences
      • SAD - sum of absolute differences
      • CC - cross-correlation: filter the right scanline with the left patch; where product is highest, call it a match; in practice, use NCC instead:
      • NCC - normalized cross-correlation: standardize (subtract mean, divide by std) patches before multiplication to add invariance to photometric changes
    • The cost volume: given a matching cost c:

      for i in rows:
        for j in columns:
          for d in disparities:
            C[i, j, d] = c(img1[i,j], img2[i,j+d])
      

      (note that c will usually look at a patch around img[i,j]; a runnable numpy sketch of this loop follows below)

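Before jumping into real images, here is a minimal numpy sketch of what that cost-volume pseudocode computes. It uses a toy per-pixel absolute-difference cost (no patches) and only non-negative disparities; the names (toy_left, toy_right, n_disp) are made up for illustration. Note that the in-class loop later in this notebook never stores the full volume; it just takes the argmin over d at each pixel.

    # toy cost volume: C[i, j, d] compares toy_left[i, j] with toy_right[i, j + d]
    toy_left = np.random.rand(20, 30)
    toy_right = np.random.rand(20, 30)
    n_disp = 8

    Ht, Wt = toy_left.shape
    C = np.full((Ht, Wt, n_disp), np.inf)   # inf marks disparities that run off the image
    for d in range(n_disp):
        C[:, :Wt - d, d] = np.abs(toy_left[:, :Wt - d] - toy_right[:, d:])

    toy_disparity = C.argmin(axis=2)        # winner-take-all over the disparity axis
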
In [32]:
left = util.imread_grayfloat("../data/flowers_left.png")
right = util.imread_grayfloat("../data/flowers_right.png")
In [33]:
util.imshow_gray(np.vstack((left, right)))
[Figure: the left and right flower images stacked vertically]
In [34]:
# patch = left[115:125, 205:215]
# patch = left[55:61, 102:107]
patch = left[0:35, 45:80]

util.imshow_gray(patch)
[Figure: the patch extracted from the left image]
In [35]:
def fast_filter(img, filter):
    return cv2.filter2D(img, -1, filter)
In [36]:
# zero-mean the patch so it can be used as a cross-correlation template
kernel = patch - patch.mean()
xcorr_out = fast_filter(right - right.mean(), kernel)
plt.imshow(np.vstack([right, xcorr_out/xcorr_out.max()]))
Out[36]:
<matplotlib.image.AxesImage at 0x126e433e0>
[Figure: the right image stacked above its cross-correlation response to the patch]
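
Note that the filter above computes only zero-mean cross-correlation: the patch and the image each have a mean subtracted, but nothing is divided by a standard deviation, so the response still depends on local contrast. For comparison, OpenCV's matchTemplate with the TM_CCOEFF_NORMED method computes the fully normalized score in [-1, 1]; a minimal sketch (casting to float32, which matchTemplate expects):

    ncc_map = cv2.matchTemplate(right.astype(np.float32), patch.astype(np.float32), cv2.TM_CCOEFF_NORMED)
    plt.imshow(ncc_map)   # peaks near +1 mark strong normalized matches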

Define some cost functions:

In [40]:
import tqdm

def ncc_cost(left_patch, right_patch):
    # normalized cross-correlation: standardize each patch (subtract mean, divide by std),
    # then sum the elementwise product. Higher is better; equals the patch size for a perfect match.
    lp = left_patch - left_patch.mean()
    rp = right_patch - right_patch.mean()
    lp /= lp.std()
    rp /= rp.std()
    return (lp * rp).sum()

def ssd_cost(left_patch, right_patch):
    return np.sum((left_patch - right_patch)**2)

def sad_cost(left_patch, right_patch):
    return np.sum(np.abs(left_patch - right_patch))

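A quick sanity check of the photometric-invariance claim from the outline (a small sketch; the patch location is arbitrary and assumed not to be constant):

    p = left[50:60, 50:60]        # an arbitrary 10x10 patch from the left image
    q = 1.5 * p + 0.1             # the same patch with a contrast/brightness change

    print("NCC:", ncc_cost(p, q))   # unchanged by the affine intensity change (= 100, the patch size, i.e. a perfect match)
    print("SSD:", ssd_cost(p, q))   # grows with the photometric change
    print("SAD:", sad_cost(p, q))
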
Compute the cost volume:

In [48]:
H, W = left.shape
window = 5
hw = window // 2

COST = "NCC"

disparity_img = np.zeros_like(left)

disparity_limit = 60

for i in tqdm.tqdm(range(hw, H-hw)): # for each row
    low_i = i-hw
    high_i = i+hw+1
    for j in range(hw, W-hw): # for each column within a local window
        low_j = j-hw
        high_j = j+hw+1
        
        # extract the left patch:
        left_patch = left[low_i:high_i, low_j:high_j] 
        
        costs = 1e5 * np.ones((disparity_limit*2+1)) # initialize costs to a large value so out-of-bounds disparities never win the argmin
        for d in range(-disparity_limit, disparity_limit+1): # for each possible disparity
            if 0 <= low_j+d and high_j+d <= W:
                
                # extract a right patch at the current disparity
                right_patch = right[low_i:high_i, low_j+d:high_j+d]

                # compute the cost                
                if COST == "NCC":
                    c = -ncc_cost(left_patch, right_patch) # NCC is a similarity (higher = better), so negate it to use as a cost
                elif COST == "SSD":
                    c = ssd_cost(left_patch, right_patch)
                elif COST == "SAD":
                    c = sad_cost(left_patch, right_patch)

                costs[d + disparity_limit] = c
        
        # set the disparity to the one with lowest cost
        disparity_img[i,j] = np.argmin(costs) - disparity_limit

plt.imshow(np.vstack([left, disparity_img/disparity_img.max()]))
plt.colorbar()
100%|█████████████████████████████████████████████████████████████| 120/120 [00:54<00:00,  2.20it/s]
Out[48]:
<matplotlib.colorbar.Colorbar at 0x127461160>
[Figure: the left image stacked above the normalized disparity map, with colorbar]
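
To close the loop on "depth from disparity": for a rectified pair, depth follows from disparity as $Z = \frac{fB}{d}$, where $f$ is the focal length in pixels and $B$ is the baseline. The flowers pair's actual $f$ and $B$ aren't given here, so the numbers below are made-up placeholders just to show the conversion:

    f_px = 500.0     # hypothetical focal length in pixels
    baseline = 0.1   # hypothetical baseline in meters

    d = np.abs(disparity_img)
    depth = np.where(d > 0, f_px * baseline / np.maximum(d, 1e-6), 0.0)   # meters; 0 where no disparity was found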
In [49]:
np.set_printoptions(suppress=True) # suppress scientific notation
In [50]:
points = np.array([
    [0, 0, -200],
    [-20, 50, -200],
    [-100, 100, -200]
], dtype=np.float64).T
HW #2¶

Suppose a camera is in canonical pose - that is, COP is at world origin, the optical axis runs along the negative z axis, and the projection plane is oriented with $x$ going right and $y$ going up in image space. The focal length of the camera is 100 (this is measured in pixels), and the height and width of the image are also 100. Find the image coordinates (i.e., pixel coordinates that correspond to indices into a numpy array) of the following three 3D points:

  • Point 1: $[0, 0, -200]$
  • Point 2: $[-20, 50, -200]$
  • Point 3: $[-100, 100, -200]$

Keep in mind that in numpy arrays, the origin is at the top left and $y$ goes down.
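
As a reminder of the projection conventions (an illustration with a made-up point, not one of the HW points): assuming the principal point is at the image center, a point at camera coordinates $(X, Y, Z)$ with $Z < 0$ lands at $x' = f\frac{X}{-Z}$, $y' = f\frac{Y}{-Z}$ on the image plane, and then $u = c_x + x'$, $v = c_y - y'$ in pixel coordinates (the $y$ flip accounts for the array origin being at the top left). For example, with $f = 100$ and a $100 \times 100$ image ($c_x = c_y = 50$), a hypothetical point $[10, -20, -50]$ gives $x' = 20$, $y' = -40$, so $(u, v) = (70, 90)$.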

HW #3¶

Write down the intrinsics matrix $K$ that maps any point from camera coordinates to image coordinates as above. To check your work, make sure that applying the matrix to the three points from the previous problem yields the expected result.

In [ ]:
# HW 3: fill in your intrinsics matrix here:
K = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
], dtype=np.float64)

# HW 3: verify that K @ points yields the expected pixel coordinates
# apply K to points to get points_img
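
One generic reminder for that check (not specific to this camera): applying a 3x3 matrix to homogeneous points leaves a scale factor in the last row, so divide by it to recover pixel coordinates. A small sketch with a made-up helper name:

    def dehomogenize(pts_h):
        # divide each column by its last entry and drop that row
        return pts_h[:-1, :] / pts_h[-1, :]

    # usage sketch: points_img = dehomogenize(K @ points)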
In [ ]:
 

Now let's suppose we moved the camera. It's still looking at the same points, but the camera center is now at $[200, 0, -200]$ and it's rotated 90 degrees "left": in terms of world coordinates, it now faces down the negative $x$ axis, with $-z$ going right and $+y$ going up.

HW #4¶
  1. For world Points 1 and 3 above, give the 3D coordinates of these points in the camera's local coordinate system.
HW #5¶
  1. Give a 4x4 frame (i.e., basis plus origin) matrix that describes this camera's pose; the contents of this matrix should be $$ \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{p} \\ 0 & 0 & 0 & 1\end{bmatrix} $$ where $\mathbf{u},\mathbf{v},\mathbf{w}$ are the (x, y, and z) basis vectors of the camera's local coordinate system, and $\mathbf{p}$​ is its origin.
HW #6¶
  1. The above frame matrix is the "frame-to-canonical" matrix: it converts points represented in the given coordinate frame back into world coordinates. What we want instead for the camera matrix is the opposite: a matrix that transforms world coordinates into camera coordinates. Confirm (using software) that the inverse of the matrix that you gave in #5 correctly transforms world coordinates of Points 1 and 3 into the correct camera coordinates you found in #4.
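
Before filling in the cells below, here is a toy frame (deliberately not the HW camera) to make the frame-to-canonical idea concrete: its origin is at $[1, 2, 3]$ and its axes are the world axes rotated 90 degrees about $z$.

    # toy frame: local x = world y, local y = -world x, local z = world z, origin at [1, 2, 3]
    toy_frame = np.array([
        [0, -1, 0, 1],
        [1,  0, 0, 2],
        [0,  0, 1, 3],
        [0,  0, 0, 1]
    ], dtype=np.float64)

    local_pt = np.array([1, 0, 0, 1.0])         # one unit along the frame's local x axis
    world_pt = toy_frame @ local_pt             # frame-to-canonical: [1, 3, 3, 1] in world coordinates
    back = np.linalg.inv(toy_frame) @ world_pt  # canonical-to-frame: recovers [1, 0, 0, 1]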
In [ ]:
# HW 5: write down the camera frame matrix 
cam_center = np.array([200, 0, -200], dtype=np.float64)
frame = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
], dtype=np.float64)
frame
In [ ]:
np.linalg.inv(frame)
In [ ]:
# represent the 3D points in homogeneous coordinates (append a 1) so they can be translated
points_4d = np.ones((4, 3))
points_4d[:3, :] = points
points_4d
In [ ]:
# HW 6: Apply the inverse frame matrix to convert world coordinates to camera coordinates
# apply inv(frame) to points_4d and normalize; store them in pts_cam
HW #7¶
  1. Find (again, using software) the final pixel locations of world points 1 and 3.
In [ ]:
# pad the intrinsics to 3x4 so it drops the homogeneous (4th) coordinate when applied to 4D points
K4d = np.zeros((3, 4))
K4d[:3,:3] = K
K4d
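
Putting the pieces together (a general fact, not the HW answer): the full world-to-pixel map is the padded intrinsics composed with the extrinsics (the inverse frame matrix), $P = K_{4d}\,F^{-1}$. A sketch, assuming K and frame have been filled in above:

    P = K4d @ np.linalg.inv(frame)           # 3x4 camera matrix: world -> homogeneous image coordinates
    pts_img_h = P @ points_4d                # homogeneous pixel coordinates (3xN)
    pts_img = pts_img_h[:2] / pts_img_h[2]   # divide by the last row to get pixel coordinates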
In [ ]:
# HW7: find the final pixel locations of the points in the transformed camera
# apply K4d to pts_cam and store in pts_img
In [ ]: