Lecture 15¶
Announcements¶
- Groups for this week:
- Ned, Jacob, Finn
- Layla, Charlie, Camiel
- Grayson, Shingo
- Today's tea: Yunnan Gold
Goals¶
- Be able to describe several different ways 3D reconstructions can be represented (mesh, point cloud, signed distance field, voxels, continuous volumetric representations).
- Explain why positional encodings are necessary for MLP neural networks in contexts like fitting 2D images or 3D scene representations.
- Be prepared to implement NeRF (Project 4).
- Know how to perform volume rendering along camera rays to predict a pixel color from the model.
3D Representations¶
- SfM: given images, get camera poses and (sparse) 3D scene geometry
- Large-scale SfM result examples:
- Multiview stereo / 3D reconstruction: given SfM outputs, recover a 3D model of the world
- Interesting question: how do you represent your 3D model?
Brainstorm¶
How would you reconstruct a 3D model of the world, given images, camera poses, and a sparse point cloud?
How would you even represent a 3D model of the world?
These questions are clearly interrelated. Let's brainstorm:
- Triangle mesh
- Denser point cloud
- Voxels storing:
- color?
- which sides are opaque (occupancy)
- density
- continuous volumetric representation
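To make these concrete, here's a rough sketch of what each option might look like as raw data (toy shapes and names of my choosing, not from our codebase):
import numpy as np
# triangle mesh: vertex positions plus integer triangles indexing into them
vertices = np.zeros((100, 3))            # (V, 3) xyz positions
faces = np.zeros((200, 3), dtype=int)    # (F, 3) vertex indices per triangle
# point cloud: just positions, optionally with per-point color
points = np.zeros((10000, 3))            # (N, 3)
colors = np.zeros((10000, 3))            # (N, 3) rgb in [0, 1]
# voxel grid: a dense 3D array; each cell stores occupancy, density, or color
occupancy = np.zeros((128, 128, 128), dtype=bool)
density = np.zeros((128, 128, 128))      # continuous-valued alternative
# continuous representations are functions of position rather than arrays;
# e.g., the signed distance field of a unit sphere:
sdf_sphere = lambda p: np.linalg.norm(p, axis=-1) - 1.0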
# boilerplate setup
%load_ext autoreload
%autoreload 2
%matplotlib inline
import os
import sys
src_path = os.path.abspath("../src")
if src_path not in sys.path:
    sys.path.insert(0, src_path)
# Library imports
import numpy as np
import imageio.v3 as imageio
import matplotlib.pyplot as plt
import skimage as skim
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F
# codebase imports
import util
import filtering
import features
import geometry
import ML
# Check if GPU is available
if torch.cuda.is_available():
    device = torch.device('cuda')  # nvidia/cuda
elif torch.mps.is_available():
    device = torch.device('mps')   # apple
else:
    device = torch.device('cpu')   # no acceleration
print(f'Using device: {device}')
Using device: mps
3D Representations - Some ideas¶
Source for some relevant visuals: https://courses.cs.washington.edu/courses/cse455/10wi/lectures/multiview.pdf
- Depth maps (with multiple cameras: depth map fusion)
- Voxel grids
- Point clouds
- Patch clouds (surfels)
- Polygon mesh
- Signed distance fields (SDFs)
- Neural network!?
Today: Neural Radiance Fields, and other "Learned" 3D Scene Representations¶
ML review: 0 to MLP¶
Review by example the anatomy, care, and feeding of an MLP
import sklearn
import sklearn.datasets
moons = ML.scale_split(sklearn.datasets.make_moons(n_samples=1000, noise=0.1, random_state=0))
X, Xva, y, yva, xx, yy = moons
ML.plot_dataset(X, y)
Code tour:
- look at ML.MLP() (a rough sketch of the general idea is just below)
- look at the training routine below
- flag me down if something's not familiar!
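For reference, here's a minimal stand-in for what an MLP module like ML.MLP might look like (hypothetical sketch; the real ML.MLP in the codebase may differ in depth, width, and activations):
import torch.nn as nn
class TinyMLP(nn.Module):
    """minimal stand-in for ML.MLP (hypothetical; see the actual codebase)"""
    def __init__(self, in_dim, out_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )
    def forward(self, x):
        return self.net(x)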
def train(model, X, y, train_iters=1000):
    optimizer = torch.optim.Adam(model.parameters())
    for i in range(train_iters):
        optimizer.zero_grad()
        # sample a random minibatch (with replacement)
        batch_indices = torch.randint(0, X.shape[0], (1000,))
        batch_X, batch_y = X[batch_indices, :], y[batch_indices]
        outputs = model(batch_X).squeeze()
        loss = F.mse_loss(outputs, batch_y)
        loss.backward()
        optimizer.step()
    return model
def plot_trained_model(model, X, y, xx, yy, encode=lambda x: x):
    with torch.no_grad():
        h, w = xx.shape
        # evaluate the model on a dense grid covering the plot area
        dense_X = encode(torch.vstack([xx.flatten(), yy.flatten()]).T)
        dense_ypred = model(dense_X).reshape((h, w)).flip([0])
        plt.gca().imshow(dense_ypred, extent=[xx.min(), xx.max(), yy.min(), yy.max()])
    # overlay the training points on the prediction heatmap
    ML.plot_dataset(X, y)
model = ML.MLP(2, 1)
model = train(model, X, y, train_iters=1000)
plot_trained_model(model, X, y, xx, yy)
X, y = ML.make_stripes(500, 4, 0.01)
ML.plot_dataset(X, y)
X, y = ML.make_stripes(500, 10, 0.00)
xx, yy = np.meshgrid(np.arange(0, 1, 0.01), np.arange(0, 1, 0.01))
xx = torch.Tensor(xx)
yy = torch.Tensor(yy)
model = train(ML.MLP(2,1), X, y)
plot_trained_model(model, X, y, xx, yy)
Conclusion: high-frequency stuff is hard for the MLP to learn!
Question: We need to go deeper; will more layers fix this?
Try MLP_N
X, y = ML.make_stripes(5000, 20, 0.0)
xx, yy = np.meshgrid(np.arange(0, 1, 0.01), np.arange(0, 1, 0.01))
xx = torch.Tensor(xx)
yy = torch.Tensor(yy)
model = train(ML.MLP_N(2, 6, 128, 1), X, y, 2000)
plot_trained_model(model, X, y, xx, yy)
sum([t.numel() for t in model.parameters()])
66561
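Sanity-checking that count (assuming ML.MLP_N(2, 6, 128, 1) stacks six Linear layers of width 128: one input layer, four hidden-to-hidden layers, and one output layer):
$$ (2 \cdot 128 + 128) + 4\,(128 \cdot 128 + 128) + (128 \cdot 1 + 1) = 384 + 66048 + 129 = 66561 $$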
To a point, but it's going to get expensive...
Alternative: "positional encoding"
Very handwavy intuition: "smear" the input signal across more input channels to allow the network to learn high-frequency stuff.
pi = torch.pi
X, y = ML.make_stripes(1000, 20, 0.0)
# very barebones positional encoding:
def positional_encoding(X):
    return torch.hstack([
        torch.sin(2*pi * X),
        torch.sin(4*pi * X),
        torch.sin(8*pi * X),
        torch.sin(16*pi * X),
        torch.cos(2*pi * X),
        torch.cos(4*pi * X),
        torch.cos(8*pi * X),
        torch.cos(16*pi * X)])
Xpe = positional_encoding(X)
model = train(ML.MLP(Xpe.shape[1], 1), Xpe, y)
plot_trained_model(model, X, y, xx, yy, encode=positional_encoding)
Here's a paper that does some deeper analysis of this, with some helpful visuals:
Neural Radiance Fields¶
Paper with helpful visuals: https://arxiv.org/pdf/2003.08934.pdf
Representation: continuous volume with color and density¶
- Basic idea: Parameterize a volumetric representation with an MLP

Color and density are a function of 3D location and view direction:
$$ f(x, y, z, \phi, \theta) = (r, g, b, \sigma) $$
- Detail: density is constrained to depend only on location, not direction.
MLP Architecture:¶

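Here's roughly what this looks like in code (a simplified sketch, not the paper's exact 8-layer architecture with its skip connection; layer sizes here are assumptions):
import torch
import torch.nn as nn
import torch.nn.functional as F
class TinyNeRF(nn.Module):
    """simplified NeRF-style MLP (sketch; the paper uses 8 layers + a skip)"""
    def __init__(self, pos_dim, dir_dim, hidden=256):
        super().__init__()
        # the trunk sees only the (encoded) 3D position...
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # ...so density depends on position alone,
        self.sigma_head = nn.Linear(hidden, 1)
        # ...while color also conditions on the (encoded) view direction
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )
    def forward(self, pos_enc, dir_enc):
        h = self.trunk(pos_enc)
        sigma = F.relu(self.sigma_head(h))  # density is nonnegative
        rgb = self.color_head(torch.cat([h, dir_enc], dim=-1))
        return rgb, sigma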
Volume Rendering¶
Given a magic color-density-producing machine, how do you make an image?
(notes, and the above architecture overview figure)
If you want to read more background on where this comes from, check out https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/volume-rendering-summary-equations.html
HW Problem 2¶
The somewhat opaque equation for weighting samples along a volume rendering ray is, in its continuous form: $$ C(\mathbf{r})=\int_{t_n}^{t_f}T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt $$ and the discretized quadrature equation is: $$ \begin{align*} \hat{C}(\mathbf{r}) &= \sum_{i=1}^N w_i \mathbf{c}_i \\ &=\sum_{i=1}^{N}T_i(1-\exp(-\sigma_i\delta_i))\,\mathbf{c}_i \end{align*} $$ where $N$ is the number of samples, $T_i=\exp(-\sum_{j=1}^{i-1}\sigma_j\delta_j)$, and $\delta_i=t_{i+1}-t_i$ is the distance between adjacent samples. This boils down to a weighted sum of the colors ($\mathbf{c}_i$) along the sample ray.
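In code, the discrete weighting is a direct transcription of the equations above (a minimal sketch; variable names are mine, not from the project starter code):
import torch
def render_weights(sigmas, ts):
    """compute w_i = T_i * (1 - exp(-sigma_i * delta_i)) along one ray
    sigmas: (N,) densities at the samples; ts: (N,) sample depths t_i
    """
    deltas = ts[1:] - ts[:-1]                  # delta_i = t_{i+1} - t_i
    deltas = torch.cat([deltas, deltas[-1:]])  # reuse last spacing for sample N
    alpha = 1.0 - torch.exp(-sigmas * deltas)  # per-segment opacity
    # T_i = exp(-sum_{j<i} sigma_j delta_j): transmittance surviving to sample i
    accum = torch.cumsum(sigmas * deltas, dim=0)
    trans = torch.exp(-torch.cat([torch.zeros_like(accum[:1]), accum[:-1]]))
    return trans * alpha
# a pixel's color is then the weighted sum of sample colors:
#   C_hat = (render_weights(sigmas, ts)[:, None] * colors).sum(dim=0)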
To get some intuition for this, let's plug in a simple case and plot the weights. Take samples at $t = 1..10$ and assume the density is 0 everywhere except for a constant-density object with density $\sigma = 0.4$ between $t=4$ and $t=6$ inclusive. Using software of your choice, plug this situation into the above equation to compute the weights $w_{1..10}$, and plot them to show the weights at the 10 sample points.
Positional Encoding¶
High frequencies aren't learned well by the naive implementation, so NeRF uses a positional encoding:
$$ \gamma(p)=\left(\sin(2^0\pi p), \cos(2^0\pi p), \sin(2^1\pi p), \cos(2^1\pi p), \cdots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right) $$
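This is just the barebones encoding from earlier, generalized to $L$ frequency bands (a sketch matching the formula above; the NeRF paper uses $L=10$ for position and $L=4$ for view direction):
import torch
def nerf_positional_encoding(p, L=10):
    """gamma(p): sin/cos of p at frequencies 2^0 * pi, ..., 2^(L-1) * pi"""
    features = []
    for k in range(L):
        features.append(torch.sin(2.0**k * torch.pi * p))
        features.append(torch.cos(2.0**k * torch.pi * p))
    return torch.cat(features, dim=-1)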
NeRF Extensions:¶
- Unbounded and higher quality - Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields https://jonbarron.info/mipnerf360/
- Deformable - TiNeuVox: Fast Dynamic Radiance Fields with Time-Aware Neural Voxels https://jaminfong.cn/tineuvox/
- Shape and lighting: https://xiuming.info/projects/nerfactor/
- Editable: https://zju3dv.github.io/sine/
- and so on...
More Recently: Gaussian Splats¶
Look ma, no MLP!
Just a giant cloud of 3D Gaussians that are "learned" (optimized) to minimize reprojection error!
Shiny results: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
Paper with some helpful visuals: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/3d_gaussian_splatting_low.pdf
Video with visualizations that help explain the representation and the learning process: https://www.youtube.com/watch?v=T_kXY43VZnk&t=61s
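As a rough sketch of the representation, here are the per-Gaussian parameters that get optimized (simplified and illustrative: the real method uses spherical-harmonic colors and builds each covariance from a scale vector plus a rotation quaternion):
import torch
N = 100_000  # number of Gaussians (illustrative)
means      = torch.randn(N, 3, requires_grad=True)  # 3D centers
log_scales = torch.zeros(N, 3, requires_grad=True)  # per-axis extents
rotations  = torch.randn(N, 4, requires_grad=True)  # quaternions
opacities  = torch.zeros(N, 1, requires_grad=True)  # pre-sigmoid alphas
colors     = torch.rand(N, 3, requires_grad=True)   # rgb (SH coefficients in the paper)
# training: rasterize ("splat") all Gaussians into each training view and
# backpropagate the photometric (reprojection) error into every parameter above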
More Classical methods:¶
- Poisson surface reconstruction (2006) https://doc.cgal.org/latest/Poisson_surface_reconstruction_3/index.html
- Patch-based multi-view stereo (2007) (remained a workhorse well past 2015) https://www.di.ens.fr/pmvs/pmvs-1/index.html
- DeepSDF (2019) - neural signed distance functions; ~direct predecessor to NeRF: https://openaccess.thecvf.com/content_CVPR_2019/papers/Park_DeepSDF_Learning_Continuous_Signed_Distance_Functions_for_Shape_Representation_CVPR_2019_paper.pdf