CS 452 Homework 3 - Neural network backpropagation

Due: Friday 10/20, at 1pm

You may work on this assignment in pairs or by yourself.

In this assignment you will implement and train a neural net to recognize digits from the MNIST database. The assignment uses Python and NumPy. You must use Python 3, not Python 2. Unlike the in-class exercises, this implementation allows networks with arbitrary numbers of layers, but we will not implement regularization. Minimization will again be done using stochastic gradient descent, but this time over "mini-batches" of training examples instead of single examples.
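As a rough sketch of the mini-batch idea (the function and variable names here are made up for illustration, not taken from the starter code): each epoch shuffles the training indices once, then walks through them in fixed-size chunks, with the weight update averaged over each chunk.

```python
import numpy as np

def epoch_batches(n_examples, batch_size, rng):
    """Split a shuffled index range into mini-batches (last one may be short)."""
    idx = rng.permutation(n_examples)  # shuffle once per epoch
    return [idx[k:k + batch_size] for k in range(0, n_examples, batch_size)]

rng = np.random.default_rng(0)
batches = epoch_batches(10, 3, rng)
print([len(b) for b in batches])  # [3, 3, 3, 1]
```

In the actual assignment, the gradient for each mini-batch would be the average of the per-example gradients over that batch.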

Start by downloading hw3.zip and unzipping it.


You will be modifying the file nnet.py. Tests for your program are in test.py. To test your implementation, run:

python3 test.py

You will see the number of tests your implementation passes and any problems that arise. Tackle the implementation in the order of the test cases (study those carefully). If your NumPy skills are rusty, recall that there is a handy guide, numpy-for-matlab-users.

Previously, given two subsequent layers with sizes \(j\) and \(k\), we stored all the weights in a matrix \(\Theta\) of size \(k \times (j+1)\). In this implementation we use a matrix \(W\) of size \(k \times j\) for the "regular" weights, and a vector \(b\) of size \(k \times 1\) for the bias weights.

The feedforward equations to compute the activations for layer \(l+1\) from layer \(l\) are now:

\(z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}\)

\(a^{(l+1)} = g(z^{(l+1)})\)
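These two equations translate almost directly into NumPy. Below is a minimal sketch with made-up shapes (a 3-unit layer feeding a 2-unit layer) and sigmoid as an assumed choice of \(g\); the starter code may organize this differently.

```python
import numpy as np

def g(z):
    """Sigmoid activation (assumed here; use whatever g your network defines)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # W^(l), size k x j
b = rng.standard_normal((2, 1))   # b^(l), size k x 1
a = rng.standard_normal((3, 1))   # a^(l), activations of layer l

z_next = W @ a + b                # z^(l+1) = W^(l) a^(l) + b^(l)
a_next = g(z_next)                # a^(l+1) = g(z^(l+1))
print(a_next.shape)               # (2, 1)
```

Note that keeping activations as column vectors of shape \((k, 1)\) (rather than flat arrays) makes the shapes of the matrix products line up with the equations.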

The backpropagation equations are as follows (where \(\cdot\) means element-wise product):

For the last layer:

\(\delta^{(l)} = (a^{(l)} - y) \cdot g'(z^{(l)})\)

For earlier layers:

\(\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \cdot g'(z^{(l)})\)

The gradient of the bias weights \(b^{(l)}\) is simply \(\delta^{(l+1)}\), and the gradient of \(W^{(l)}\) is \(\delta^{(l+1)}(a^{(l)})^T\) as before.
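Putting the equations together on a tiny made-up network (3 inputs, 2 hidden units, 1 output, sigmoid \(g\); none of these names come from the starter code), one feedforward plus backpropagation pass might look like:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    s = g(z)
    return s * (1.0 - s)  # derivative of the sigmoid

rng = np.random.default_rng(1)
# Illustrative weights for a 3 -> 2 -> 1 network.
W1 = rng.standard_normal((2, 3)); b1 = rng.standard_normal((2, 1))
W2 = rng.standard_normal((1, 2)); b2 = rng.standard_normal((1, 1))
a1 = rng.standard_normal((3, 1))  # input activations
y  = np.array([[1.0]])            # target

# Feedforward.
z2 = W1 @ a1 + b1; a2 = g(z2)
z3 = W2 @ a2 + b2; a3 = g(z3)

# Backpropagation: last layer, then the earlier layer.
delta3 = (a3 - y) * g_prime(z3)
delta2 = (W2.T @ delta3) * g_prime(z2)

# Gradients: bias gradient is delta itself; weight gradient is delta a^T.
grad_W2 = delta3 @ a2.T
grad_b2 = delta3
grad_W1 = delta2 @ a1.T
grad_b1 = delta2
```

A useful sanity check while debugging is that every gradient has the same shape as the parameter it corresponds to.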

Make sure your code passes all tests before moving on. See the comments in the code for additional guidance.

Tuning hyperparameters

After your implementation passes all 11 tests (note that the last 3 are initially commented out), try to find hyperparameters that give the best performance on the validation set. This includes number of layers, number of neurons in the hidden layers, learning rate, mini batch size, and epochs. You may also want to try decreasing the learning rate in later epochs. Be sure that your network still passes all the unit tests (e.g., only use a decreasing learning rate if debug is False).
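If you do try a decreasing learning rate, one possibility is a step-decay schedule; the function and its parameters below are invented for illustration, not part of the starter code.

```python
def learning_rate(initial_lr, epoch, drop_every=10, factor=0.5):
    """Multiply the learning rate by `factor` every `drop_every` epochs."""
    return initial_lr * factor ** (epoch // drop_every)

print(learning_rate(0.5, 0))   # 0.5
print(learning_rate(0.5, 10))  # 0.25
print(learning_rate(0.5, 25))  # 0.125
```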

Use the program run.py for these experiments. Make sure you don't change the last line in this program, which prints the validation accuracy.


Submit your network nnet.py and your run script run.py using the CS 452 HW 3 submission page. Only one person per team should submit. If you work in a team, be sure to have a comment with both names at the top of all your files.