You may work on this assignment in pairs (encouraged) or by yourself.

In this assignment you will implement and train a neural net to recognize digits from the MNIST database. The assignment uses Python and NumPy. You must use Python 3, not Python 2. Unlike the in-class exercises, this implementation allows networks with an arbitrary number of layers, but we will not implement regularization. Minimization will again be done using stochastic gradient descent, but this time over "mini batches" of training examples instead of single examples.
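To make the mini-batch idea concrete, here is a minimal sketch of one way to split shuffled training data into mini batches and take an averaged gradient step. The names `mini_batches` and `sgd_step`, and the layout with one example per column, are assumptions for illustration; the data format used by `nnet.py` may differ.

```python
import numpy as np

def mini_batches(X, Y, batch_size, rng):
    """Yield (X_batch, Y_batch) pairs that cover a shuffled copy of the data once.

    Assumes one example per column of X, with matching targets per column of Y.
    """
    n = X.shape[1]
    order = rng.permutation(n)  # new random order for this epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield X[:, idx], Y[:, idx]

def sgd_step(params, summed_grads, learning_rate, batch_len):
    """Average gradients summed over a mini batch and step downhill."""
    scale = learning_rate / batch_len
    return [p - scale * g for p, g in zip(params, summed_grads)]

# Toy data: 10 examples with 2 features each (hypothetical values).
rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(2, 10)
Y = X.copy()
batch_sizes = [xb.shape[1] for xb, yb in mini_batches(X, Y, 4, rng)]
print(batch_sizes)  # [4, 4, 2]
```

The last mini batch is smaller whenever the batch size does not divide the number of examples, which is why the step divides by the actual batch length.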

Start by downloading hw3.zip and unzipping it.

You will be modifying the file `nnet.py`. Tests for your program are in `test.py`. To test your implementation, run:

`python3 test.py`

You will see the number of tests your implementation passes and any problems that arise. Tackle the implementation in the order of the test cases (study those carefully). If your NumPy skills are rusty, recall the handy guide numpy-for-matlab-users.

Previously, given two consecutive layers with sizes \(j\) and \(k\), we stored all the weights in a matrix \(\Theta\) of size \(k \times (j+1)\). In this implementation we use a matrix \(W\) of size \(k \times j\) for the "regular" weights, and a vector \(b\) of size \(k \times 1\) for the bias weights.

The feedforward equations to compute the activation for layer \(l+1\) from layer \(l\) are now:

\(z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}\)

\(a^{(l+1)} = g(z^{(l+1)})\)
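As a minimal sketch of the two feedforward equations above, the following NumPy snippet propagates one input through a small network. The sigmoid activation for \(g\), the function names `sigmoid` and `feedforward`, and the 3-2-1 layer sizes are assumptions for illustration, not the interface of `nnet.py`.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise (assumed choice of g)."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Propagate a column vector a through every layer and return the output."""
    for W, b in zip(weights, biases):
        z = W @ a + b   # z^(l+1) = W^(l) a^(l) + b^(l)
        a = sigmoid(z)  # a^(l+1) = g(z^(l+1))
    return a

# Hypothetical 3-2-1 network with random weights.
sizes = [3, 2, 1]
rng = np.random.default_rng(0)
# W has shape k x j and b has shape k x 1 for consecutive layer sizes j, k.
weights = [rng.standard_normal((k, j)) for j, k in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((k, 1)) for k in sizes[1:]]

x = np.ones((3, 1))
out = feedforward(x, weights, biases)
print(out.shape)  # (1, 1)
```

Keeping every activation as a column vector of shape (size, 1) makes the shapes line up with the \(k \times j\) and \(k \times 1\) sizes given above.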

The backpropagation equations are as follows (where \(\cdot\) means element-wise product):

For the last layer:

\(\delta^{(l)} = (a^{(l)} - y) \cdot g'(z^{(l)})\)

For earlier layers:

\(\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \cdot g'(z^{(l)})\)

The gradient of the bias weights \(b^{(l)}\) is simply \(\delta^{(l+1)}\), and the gradient of \(W^{(l)}\) is \(\delta^{(l+1)}(a^{(l)})^T\) as before.
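The equations above can be sketched for a single training example as follows. This is only an illustration under stated assumptions: \(g\) is taken to be the sigmoid, the cost is taken to be \(\frac{1}{2}\lVert a - y\rVert^2\) (which is consistent with the last-layer formula), and the names `backprop` and `quadratic_cost` are hypothetical rather than those in `nnet.py`. The final lines compare one entry of the analytic gradient against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Return (grad_W, grad_b) for one example; x and y are column vectors."""
    # Forward pass, remembering every z and every activation.
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))

    grad_W = [None] * len(weights)
    grad_b = [None] * len(biases)

    # Last layer: delta = (a - y) * g'(z), element-wise.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_b[-1] = delta
    grad_W[-1] = delta @ activations[-2].T

    # Earlier layers: delta = (W^T delta_next) * g'(z), element-wise.
    for i in range(len(weights) - 2, -1, -1):
        delta = (weights[i + 1].T @ delta) * sigmoid_prime(zs[i])
        grad_b[i] = delta
        grad_W[i] = delta @ activations[i].T
    return grad_W, grad_b

def quadratic_cost(x, y, weights, biases):
    """Assumed cost 0.5 * ||a - y||^2 for one example."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return 0.5 * float(np.sum((a - y) ** 2))

# Hypothetical 4-3-2 network and a single random example.
rng = np.random.default_rng(1)
sizes = [4, 3, 2]
weights = [rng.standard_normal((k, j)) for j, k in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((k, 1)) for k in sizes[1:]]
x = rng.standard_normal((4, 1))
y = np.array([[1.0], [0.0]])

grad_W, grad_b = backprop(x, y, weights, biases)

# Central-difference check on a single weight in the first layer.
eps = 1e-5
w_plus = [W.copy() for W in weights]
w_minus = [W.copy() for W in weights]
w_plus[0][1, 2] += eps
w_minus[0][1, 2] -= eps
numeric = (quadratic_cost(x, y, w_plus, biases)
           - quadratic_cost(x, y, w_minus, biases)) / (2 * eps)
analytic = grad_W[0][1, 2]
print(abs(numeric - analytic) < 1e-6)
```

A finite-difference check like this is slow, so it is best used only on a few weights of a tiny network while debugging.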

Make sure your code passes all tests before moving on. See the comments in the code for additional guidance.

After your implementation passes all 11 tests (note that the last 3 are initially commented out), try to find hyperparameters that give the best performance on the validation set. These include the number of layers, the number of neurons in the hidden layers, the learning rate, the mini batch size, and the number of epochs. You may also want to try decreasing the learning rate in later epochs. Be sure that your network still passes all the unit tests (e.g., only use a decreasing learning rate if `debug` is `False`).

Use the program `run.py` for these experiments. Make sure you don't change the last line in this program, which prints the validation accuracy.
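One possible way to decrease the learning rate in later epochs while leaving the unit tests unaffected is sketched below. The function name `decayed_learning_rate`, the `decay` parameter, and the inverse-time schedule are all assumptions for illustration; only the `debug` flag comes from the assignment, and how it is passed around in `nnet.py` may differ.

```python
def decayed_learning_rate(initial_rate, epoch, decay=0.1, debug=False):
    """Return the learning rate to use in a given 0-based epoch.

    When debug is True the rate stays constant, so the unit tests
    see the same behavior as without any decay.
    """
    if debug:
        return initial_rate
    # Inverse-time decay: the rate shrinks smoothly as epochs go by.
    return initial_rate / (1.0 + decay * epoch)

# Print the schedule for a few epochs (hypothetical starting rate of 3.0).
for epoch in (0, 5, 10):
    print(epoch, decayed_learning_rate(3.0, epoch))
```

The `decay` value is itself a hyperparameter, so it is worth comparing a few settings on the validation set.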

Submit your network `nnet.py` and your run script `run.py` using the **CS 451 HW 3 submission page**. Only **one person per team** should submit. Be sure to have a comment with **both** names at the top of all your files if you work in a team.

As in the last homework, please submit a working version of your code by 10am so we can have an in-class competition again. The final version is due at 5pm; you can simply submit again. For resubmissions by a team, please have the **same** person submit the code the second time.

The version you submit for class by 10am needs to complete its training in no more than 1 minute. The version you submit for the final deadline can train for up to 5 minutes.