{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convolutional Neural Networks and Image Recognition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Goals\n",
"\n",
"* Understand the differences between a standard convolution layer in a convolutional neural network and the traditional 2D image processing convolution we've discussed thus far:\n",
"    * It's technically cross-correlation instead of convolution (why bother flipping the kernel if the weights are learned?)\n",
"    * Each filter has separate weights for each input channel\n",
"    * Each filter sums across channels and produces a single channel of the output feature map\n",
"* Know the meaning of a few of the variations possible with convolution layers:\n",
"    * Stride (where the window slides by more than 1 pixel at a time)\n",
"    * Bias (where a separate scalar parameter is added to the filter's output channel, similar to a bias in a linear layer)"
]
},
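{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goals above can be summarized in one formula. This is a sketch in notation introduced here for illustration (not taken from the lecture): $x_c$ is input channel $c$, $w_{o,c}$ is the $k \\times k$ filter slice connecting input channel $c$ to output channel $o$, $b_o$ is the bias, $s$ is the stride, and padding is ignored. The value of output channel $o$ at position $(i, j)$ is\n",
"\n",
"$$y_o[i, j] = b_o + \\sum_{c=0}^{C_{in}-1} \\sum_{u=0}^{k-1} \\sum_{v=0}^{k-1} w_{o,c}[u, v] \\, x_c[s i + u, \\; s j + v]$$\n",
"\n",
"The sum over $c$ is what collapses all input channels into a single output channel, and a layer with $C_{out}$ filters simply repeats this for $o = 0, \\dots, C_{out} - 1$. A true convolution would read the input at flipped offsets, $x_c[s i - u, \\; s j - v]$; since the weights are learned, the network can learn the flipped kernel just as easily, so the flip is skipped."
]
},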
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Layers and Convolutional Networks: A Quick Primer"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CJ1zvCUxhz_Y"
},
"source": [
"\n",
"Convolutional Neural Networks (CNNs) are neural networks in which some (often many) of the layers perform **convolutions** instead of the matrix multiplication used in a linear (aka fully-connected) layer. I'll go through a basic introduction to convolution layers and convolutional neural networks.\n",
"\n",
"If you want a deeper and more thorough presentation, you can check out the following videos:\n",
"\n",
"* C4W1L06 - Convolutions Over Volumes https://www.youtube.com/watch?v=KTB_OFoAQcc\n",
"* C4W1L07 - One Layer of a Convolutional Net https://www.youtube.com/watch?v=jPOAS7uCODQ\n",
"* C4W1L08 - Simple Convolutional Network Example https://www.youtube.com/watch?v=3PyJA9AfwSk, but don't get bogged down in the notation details.\n",
"* If you want to see a friendly introduction to Max Pooling layers, also check out this one: C4W1L09 Pooling Layers https://www.youtube.com/watch?v=8oOgPUO-TBY\n",
"\n",
"Some non-video animation resources:\n",
"\n",
"* Here are some fun animations: \n",
"* and here are some older-school animations; these reduce things to 2D but might be easier to grok in some cases, and they include \"deconvolution\": "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Todo:\n",
"* Learn you a filter!\n",
"* Cross-correlation instead of convolution\n",
"* Multi-channel convolutions: image processing vs CNN layer\n",
"* Summing across channels\n",
"* Multiple filters\n",
"\n",
"Fancier:\n",
"* Strided convolution\n",
"* Convolution with bias\n",
"\n",
"CNN architecture considerations:\n",
"* Spatial vs channel dimension\n",
"* Downsampling mechanics (max pool, strided, bilinear)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CJ1zvCUxhz_Y"
},
"source": [
"## CNN Architecture By Example: LeNet on MNIST\n",
"Now, we'll use this notebook to explore the use of CNNs on a \"small\" dataset of handwritten digits. \n",
"\n",
"By the end of this activity, you should:\n",
"* Know some of the typical architectural features of CNNs"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CJ1zvCUxhz_Y"
},
"source": [
"An early success case for CNNs was a network called LeNet (after Yann **Le**Cun), which performed well on a dataset of handwritten digits called MNIST.\n",
"\n",
"In this notebook you will experiment with different architectures on the MNIST dataset to get a feel for how CNNs work. With modern techniques and compute, this dataset is considered a toy dataset, but it's still an interesting testing ground for architecture ideas. In particular, we're going to look at it through the lens of how many **parameters** we need to learn to do well on the dataset.\n",
"\n",
"### You'll Need a GPU\n",
"\n",
"To train the models in this notebook, you'll want to be on a machine with an NVIDIA GPU. Please see the [Project 4 instructions](https://www.cs.middlebury.edu/~swehrwein/cs1053_26w/p4/#overview) for details on how to do this."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "XHLKy2Ekqoq4"
},
"source": [
"#### Useful functions\n",
"\n",
"Run the cell below to define functions that will be used later."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "XPWrOExkqysJ"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The autoreload extension is already loaded. To reload it, use:\n",
" %reload_ext autoreload\n",
"cpu\n"
]
}
],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import torch.optim as optim\n",
"import torchvision\n",
"\n",
"import urllib\n",
"import cv2\n",
"import numpy as np\n",
"import os, sys, math, random, subprocess\n",
"import matplotlib.pyplot as plt\n",
"from scipy.ndimage import gaussian_filter\n",
"from IPython.display import clear_output, Image, display, HTML\n",
"from io import StringIO\n",
"import PIL.Image\n",
"\n",
"# Make the ../src directory importable so that `import ML` works\n",
"src_path = os.path.abspath(\"../src\")\n",
"if src_path not in sys.path:\n",
"    sys.path.insert(0, src_path)\n",
"\n",
"import ML\n",
"\n",
"def get_n_params(module):\n",
"    \"\"\"Count the parameters in a module, including those of its submodules.\"\"\"\n",
"    return sum(param.numel() for param in module.parameters())\n",
"\n",
"def get_model_params(model):\n",
"    \"\"\"Count the parameters in a whole model.\"\"\"\n",
"    # parameters() already recurses into submodules, so also looping over\n",
"    # named_modules() would count the same weights more than once.\n",
"    return get_n_params(model)\n",
"\n",
"def to_numpy_image(tensor_or_variable):\n",
"    \"\"\"Convert a CxHxW or BxCxHxW tensor to a BxHxWxC numpy array.\"\"\"\n",
"    # If this is already a numpy image, just return it\n",
"    if isinstance(tensor_or_variable, np.ndarray):\n",
"        return tensor_or_variable\n",
"\n",
"    # Detach from the autograd graph, move to CPU if necessary, and convert to numpy\n",
"    np_img = tensor_or_variable.detach().cpu().numpy()\n",
"\n",
"    # If there is no batch dimension, add one\n",
"    if np_img.ndim == 3:\n",
"        np_img = np_img[np.newaxis, ...]\n",
"\n",
"    # Convert from BxCxHxW (PyTorch convention) to BxHxWxC (OpenCV/numpy convention)\n",
"    np_img = np_img.transpose(0, 2, 3, 1)\n",
"\n",
"    return np_img\n",
"\n",
"def normalize_zero_one_range(tensor_like):\n",
"    \"\"\"Rescale values linearly so that they span [0, 1].\"\"\"\n",
"    x = tensor_like - tensor_like.min()\n",
"    x = x / (x.max() + 1e-9)\n",
"    return x\n",
"\n",
"def prep_for_showing(image):\n",
"    \"\"\"Take the first image of a batch and normalize it for display.\"\"\"\n",
"    np_img = to_numpy_image(image)\n",
"    if np_img.ndim > 3:\n",
"        np_img = np_img[0]\n",
"    np_img = normalize_zero_one_range(np_img)\n",
"    return np_img\n",
"\n",
"def show_image(tensor_var_or_np, title=None, bordercolor=None):\n",
"    np_img = prep_for_showing(tensor_var_or_np)\n",
"\n",
"    if bordercolor is not None:\n",
"        # NOTE: draw_border is not defined in this notebook; it must be\n",
"        # defined elsewhere before bordercolor is used.\n",
"        np_img = draw_border(np_img, bordercolor)\n",
"\n",
"    # plot it\n",
"    np_img = np_img.squeeze()\n",
"    plt.figure(figsize=(4, 4))\n",
"    plt.imshow(np_img)\n",
"    plt.axis('off')\n",
"    if title:\n",
"        plt.title(title)\n",
"    plt.show()\n",
"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CJiMTcx9qr5Z"
},
"source": [
"## Training Data\n",
"\n",
"We will use the [MNIST handwritten digit dataset](https://www.kaggle.com/datasets/hojjatk/mnist-dataset) to train our neural network models. There is a simple wrapper for the MNIST dataset in the torchvision package that implements the Dataset class. We will use that in conjunction with the DataLoader to load training data. Run the cell below to download and initialize our training and test datasets. You should see an example batch of images and their labels shown."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "hURbcBfwqUrY"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████| 9.91M/9.91M [00:00<00:00, 32.5MB/s]\n",
"100%|██████████████████████████████████████████████████████████| 28.9k/28.9k [00:00<00:00, 1.36MB/s]\n",
"100%|██████████████████████████████████████████████████████████| 1.65M/1.65M [00:00<00:00, 12.1MB/s]\n",
"100%|███████████████████████████████████████████████████████████| 4.54k/4.54k [00:00<00:00, 823kB/s]\n"
]
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"GroundTruth: 7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1\n"
]
}
],
"source": [
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"\n",
"BATCH_SIZE = 32\n",
"\n",
"# Convert images to tensors, then map pixel values from [0, 1] to [-1, 1]\n",
"transform = transforms.Compose(\n",
"    [transforms.ToTensor(),\n",
"     transforms.Normalize((0.5,), (0.5,))])\n",
"\n",
"trainset = torchvision.datasets.MNIST(root='../data', train=True,\n",
"                                      download=True, transform=transform)\n",
"trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,\n",
"                                          shuffle=True, num_workers=2)\n",
"\n",
"testset = torchvision.datasets.MNIST(root='../data', train=False,\n",
"                                     download=True, transform=transform)\n",
"testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,\n",
"                                         shuffle=False, num_workers=2)\n",
"\n",
"# Grab the first batch of test images; the test loader is not shuffled,\n",
"# so the same images are shown on every run\n",
"dataiter = iter(testloader)\n",
"images, labels = next(dataiter)\n",
"\n",
"# Show the images and print their labels\n",
"show_image(torchvision.utils.make_grid(images))\n",
"print('GroundTruth: ', ' '.join(f'{labels[j].item():5d}' for j in range(len(labels))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each image has 1 channel (note the channel dimension is before the spatial dimensions), is 28x28 pixels, and our `images` tensor here has a batch of 32 of them:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([32, 1, 28, 28])\n"
]
}
],
"source": [
"print(images.shape)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "BfN6oCjwrqR1"
},
"source": [
"## Example CNN Architecture\n",
"\n",
"Here's a diagram of the LeNet architecture:\n",
"\n",
"\n",
"Below is an example model provided to you. This architecture is similar, though not identical, to the original LeNet architecture depicted above. It performs well, reaching >98% accuracy on the test set, although better accuracy is quite possible.\n",
"\n",
"A few important things to notice about this architecture that are typical of CNN architectures:\n",
"* The network begins by alternating between:\n",
"    * conv layers, which keep the spatial dimensions mostly the same, except for the few pixels lost to \"valid\" output size\n",
"    * a layer that reduces the spatial resolution\n",
"* As the spatial dimensions get smaller, the channel dimension (number of filters, or feature map depth) gets larger.\n",
"* At some point, we start ignoring the spatial dimensions (conceptually \"unrolling\" the (h x w x c) feature map into a 1D vector of length `h*w*c`), then apply some linear (fully-connected) layers.\n",
"* In this case, the layer that halves the spatial resolution is a 2x2 [max pooling](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) layer with stride 2, meaning it takes the max value in every (non-overlapping) 2x2 block of pixels."
]
},
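{
"cell_type": "markdown",
"metadata": {},
"source": [
"The spatial sizes in the example model can be checked by hand. As a sketch, using symbols introduced here for illustration ($W$ is the input width, $k$ the kernel size, $p$ the padding, and $s$ the stride), both convolution and pooling layers produce an output of width\n",
"\n",
"$$W_{out} = \\left\\lfloor \\frac{W + 2p - k}{s} \\right\\rfloor + 1$$\n",
"\n",
"For an unpadded 5x5 convolution ($p = 0$, $s = 1$) this is $W - 4$, and for 2x2 max pooling with stride 2 it is $\\lfloor W / 2 \\rfloor$. Starting from a 28x28 MNIST image:\n",
"\n",
"$$28 \\xrightarrow{\\text{conv } 5 \\times 5} 24 \\xrightarrow{\\text{pool}} 12 \\xrightarrow{\\text{conv } 5 \\times 5} 8 \\xrightarrow{\\text{pool}} 4$$\n",
"\n",
"With 16 channels after the second convolution, the flattened feature vector therefore has $16 \\cdot 4 \\cdot 4 = 256$ entries per image."
]
},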
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "jjvxIDrXrujd"
},
"outputs": [],
"source": [
"class ExampleModel(nn.Module):\n",
"    def __init__(self):\n",
"        super(ExampleModel, self).__init__()\n",
"        # Convolution. Input channels: 1, output channels: 6, kernel size: 5\n",
"        self.conv1 = nn.Conv2d(1, 6, 5)\n",
"        # Max-pooling layer that will halve the HxW resolution\n",
"        self.pool = nn.MaxPool2d(2, 2)\n",
"        # Another 5x5 convolution that brings channel count up to 16\n",
"        self.conv2 = nn.Conv2d(6, 16, 5)\n",
"\n",
"        # Three fully connected layers\n",
"        self.fc1 = nn.Linear(16 * 4 * 4, 60)\n",
"        self.fc2 = nn.Linear(60, 40)\n",
"        self.fc3 = nn.Linear(40, 10)\n",
"\n",
"    def forward(self, x):\n",
"        # Apply convolution, activation and pooling\n",
"        # Output width after an unpadded convolution = input_width - (kernel_size - 1)\n",
"        # Output width after pooling = input_width / 2\n",
"\n",
"        # x.size() = Bx1x28x28\n",
"        x = self.pool(F.relu(self.conv1(x)))\n",
"        # x.size() = Bx6x12x12\n",
"        x = self.pool(F.relu(self.conv2(x)))\n",
"        # x.size() = Bx16x4x4\n",
"\n",
"        # Flatten the output\n",
"        x = x.view(-1, 16 * 4 * 4)\n",
"        x = F.relu(self.fc1(x))\n",
"        x = F.relu(self.fc2(x))\n",
"        x = self.fc3(x)\n",
"        return x\n",
"\n",
"\n",
"example_cnn = ExampleModel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parameter Counting\n",
"\n",
"In this exploration, we're going to pay particular attention to the number of parameters in a model - that is, how many weights do we need to store and learn when training the network?\n",
"\n",
"Here's the count for the `ExampleModel` above:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "jjvxIDrXrujd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model number of parameters: 20842\n"
]
}
],
"source": [
"print(f\"Model number of parameters: {get_n_params(example_cnn)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's the parameter count for our tiny 3-layer MLP from last class:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model number of parameters: 118282\n"
]
}
],
"source": [
"mlp = ML.MLP(28*28, 10) # 28x28 pixel input, 10-class output\n",
"print(f\"Model number of parameters: {get_n_params(mlp)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the 3-layer MLP has far more parameters than the CNN, even though the CNN has more layers! This is because conv layers learn a small number of weights that are applied in sliding-window fashion across the entire input, regardless of its spatial dimensions. In contrast, the MLP has \"densely connected\" weights, where everything in one layer depends on everything in the prior layer.\n",
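"\n",
"To make the counting concrete, here is a sketch using symbols introduced for illustration ($C_{in}$ and $C_{out}$ are input and output channels, $k$ is the kernel size, $N_{in}$ and $N_{out}$ are input and output features). With one bias per output, a conv layer and a linear layer have\n",
"\n",
"$$P_{conv} = C_{out} \\, (C_{in} k^2 + 1), \\qquad P_{linear} = N_{out} \\, (N_{in} + 1)$$\n",
"\n",
"parameters. Adding up the five layers of `ExampleModel` reproduces the count printed above:\n",
"\n",
"$$\\underbrace{6 (1 \\cdot 25 + 1)}_{156} + \\underbrace{16 (6 \\cdot 25 + 1)}_{2416} + \\underbrace{60 (256 + 1)}_{15420} + \\underbrace{40 (60 + 1)}_{2440} + \\underbrace{10 (40 + 1)}_{410} = 20842$$\n",
"\n",
"The two conv layers account for only 2,572 of these. By comparison, a single linear layer connecting all 784 input pixels to 60 hidden units would need $60 (784 + 1) = 47100$ parameters on its own.\n",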
"\n",
"Let's look at the relative performance of these two models on MNIST:"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "7NBY2H8yrgRE"
},
"source": [
"## Training Loop\n",
"The following function trains a model."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "EVzqqVy9rfLk"
},
"outputs": [],
"source": [
"PRINT_EVERY = 100\n",
"\n",
"def train_model(net):\n",
"\n",
"    criterion = nn.CrossEntropyLoss()\n",
"    optimizer = optim.Adam(net.parameters(), lr=0.001)\n",
"\n",
"    net.to(device)\n",
"\n",
"    net.train()  # set the network in \"training mode\"\n",
"\n",
"    for epoch in range(10):  # loop over the dataset multiple times\n",
"\n",
"        running_loss = 0.0\n",
"        for i, data in enumerate(trainloader, 0):\n",
"            # get the inputs\n",
"            inputs, labels = data\n",
"\n",
"            inputs = inputs.to(device)\n",
"            labels = labels.to(device)\n",
"\n",
"            # zero the parameter gradients\n",
"            optimizer.zero_grad()\n",
"\n",
"            # forward + backward + optimize\n",
"            outputs = net(inputs)\n",
"            loss = criterion(outputs, labels)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"            # print statistics; .item() extracts a Python float so the running\n",
"            # total does not hold on to the autograd graph\n",
"            running_loss += loss.item()\n",
"            if i % PRINT_EVERY == PRINT_EVERY - 1:  # print every PRINT_EVERY mini-batches\n",
"                #show_image(torchvision.utils.make_grid(inputs.data))\n",
"                print(f\"[{epoch + 1}, {i+1:5d}] loss: {running_loss/PRINT_EVERY:.3f}\", end=\"\\r\", flush=True)\n",
"                running_loss = 0.0\n",
"\n",
"    print('Finished Training')\n",
"    return net"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "W0SLImjx1QE1"
},
"source": [
"## Testing\n",
"\n",
"The function below evaluates a trained model on the test set. If we were doing this for real, we would run a model on the test set only once, just before publishing our results. In this assignment, we're re-using the test set, treating it more like a validation set."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "jo1Zkn5EyNG7"
},
"outputs": [],
"source": [
"def test_model(net):\n",
"    correct = 0\n",
"    total = 0\n",
"\n",
"    net.to(device)\n",
"    net.eval()  # set the network in \"evaluation mode\"\n",
"    with torch.no_grad():\n",
"        for data in testloader:\n",
"            images, labels = data\n",
"\n",
"            images = images.to(device)\n",
"            labels = labels.to(device)\n",
"\n",
"            outputs = net(images)\n",
"            # The predicted class is the index of the largest output score\n",
"            _, predicted = torch.max(outputs, 1)\n",
"            total += labels.size(0)\n",
"            # .item() converts the 0-dim tensor to a Python int\n",
"            correct += (predicted == labels).sum().item()\n",
"\n",
"    acc = 100 * correct / total\n",
"\n",
"    print(f\"# Parameters: {get_n_params(net)}\")\n",
"    print(f'Accuracy of the network on the {total} test images: {acc:.2f}%')\n",
"    print(f'Correct: {correct}/{total}\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TODO\n",
"\n",
"1. Train the MLP, then test to see its performance.\n",
"2. Train the example CNN, then test to see its performance.\n",
"3. Modify the example model to either improve its performance or decrease its parameter count without hurting performance. Can you get over 99%? 99.5%?\n",
"\n",
"A few ideas you can try:\n",
"* Different model structure (e.g. more layers, smaller/bigger kernels)\n",
"* Residual connections [0]\n",
"* Batch [2] / Layer Normalization [3]\n",
"* Densely connected architectures [1]\n",
"\n",
"[0] https://paperswithcode.com/method/residual-connection\n",
" \n",
"[1] Huang, G., Liu, Z., Weinberger, K. Q., & van der Maaten, L. (2017, July). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, No. 2, p. 3).\n",
"\n",
"[2] Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.\n",
"\n",
"[3] Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.\n",
""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished Training 0.065\n"
]
}
],
"source": [
"# Train the MLP\n",
"mlp = ML.MLP(28*28, 10) # 28x28 pixel input, 10-class output\n",
"mlp = train_model(mlp)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Parameters: 118282\n",
"Accuracy of the network on the 10000 test images: 97.12000274658203%\n",
"Correct: 9712/10000\n",
"\n"
]
}
],
"source": [
"test_model(mlp)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished Training 0.024\n"
]
}
],
"source": [
"# NOTE: my_cnn is created in the YourModel cell further down; run that cell first\n",
"cnn = train_model(my_cnn)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Parameters: 44590\n",
"Accuracy of the network on the 10000 test images: 98.61000061035156%\n",
"Correct: 9861/10000\n",
"\n"
]
}
],
"source": [
"test_model(cnn)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "k2CMdk8LKGdW"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model number of parameters: 44590\n"
]
}
],
"source": [
"class YourModel(nn.Module):\n",
"    def __init__(self):\n",
"        super(YourModel, self).__init__()\n",
"        # 3x3 convolution with 'same' padding: 1 -> 16 channels, HxW unchanged\n",
"        self.conv1 = nn.Conv2d(1, 16, 3, padding='same')\n",
"        # Unpadded 3x3 convolution with stride 2, which roughly halves HxW\n",
"        self.conv2 = nn.Conv2d(16, 16, 3, stride=2)\n",
"\n",
"        self.conv3 = nn.Conv2d(16, 16, 3, padding='same')\n",
"        self.conv4 = nn.Conv2d(16, 16, 3, stride=2)\n",
"\n",
"        # Three fully connected layers\n",
"        self.fc1 = nn.Linear(16 * 6 * 6, 60)\n",
"        self.fc2 = nn.Linear(60, 40)\n",
"        self.fc3 = nn.Linear(40, 10)\n",
"\n",
"    def forward(self, x):\n",
"        # Apply convolutions and activations; the strided convolutions\n",
"        # take the place of pooling\n",
"        # Output width after an unpadded, strided convolution =\n",
"        #   floor((input_width - kernel_size) / stride) + 1\n",
"\n",
"        # x.size() = Bx1x28x28\n",
"        # Residual connection! The 1-channel input broadcasts across the 16 output channels\n",
"        x = x + F.relu(self.conv1(x))\n",
"        # x.size() = Bx16x28x28\n",
"        x = F.relu(self.conv2(x))\n",
"        # x.size() = Bx16x13x13\n",
"        x = x + F.relu(self.conv3(x))  # residual connection!\n",
"        # x.size() = Bx16x13x13\n",
"        x = F.relu(self.conv4(x))\n",
"        # x.size() = Bx16x6x6\n",
"\n",
"        # Flatten the output\n",
"        x = x.view(-1, 16 * 6 * 6)\n",
"        x = F.relu(self.fc1(x))\n",
"        x = F.relu(self.fc2(x))\n",
"        x = self.fc3(x)\n",
"        return x\n",
"\n",
"my_cnn = YourModel()\n",
"\n",
"print(f\"Model number of parameters: {get_n_params(my_cnn)}\")"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"==========================================================================================\n",
"Layer (type:depth-idx) Output Shape Param #\n",
"==========================================================================================\n",
"YourModel [8, 10] --\n",
"├─Conv2d: 1-1 [8, 16, 28, 28] 160\n",
"├─Conv2d: 1-2 [8, 16, 13, 13] 2,320\n",
"├─Conv2d: 1-3 [8, 16, 13, 13] 2,320\n",
"├─Conv2d: 1-4 [8, 16, 6, 6] 2,320\n",
"├─Linear: 1-5 [8, 60] 34,620\n",
"├─Linear: 1-6 [8, 40] 2,440\n",
"├─Linear: 1-7 [8, 10] 410\n",
"==========================================================================================\n",
"Total params: 44,590\n",
"Trainable params: 44,590\n",
"Non-trainable params: 0\n",
"Total mult-adds (Units.MEGABYTES): 8.24\n",
"==========================================================================================\n",
"Input size (MB): 0.03\n",
"Forward/backward pass size (MB): 1.19\n",
"Params size (MB): 0.18\n",
"Estimated Total Size (MB): 1.40\n",
"=========================================================================================="
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torchinfo\n",
"torchinfo.summary(my_cnn, input_size=(8, 1, 28, 28))"
]
},
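{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output shapes in the summary above follow from the general output-size rule for a convolution, sketched here with symbols introduced for illustration ($W$ is the input width, $k$ the kernel size, $p$ the padding, and $s$ the stride):\n",
"\n",
"$$W_{out} = \\left\\lfloor \\frac{W + 2p - k}{s} \\right\\rfloor + 1$$\n",
"\n",
"The `padding='same'` layers keep the width unchanged, while each unpadded 3x3 convolution with stride 2 gives\n",
"\n",
"$$\\left\\lfloor \\frac{28 - 3}{2} \\right\\rfloor + 1 = 13, \\qquad \\left\\lfloor \\frac{13 - 3}{2} \\right\\rfloor + 1 = 6$$\n",
"\n",
"which is why the first linear layer takes $16 \\cdot 6 \\cdot 6 = 576$ inputs. Here the strided convolutions play the downsampling role that max pooling plays in `ExampleModel`."
]
},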
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "KmLl5haX4Om_"
},
"source": [
"### Per-class accuracy\n",
"\n",
"Run the below cell to see which digits your model is better at recognizing and which digits it gets confused by."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "qp8POK0dyOKn"
},
"outputs": [],
"source": [
"net = my_cnn.to(device)\n",
"net.eval()  # set the network in \"evaluation mode\"\n",
"\n",
"class_correct = [0 for _ in range(10)]\n",
"class_total = [0 for _ in range(10)]\n",
"\n",
"with torch.no_grad():\n",
"    for data in testloader:\n",
"        images, labels = data\n",
"        images = images.to(device)\n",
"        labels = labels.to(device)\n",
"\n",
"        outputs = net(images)\n",
"\n",
"        _, predicted = torch.max(outputs, 1)\n",
"        c = (predicted == labels)\n",
"        # Tally every image in the batch (the last batch may be smaller)\n",
"        for i in range(labels.size(0)):\n",
"            label = labels[i].item()\n",
"            class_correct[label] += c[i].item()\n",
"            class_total[label] += 1\n",
"\n",
"\n",
"for i in range(10):\n",
"    print('Accuracy of %5s : %2d %%' % (i, 100 * class_correct[i] / class_total[i]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"default_view": {},
"name": "CS5670_Project5_MNISTChallenge.ipynb",
"provenance": [],
"version": "0.3.2",
"views": {}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}