{ "cells": [ { "cell_type": "markdown", "id": "c3109b7e-0092-47a3-aca2-288e745731ba", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Lecture 17 - Network Architectures; Other Vision Tasks" ] }, { "cell_type": "markdown", "id": "c0357b50-2962-423e-85f1-80217ec3d134", "metadata": {}, "source": [ "#### Announcements\n", "\n", "* P16 and P17 solutions posted; please complete a Week 4 reflection by tomorrow morning\n", "* Course response forms: please complete these either today or tomorrow.\n", "* Tomorrow:\n", " * 9:30-10:40 Week 4 check-ins; as usual, but I may have a few more questions reflecting back on the course as a whole\n", " * While you wait: complete the course response form, work on Project 4, eat cookies\n", " * 10:40-11:00: walk down to Stone Leaf Teahouse in the Marble Works\n", " * 11:00-12:10: drink good tea, ask me anything, and celebrate a successful winter term!" ] }, { "cell_type": "markdown", "id": "38efd636-c09e-4d1f-a828-33e0fcc75686", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "#### Goals\n", "* Have fun and learn some things.\n", "* Ask at least one question."
] }, { "cell_type": "markdown", "id": "7d50fe24-44a6-49b3-af23-8b98ab99a701", "metadata": {}, "source": [ "#### L17A - Convnet History / Architectures for Recognition (and Other Tasks)\n", "\n", "Modern convnets: AlexNet -> VGG -> Inception -> ResNet -> ...\n", "\n", "Okay, but what about the data cost?\n", "\n", "* Transfer learning - fine-tuning\n", "\n", "Other applications: \n", "\n", "* object detection (get boxy)\n", "* {semantic, instance, panoptic} segmentation\n", "* Literally every other corner of computer vision\n", "\n", "#### L17B - Transformer Architecture; Generative Models; Language/Image\n", "\n", "The new architecture on the block: **transformers** (slides)\n", "\n", "Representation learning:\n", "\n", "* notion of latent space, data manifolds, \"manifold learning\"\n", "* the last layer of a pretrained network is a good latent space\n", " \n", "Representation learning without labels:\n", "\n", "* un/self-supervised learning\n", "* DINO / momentum contrast (MoCo)\n", "* autoencoders\n", "* masked modeling\n", "\n", "Generative modeling:\n", "\n", "* Generative Adversarial Networks\n", " ![A diagram of the GAN architecture](https://developers.google.com/static/machine-learning/gan/images/gan_diagram.svg?dcb_=0.9763640332529248)\n", " (image source: Google for Developers)\n", "* Diffusion models \n", "* Flow models \n", "* Latent diffusion/flow models (e.g. Stable Diffusion) (Fig. 3)\n", "\n", "Language and vision:\n", "\n", "* CLIP, DALL-E*" ] }, { "cell_type": "code", "execution_count": null, "id": "f3e752cf-2552-4b26-90ee-4eaaa6265981", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 5 }