{ "cells": [ { "cell_type": "markdown", "id": "c3109b7e-0092-47a3-aca2-288e745731ba", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Lecture 17 - Network Architectures; Other Vision Tasks" ] }, { "cell_type": "markdown", "id": "c0357b50-2962-423e-85f1-80217ec3d134", "metadata": {}, "source": [ "#### Announcements\n", "\n", "* P16 and P17 solutions posted; please complete a Week 4 reflection by tomorrow morning\n", "* Course response forms: please complete these either today or tomorrow.\n", "* Tomorrow:\n", " * 9:30-10:40 Week 4 check-ins; as usual, but I may have a few more questions reflecting back on the course as a whole\n", " * While you wait: complete the course response form, work on Project 4, eat cookies\n", " * 10:40-11:00: walk down to Stone Leaf Teahouse in the Marble Works\n", " * 11:00-12:10: drink good tea, ask me anything, and celebrate a successful winter term!" ] }, { "cell_type": "markdown", "id": "38efd636-c09e-4d1f-a828-33e0fcc75686", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "#### Goals\n", "* Have fun and learn some things.\n", "* Ask at least one question."
] }, { "cell_type": "markdown", "id": "7d50fe24-44a6-49b3-af23-8b98ab99a701", "metadata": {}, "source": [ "#### L17A - Convnet History / Architectures for Recognition (and Other Tasks)\n", "\n", "Modern convnets: AlexNet -> VGG -> Inception -> ResNet -> ...\n", "\n", "Okay, but what about the data cost?\n", "\n", "* Transfer learning - fine-tuning\n", "\n", "Other applications: \n", "\n", "* object detection (get boxy)\n", "* {semantic, instance, panoptic} segmentation\n", "* Literally every other corner of computer vision\n", "\n", "#### L17B - Transformer Architecture; Generative Models; Language/Image\n", "\n", "The new architecture on the block: **transformers** (slides)\n", "\n", "Representation learning:\n", "\n", "* notion of latent space, data manifolds, \"manifold learning\"\n", "* the last layer of a pretrained network is a good latent space\n", " \n", "Representation learning without labels:\n", "\n", "* un/self-supervised learning\n", "* DINO / momentum contrast (MoCo)\n", "* autoencoders\n", "* masked modeling\n", "\n", "Generative modeling:\n", "\n", "* Generative Adversarial Networks\n", " ![A diagram of the GAN architecture](https://developers.google.com/static/machine-learning/gan/images/gan_diagram.svg?dcb_=0.9763640332529248)\n", " (image source: Google for Developers)\n", "* Diffusion models \n", "* Flow models \n", "* Latent diffusion/flow models (e.g. Stable Diffusion) (Fig. 3)\n", "\n", "Language and vision:\n", "\n", "* CLIP, DALL-E*" ] }, { "cell_type": "code", "execution_count": null, "id": "f3e752cf-2552-4b26-90ee-4eaaa6265981", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 5 }