Lecture 17 - Network Architectures; Other Vision Tasks¶

Announcements¶

  • P16 and P17 solutions posted; please complete a Week 4 reflection by tomorrow morning
  • Course response forms: please complete yours either today or tomorrow.
  • Tomorrow:
    • 9:30-10:40 Week 4 check-ins; as usual, but I may have a few more questions reflecting back on the course as a whole
      • While you wait: complete course response form, work on Project 4, eat cookies
    • 10:40-11:00: walk down to Stone Leaf Teahouse in the Marble Works
    • 11:00-12:10: drink good tea, ask me anything, and celebrate a successful winter term!

Goals¶

  • Have fun and learn some things.
  • Ask at least one question.

L17A - Convnet History / Architectures for Recognition (and Other Tasks)¶

Modern convnets: AlexNet -> VGG -> Inception -> ResNet -> ...

Okay, but what about the data cost?

  • Transfer learning - finetune a network pretrained on a large dataset instead of training from scratch
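A minimal sketch of the cheapest form of transfer learning: keep a pretrained backbone frozen and train only a new linear head on its features. The features and labels below are synthetic stand-ins (in practice you'd use, e.g., penultimate-layer activations of an ImageNet-pretrained network).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for features from a FROZEN pretrained backbone.
n, d = 200, 64
feats = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
labels = (feats @ true_w > 0).astype(float)  # synthetic binary labels

def loss(w, b):
    """Binary cross-entropy of the linear head on the frozen features."""
    p = 1 / (1 + np.exp(-(feats @ w + b)))
    return -np.mean(labels * np.log(p + 1e-9) + (1 - labels) * np.log(1 - p + 1e-9))

# "Finetuning" here = logistic regression on top of the frozen features.
w, b, lr = np.zeros(d), 0.0, 0.1
loss_before = loss(w, b)
for _ in range(200):
    p = 1 / (1 + np.exp(-(feats @ w + b)))
    grad_z = (p - labels) / n        # gradient of mean BCE w.r.t. logits
    w -= lr * (feats.T @ grad_z)
    b -= lr * grad_z.sum()
loss_after = loss(w, b)
```

Finetuning the whole backbone (unfreezing some or all layers with a small learning rate) is the same idea with more trainable parameters.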

Other applications:

  • object detection (get boxy)
  • {semantic,instance,panoptic} segmentation
  • Literally every other corner of computer vision

L17B - Transformer Architecture; Generative Models; Language/Image¶

The new architecture on the block: transformers (slides)
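The core operation in a transformer is scaled dot-product self-attention; the rest of the architecture (multiple heads, MLPs, residual connections, normalization) is built around it. A minimal single-head sketch in numpy, with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: softmax(QK^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq, seq) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # softmax: each row sums to 1
    return A @ V, A                               # weighted mix of values, weights

seq, d_model, d_k = 5, 16, 8                      # toy sizes, chosen arbitrarily
X = rng.normal(size=(seq, d_model))               # one token embedding per row
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

For vision, a ViT-style model applies exactly this to a sequence of image patches rather than word tokens.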

Representation learning

  • notion of latent space, data manifolds, "manifold learning"
  • last layer of a pretrained network is a good latent space

Representation learning without labels:

  • un/self-supervised learning
  • DINO/momentum contrast
  • autoencoders
  • masked modeling
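The autoencoder idea can be sketched without any labels at all: compress the input to a low-dimensional latent code and train to reconstruct it. Below is a toy linear autoencoder on synthetic data lying near a low-dimensional subspace (a stand-in for a data manifold); all sizes and learning-rate choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data near a k-dimensional subspace of R^d (a stand-in for a data manifold).
n, d, k = 300, 20, 3
Z = rng.normal(size=(n, k))
X = Z @ rng.normal(size=(k, d)) + 0.05 * rng.normal(size=(n, d))

# Linear autoencoder: encode d -> k, decode k -> d, trained to reconstruct X.
We = 0.1 * rng.normal(size=(d, k))   # encoder weights
Wd = 0.1 * rng.normal(size=(k, d))   # decoder weights

def recon_loss(We, Wd):
    return np.mean((X @ We @ Wd - X) ** 2)

lr = 0.001
loss_before = recon_loss(We, Wd)
for _ in range(1000):
    H = X @ We                 # latent codes
    R = H @ Wd - X             # reconstruction residual
    grad_Wd = 2 * H.T @ R / n
    grad_We = 2 * X.T @ (R @ Wd.T) / n
    We -= lr * grad_We
    Wd -= lr * grad_Wd
loss_after = recon_loss(We, Wd)
```

No labels were used: the reconstruction objective itself supplies the training signal, which is the common thread through autoencoders, masked modeling, and contrastive methods.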

Generative modeling:

  • Generative Adversarial Networks A diagram of the GAN architecture (image source: https://developers.google.com/machine-learning/gan/gan_structure)
  • Diffusion models https://arxiv.org/pdf/2011.13456
  • Flow models https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html
  • Latent diffusion/flow models (e.g. Stable Diffusion) https://arxiv.org/pdf/2112.10752 (Fig. 3)
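The forward (noising) half of a DDPM-style diffusion model has a simple closed form worth seeing in code: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), so any noise level can be sampled in one step. A sketch with the commonly used linear beta schedule; the "data" here is a toy point mass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule; abar_t is the cumulative product of (1 - beta_s).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Sample x_t directly from x_0 via the closed-form marginal q(x_t | x_0)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Toy "dataset": every point at 2.0. Early steps stay near the data;
# by the final step the marginal is close to a standard normal.
x0 = np.full(10_000, 2.0)
x_early = q_sample(x0, 10, rng)
x_late = q_sample(x0, T - 1, rng)
```

The generative model is the learned reverse of this process: a network trained to predict the added noise, run backwards from pure Gaussian noise to a sample.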

Language and vision:

  • CLIP, DALL-E*
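CLIP's training objective is a symmetric contrastive loss over a batch: matched image/text embedding pairs sit on the diagonal of a cosine-similarity matrix, and each row/column is treated as a classification problem. A small numpy sketch with made-up embeddings (the temperature value and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss, as in CLIP:
    the correct match for image i is text i (the diagonal)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = normalize(img_emb) @ normalize(txt_emb).T / temperature  # (n, n)

    def xent_diag(logits):
        # cross-entropy where the correct "class" for row i is column i
        logits = logits - logits.max(axis=1, keepdims=True)
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent_diag(sim) + xent_diag(sim.T))  # image->text and text->image

n, d = 8, 32
img = rng.normal(size=(n, d))
txt_aligned = img + 0.01 * rng.normal(size=(n, d))  # nearly matched pairs
txt_random = rng.normal(size=(n, d))                # unrelated pairs
```

Well-aligned pairs give a much lower loss than random ones, which is exactly the pressure that pulls the image and text encoders into a shared embedding space.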