Lecture 17 - Network Architectures; Other Vision Tasks¶
Announcements¶
- P16 and P17 solutions posted; please complete a Week 4 reflection by tomorrow morning
- Course response forms: please complete these either today or tomorrow.
- Tomorrow:
- 9:30-10:40 Week 4 check-ins; as usual, but I may have a few more questions reflecting back on the course as a whole
- While you wait: complete course response form, work on Project 4, eat cookies
- 10:40-11:00: walk down to Stone Leaf Teahouse in the Marble Works
- 11:00-12:10: drink good tea, ask me anything, and celebrate a successful winter term!
Goals¶
- Have fun and learn some things.
- Ask at least one question.
L17A - Convnet History / Architectures for Recognition (and Other Tasks)¶
Modern convnets: AlexNet -> VGG -> Inception -> ResNet -> ...
Okay, but consider the data cost: training these from scratch requires millions of labeled images.
- Transfer learning - finetuning
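The cheapest form of finetuning is a linear probe: freeze the pretrained backbone and train only a new classifier head on its features. A minimal numpy sketch, with random stand-ins for the backbone features (in practice they would be penultimate-layer activations of, e.g., a torchvision ResNet; all sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for frozen backbone features: in real transfer learning
# these would be penultimate-layer activations of a pretrained convnet.
n, d, k = 200, 64, 3
features = rng.normal(size=(n, d))
labels = rng.integers(0, k, size=n)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(W):
    p = softmax(features @ W)
    return -np.log(p[np.arange(n), labels]).mean()

# "Finetune" only the new head: multinomial logistic regression on features.
W = np.zeros((d, k))
lr = 0.1
losses = [loss(W)]
for _ in range(100):
    p = softmax(features @ W)
    p[np.arange(n), labels] -= 1.0       # dLoss/dlogits = p - onehot
    W -= lr * features.T @ p / n         # gradient step on the head only
    losses.append(loss(W))
```

Full finetuning additionally unfreezes some or all backbone layers with a small learning rate; the head-only version above is the usual first thing to try.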
Other applications:
- object detection (get boxy)
- {semantic,instance,panoptic} segmentation
- Literally every other corner of computer vision
L17B - Transformer Architecture; Generative Models; Language/Image¶
The new architecture on the block: transformers (slides)
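The core operation in a transformer is scaled dot-product self-attention: every token mixes information from every other token, with mixing weights given by a softmax over query-key similarities. A single-head sketch in numpy (sequence length, dimensions, and random weights are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Scaled dot-product self-attention, the core transformer operation.
seq, d = 5, 16                       # sequence length, model dimension
x = rng.normal(size=(seq, d))        # token embeddings

Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)        # (seq, seq) attention logits
w = np.exp(scores - scores.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)    # softmax over keys: rows sum to 1
out = w @ V                          # (seq, d) updated token representations
```

A real transformer runs several such heads in parallel, then adds residual connections, layer norm, and an MLP per token.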
Representation learning
- notion of latent space, data manifolds, "manifold learning"
- last layer of a pretrained network is a good latent space
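One way to see that a pretrained network's last layer is a useful latent space: nearest neighbors under cosine similarity in feature space tend to be semantically similar images. A sketch of the retrieval step, with random vectors standing in for real features:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for penultimate-layer features of a pretrained network,
# one row per image (in practice these come from a real backbone).
feats = rng.normal(size=(10, 128))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)   # unit length

query = 0
sims = feats @ feats[query]          # cosine similarity to the query image
neighbors = np.argsort(-sims)[1:4]   # 3 nearest neighbors, skipping self
```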
Representation learning without labels:
- un/self-supervised learning
- DINO/momentum contrast
- autoencoders
- masked modeling
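The autoencoder idea in its simplest form: compress data through a low-dimensional bottleneck, then reconstruct. For a *linear* encoder/decoder trained to minimize squared error, the optimal solution is the PCA subspace, so we can sketch it in closed form with an SVD (the toy data below is constructed to lie near a 4-dimensional subspace):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data lying near a 4-dim subspace of 20-dim space
n, d, h = 300, 20, 4
X = rng.normal(size=(n, h)) @ rng.normal(size=(h, d)) \
    + 0.01 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# The optimal linear autoencoder is PCA: encode with the top-h
# principal directions, decode with their transpose.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:h].T                 # encode: d -> h latent codes
X_hat = Z @ Vt[:h]               # decode: h -> d reconstruction

err = ((X - X_hat) ** 2).mean()  # small: the data really is ~4-dim
```

Nonlinear autoencoders replace the two matrix multiplies with deep networks and train by gradient descent; masked modeling is the same idea where the "compression" is hiding patches of the input.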
Generative modeling:
- Generative Adversarial Networks
(image source: https://developers.google.com/machine-learning/gan/gan_structure)
- Diffusion models https://arxiv.org/pdf/2011.13456
- Flow models https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html
- Latent diffusion/flow models (e.g. Stable Diffusion) https://arxiv.org/pdf/2112.10752 (Fig. 3)
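The forward half of a diffusion model needs no network at all: it gradually mixes the data with Gaussian noise, x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps. A sketch with the common DDPM-style linear schedule (the specific constants are conventional defaults, not from the linked papers' exact configs):

```python
import numpy as np

rng = np.random.default_rng(3)

# DDPM-style forward (noising) process:
#   x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (common default)
abar = np.cumprod(1.0 - betas)       # cumulative signal fraction, decreasing

x0 = rng.normal(size=(64,))          # stand-in for an image

def noisy(t):
    eps = rng.normal(size=x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Early steps barely perturb x0; by t = T-1 the sample is nearly pure noise.
x_early, x_late = noisy(10), noisy(T - 1)
```

The learned part is the reverse: a network trained to predict the noise eps from x_t, which is then run backward from pure noise to generate samples.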
Language and vision:
- CLIP, DALL-E*
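CLIP's training objective is a symmetric contrastive (InfoNCE) loss over a batch of matched (image, text) pairs: the matched pairs sit on the diagonal of an image-text similarity matrix, and both row-wise and column-wise cross-entropy push them together. A numpy sketch with random stand-ins for the encoder outputs (the temperature 0.07 is CLIP's initial value; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# CLIP-style symmetric contrastive loss on a batch of matched
# (image, text) pairs; embeddings here are random stand-ins for
# the outputs of real image and text encoders.
B, d = 8, 32
img = rng.normal(size=(B, d))
txt = img + 0.1 * rng.normal(size=(B, d))   # matched pairs embed nearby

img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

tau = 0.07                                   # temperature (CLIP's init value)
logits = img @ txt.T / tau                   # (B, B) similarity matrix

def xent(lg):
    # cross-entropy with the correct class on the diagonal
    p = np.exp(lg - lg.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(B), np.arange(B)]).mean()

# Pull matched pairs together in both directions: image->text, text->image.
loss = 0.5 * (xent(logits) + xent(logits.T))
```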