Unit 2: Transformer-based models

Published: January 27, 2025

In this unit, you will learn about the Transformer, the neural architecture at the core of modern large language models. This architecture evolved in response to shortcomings of recurrent neural networks. To explain these shortcomings and how the Transformer addresses them, we start the unit by looking at the classical task of machine translation.

Lectures

Deadline for the quizzes: 2025-03-05

Section  Title                                  Video  Slides  Quiz
2.1      Introduction to machine translation    video  slides  quiz
2.2      Neural machine translation             video  slides  quiz
2.3      Attention                              video  slides  quiz
2.4      The Transformer architecture           video  slides  quiz
2.5      Decoder-based language models (GPT)    video  slides  quiz
2.6      Encoder-based language models (BERT)   video  slides  quiz

Lab

Deadline for the lab: 2025-03-26

In lab 2, you will dive into the inner workings of the GPT architecture. You will walk through a complete implementation of the architecture in PyTorch, instantiate this implementation with pre-trained weights, and put the resulting model to the test by generating text. At the end of this lab, you will understand the building blocks of the GPT architecture and how they are connected.
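To give a flavour of the kind of building block the lab walks through, here is a minimal sketch (not the lab code) of a single GPT-style Transformer block in PyTorch: masked self-attention followed by a feed-forward network, each with a residual connection and layer normalization. The hyperparameters d_model=768 and n_heads=12 match GPT-2 small; everything else, including the class name, is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class GPTBlock(nn.Module):
    """One GPT-style decoder block (illustrative sketch, not the lab's implementation):
    pre-norm masked self-attention and an MLP, each wrapped in a residual connection."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True above the diagonal blocks attention to future positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out          # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return x

# Example: pass a batch of one sequence of 10 token embeddings through the block.
block = GPTBlock()
x = torch.randn(1, 10, 768)
print(block(x).shape)  # torch.Size([1, 10, 768])
```

A full GPT model stacks many such blocks on top of token and position embeddings and adds a final projection onto the vocabulary; the lab covers the complete architecture and loads pre-trained weights into it.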

Link to the lab