Unit 2: Transformer-based models
In this unit, you will explore the Transformer architecture, which forms the foundation of today’s large language models. You will also learn about the two main types of language models built on this architecture: decoder-based models (such as GPT) and encoder-based models (such as BERT).
Lectures
The lectures begin by discussing the limitations of the architecture that preceded Transformers: recurrent neural networks. Next, you will learn about the key technical idea behind Transformers, attention (a minimal code sketch appears at the end of this section), before an overview of the Transformer architecture itself. Finally, the lectures explain how this architecture is used in GPT and BERT.
Section | Title | Video | Slides | Quiz |
---|---|---|---|---|
2.1 | Introduction to machine translation | video | slides | quiz |
2.2 | Neural machine translation | video | slides | quiz |
2.3 | Attention | video | slides | quiz |
2.4 | The Transformer architecture | video | slides | quiz |
2.5 | Decoder-based language models (GPT) | video | slides | quiz |
2.6 | Encoder-based language models (BERT) | video | slides | quiz |
To earn a wildcard for this unit, you must complete the quizzes no later than 2025-10-07.
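If you want a feel for the key idea before watching the lectures, the sketch below implements scaled dot-product attention, the mechanism covered in sections 2.3 and 2.4, in PyTorch. The function and variable names are illustrative assumptions, not the course's reference implementation.

```python
# Minimal sketch of scaled dot-product attention (sections 2.3-2.4).
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: tensors of shape (batch, seq_len, d_model)."""
    d_k = q.size(-1)
    # Similarity between every query and every key, scaled by sqrt(d_k)
    # so the softmax does not saturate for large model dimensions.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Normalise the scores into attention weights over the keys.
    weights = torch.softmax(scores, dim=-1)
    # Each output position is a weighted average of the values.
    return weights @ v, weights

# Toy example: batch of 1, sequence of 5 tokens, model dimension 8.
x = torch.randn(1, 5, 8)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)  # torch.Size([1, 5, 8]) torch.Size([1, 5, 5])
```

The full architecture adds learned projections of the queries, keys, and values, multiple attention heads, and (in the decoder) a causal mask.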
Online meeting
During the online meeting, we will explore how researchers try to understand the inner workings of large language models. We will focus on how attention mechanisms have been used to analyse what information these models capture, as well as the criticisms and limitations of this approach.
The meeting will take place on 2025-10-08 between 18:00 and 20:00. A Zoom link will be sent out via the course mailing list.
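To give a concrete taste of this kind of analysis, the sketch below extracts attention weights from a pre-trained model and prints, for one head in the first layer, which token each position attends to most strongly. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; the work discussed in the meeting may use other models and tools.

```python
# Illustrative sketch: inspect the attention weights of a pre-trained model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
head = outputs.attentions[0][0, 0]  # first layer, first head
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, row in zip(tokens, head):
    # Print the token this position attends to most strongly in this head.
    print(f"{token:>8} -> {tokens[row.argmax().item()]}")
```

Whether such attention maps actually explain a model's predictions is exactly the kind of criticism we will discuss.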
Lab
In lab 2, you will do a deep dive into the inner workings of the GPT architecture. You will walk through a complete implementation of the architecture in PyTorch, instantiate this implementation with pre-trained weights, and put the resulting model to the test by generating text.
If you want a written review of this lab, you must submit it (via Lisam) no later than 2025-10-31.
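As a preview of the lab's end goal, the sketch below loads pre-trained GPT weights and generates text. It assumes the Hugging Face transformers library and the public gpt2 checkpoint purely for illustration; in the lab you will work with the course's own PyTorch implementation instead.

```python
# Illustrative sketch: load pre-trained GPT-2 weights and generate text.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The Transformer architecture"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Greedy decoding: repeatedly append the most probable next token.
    output_ids = model.generate(input_ids, max_new_tokens=40, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```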