Unit 2: LLM architectures

Published: January 26, 2026

In this unit, you will explore the Transformer architecture, which forms the foundation of today’s large language models. You will also learn about the two main types of language models built on this architecture: decoder-based models (such as GPT) and encoder-based models (such as BERT).

Lectures

The lectures begin by discussing the limitations of the architecture that came before Transformers: recurrent neural networks. Next, you will learn about the key technical idea behind Transformers, followed by an overview of the Transformer architecture itself. Finally, the lectures explain how this architecture is used in GPT and BERT.
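The key technical idea referred to above is the attention mechanism, covered in section 2.3. As a preview, here is a minimal sketch of scaled dot-product attention in PyTorch; the function name, tensor names, and toy dimensions are illustrative and are not taken from the lecture materials.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k) -- toy shapes for illustration
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    # Normalize the scores into attention weights over the keys
    weights = F.softmax(scores, dim=-1)
    # Each output position is a weighted average of the values
    return weights @ v

# Toy example: batch of 1, sequence of 4 tokens, dimension 8
q = torch.randn(1, 4, 8)
k = torch.randn(1, 4, 8)
v = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```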

| Section | Title | Video | Slides | Quiz |
|---------|-------|-------|--------|------|
| 2.1 | Introduction to machine translation | video | slides | quiz |
| 2.2 | Neural machine translation | video | slides | quiz |
| 2.3 | Attention | video | slides | quiz |
| 2.4 | The Transformer architecture | video | slides | quiz |
| 2.5 | Decoder-based language models (GPT) | video | slides | quiz |
| 2.6 | Encoder-based language models (BERT) | video | slides | quiz |
Important: Quiz deadline

To earn a wildcard for this unit, you must complete the quizzes before the teaching session on Unit 2.

Additional materials

Lab

In lab 2, you will do a deep dive into the inner workings of the GPT architecture. You will walk through a complete implementation of the architecture in PyTorch, instantiate this implementation with pre-trained weights, and put the resulting model to the test by generating text.

View the lab on GitLab
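As an appetizer for the lab (and not a substitute for its from-scratch implementation), here is a minimal sketch of loading pre-trained GPT-2 weights and generating text; it uses the Hugging Face transformers library, which is an assumption on our part and not part of the lab materials.

```python
# Minimal sketch using Hugging Face transformers (an assumption; the lab
# walks through its own from-scratch PyTorch implementation instead).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # downloads pre-trained weights
model.eval()

prompt = "The Transformer architecture"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: repeatedly pick the most likely next token
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```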

This unit also features an advanced lab. In this lab, you will take the existing from-scratch implementation of the GPT architecture from the basic lab and modify it to implement the BERT architecture.

View the lab on GitLab
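At the heart of that modification is the attention mask: GPT's decoder blocks use a causal mask so that each token can only attend to earlier positions, while BERT's encoder blocks attend bidirectionally over the whole sequence. A minimal sketch of the difference in PyTorch (illustrative only, not the lab's code):

```python
import torch

seq_len = 5

# Decoder-style (GPT): causal mask -- position i may attend only to positions <= i
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Encoder-style (BERT): no causal mask -- every position attends to every position
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

print(causal_mask.int())
print(bidirectional_mask.int())
```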