Unit 2: LLM architectures
In this unit, you will explore the Transformer architecture, which forms the foundation of today’s large language models. You will also learn about the two main types of language models built on this architecture: decoder-based models (such as GPT) and encoder-based models (such as BERT).
Lectures
The lectures begin by discussing the limitations of the architecture that preceded Transformers: recurrent neural networks. Next, you will learn about attention, the key technical idea behind Transformers, followed by an overview of the Transformer architecture itself. Finally, the lectures explain how this architecture is used in GPT and BERT.
| Section | Title | Video | Slides |
|---|---|---|---|
| 2.1 | Attention | video | slides |
| 2.2 | Introduction to Transformers | video | slides |
| 2.3 | Transformers in more detail | video | slides |
| 2.4 | Representing positions in Transformers | video | slides |
| 2.5 | Generating text from a language model | video | slides |
| 2.6 | Transformer representation models | video | slides |
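To preview the attention mechanism covered in section 2.1, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer layer. The function name, random inputs, and toy shapes are illustrative choices, not code from the lectures:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted average of the value vectors

# Toy example: 3 tokens, embedding dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one 4-dimensional output vector per query token
```

Each row of the output mixes information from all tokens, weighted by how similar that token's query is to every key; this all-pairs mixing is what replaces the sequential recurrence of RNNs.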