Unit 2: LLM architectures

Published January 26, 2026

In this unit, you will explore the Transformer architecture, which forms the foundation of today’s large language models. You will also learn about the two main types of language models built on this architecture: decoder-based models (such as GPT) and encoder-based models (such as BERT).

Lectures

The lectures begin by discussing the limitations of the architecture that came before Transformers: recurrent neural networks. Next, you will learn about the key technical idea behind Transformers, followed by an overview of the Transformer architecture itself. Finally, the lectures explain how this architecture is used in GPT and BERT.
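As a preview of the key technical idea mentioned above (covered in Section 2.1), here is a minimal sketch of scaled dot-product attention in NumPy. The function names, shapes, and random inputs are illustrative assumptions, not taken from the course materials:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # similarity of each query to each key
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 queries of dimension 4
K = rng.normal(size=(3, 4))  # 3 keys of dimension 4
V = rng.normal(size=(3, 4))  # 3 values of dimension 4
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Each output row is a mixture of the value vectors, weighted by how well the corresponding query matches each key; this is the mechanism the Transformer lectures build on.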

Section  Title                                    Video  Slides
2.1      Attention                                video  slides
2.2      Introduction to Transformers             video  slides
2.3      Transformers in more detail              video  slides
2.4      Representing positions in Transformers   video  slides
2.5      Generating text from a language model    video  slides
2.6      Transformer representation models        video  slides

Assignment

Link to the assignment