Unit 2: Transformer-based models
Lectures
In this unit, you will learn about the Transformer, the neural architecture at the core of modern large language models. This architecture evolved in response to shortcomings of recurrent neural networks. To explain these shortcomings and how the Transformer addresses them, we start the unit with the classical task of machine translation.
Section | Title | Video | Slides | Quiz |
---|---|---|---|---|
2.1 | Introduction to machine translation | video | slides | quiz |
2.2 | Neural machine translation | video | slides | quiz |
2.3 | Attention | video | slides | quiz |
2.4 | The Transformer architecture | video | slides | quiz |
2.5 | Decoder-based language models (GPT) | video | slides | quiz |
2.6 | Encoder-based language models (BERT) | video | slides | quiz |
Lab
In lab 2, you will dive into the inner workings of the GPT architecture. You will walk through a complete implementation of the architecture in PyTorch, instantiate this implementation with pre-trained weights, and put the resulting model to the test by generating text. At the end of this lab, you will understand the building blocks of the GPT architecture and how they are connected.
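As a rough illustration of the lab's final step (instantiating a model with pre-trained weights and generating text), the sketch below uses the Hugging Face transformers library rather than the lab's own from-scratch code. The checkpoint name `gpt2`, the prompt, and the generation settings are illustrative assumptions, not part of the lab material.

```python
# Minimal sketch: load pre-trained GPT-2 weights and generate text.
# The lab itself builds the architecture from scratch in PyTorch;
# this uses the Hugging Face transformers library for brevity.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # "gpt2" is an assumed checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The Transformer architecture"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=40,               # length of the generated continuation
        do_sample=True,                  # sample instead of greedy decoding
        top_k=50,                        # restrict sampling to the 50 most likely tokens
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```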
Advanced lab
In the advanced lab for this unit, you will take the from-scratch implementation of the GPT architecture from lab 2 and modify it into the BERT architecture with as few changes as possible. You will validate your implementation by loading pre-trained BERT weights from Hugging Face and verifying that it produces the same input-output behaviour as the official BERT model.
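As a rough sketch of the validation step, the snippet below compares hidden states from the official `bert-base-uncased` checkpoint against those of a custom implementation. The `MyBert` class and its `load_hf_weights` helper are hypothetical placeholders for your own code (shown as comments), not an existing API.

```python
# Sketch of the validation step: compare a custom BERT implementation against
# the official pre-trained model from Hugging Face. `MyBert` and
# `load_hf_weights` are hypothetical stand-ins for your own lab code.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
reference = BertModel.from_pretrained("bert-base-uncased")
reference.eval()

# my_model = MyBert.from_config(reference.config)        # your from-scratch BERT
# my_model.load_hf_weights(reference.state_dict())       # copy the pre-trained weights
# my_model.eval()

inputs = tokenizer("The Transformer architecture.", return_tensors="pt")

with torch.no_grad():
    ref_hidden = reference(**inputs).last_hidden_state
    # my_hidden = my_model(inputs["input_ids"], inputs["attention_mask"])

# The two implementations should agree up to small numerical differences:
# assert torch.allclose(my_hidden, ref_hidden, atol=1e-5)
print(ref_hidden.shape)  # (batch_size, sequence_length, hidden_size)
```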