Transformer-based models
In this unit, you will learn about the Transformer, the neural architecture at the core of modern large language models. This architecture evolved in response to shortcomings of recurrent neural networks. To explain these shortcomings and how the Transformer fixes them, we start the unit by looking into machine translation.
Lectures
We begin this unit with the sequence-to-sequence or encoder–decoder architecture for neural machine translation. We then delve into the concept of attention (a minimal code sketch of this operation follows the table below), and from there move on to the Transformer architecture itself. At the end of the unit, we discuss two important families of Transformer-based models: the GPT family, which builds on the Transformer’s decoder, and BERT, which builds on its encoder.
| Section | Title | Video | Slides | Quiz |
|---|---|---|---|---|
| 3.1 | Introduction to machine translation | video | slides | quiz |
| 3.2 | Neural machine translation | video | slides | quiz |
| 3.3 | Attention | video | slides | quiz |
| 3.4 | The Transformer architecture | video | slides | quiz |
| 3.5 | Decoder-based language models (GPT) | video | slides | quiz |
| 3.6 | Encoder-based language models (BERT) | video | slides | quiz |
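At the heart of sections 3.3 and 3.4 is the attention operation: each output is a weighted average of value vectors, where the weights come from comparing a query against a set of keys. The snippet below is a minimal sketch of scaled dot-product attention, the variant used inside the Transformer. It assumes PyTorch, and the function and tensor names are illustrative rather than taken from the course materials.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(queries, keys, values):
    """Minimal sketch: queries (batch, m, d), keys/values (batch, n, d)."""
    d = queries.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d)
    scores = queries @ keys.transpose(-2, -1) / d ** 0.5   # (batch, m, n)
    # Normalise the scores into attention weights over the keys
    weights = F.softmax(scores, dim=-1)
    # Each output is a weighted average of the value vectors
    return weights @ values                                 # (batch, m, d)

# Tiny usage example with random tensors
q, k, v = torch.randn(1, 2, 8), torch.randn(1, 5, 8), torch.randn(1, 5, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 2, 8])
```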
Lab
In this lab, you will implement the encoder–decoder architecture of Sutskever et al., 2014, including the attention-based extension proposed by Bahdanau et al., 2015, and evaluate this architecture on a machine translation task.
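To give a concrete picture of what the lab involves, here is a minimal sketch of an encoder–decoder with additive (Bahdanau-style) attention in PyTorch. Everything below is an illustrative assumption rather than the lab’s reference implementation: the module names and dimensions are made up, and it uses GRUs throughout for brevity, whereas Sutskever et al., 2014 use LSTMs.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) -> all hidden states and the final state
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden  # (batch, src_len, H), (1, batch, H)

class BahdanauAttention(nn.Module):
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.w_dec = nn.Linear(hidden_dim, hidden_dim)
        self.w_enc = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, 1)

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (batch, H), enc_outputs: (batch, src_len, H)
        scores = self.v(torch.tanh(
            self.w_dec(dec_hidden).unsqueeze(1) + self.w_enc(enc_outputs)
        )).squeeze(-1)                          # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)
        # Context vector: weighted average of the encoder states
        context = (weights.unsqueeze(-1) * enc_outputs).sum(dim=1)  # (batch, H)
        return context, weights

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attention = BahdanauAttention(hidden_dim)
        self.rnn = nn.GRU(emb_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt_token, hidden, enc_outputs):
        # One decoding step: attend over the encoder states, then update the GRU
        context, _ = self.attention(hidden[-1], enc_outputs)
        rnn_input = torch.cat([self.embed(tgt_token), context.unsqueeze(1)], dim=-1)
        output, hidden = self.rnn(rnn_input, hidden)
        return self.out(output.squeeze(1)), hidden  # logits over the target vocab

# Tiny smoke test with random token ids
encoder, decoder = Encoder(vocab_size=1000), Decoder(vocab_size=1000)
src = torch.randint(0, 1000, (2, 7))        # batch of 2 source sentences
enc_outputs, hidden = encoder(src)
tgt_token = torch.randint(0, 1000, (2, 1))  # current target tokens
logits, hidden = decoder(tgt_token, hidden, enc_outputs)
print(logits.shape)  # torch.Size([2, 1000])
```

The key point is the attention module: at every decoding step, the decoder scores all encoder states with a small feed-forward network and conditions on their weighted sum, instead of compressing the whole source sentence into a single fixed-size vector.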