Transformer-based models

Published October 17, 2024

In this unit, you will learn about the Transformer, the neural architecture at the core of modern large language models. This architecture evolved in response to shortcomings of recurrent neural networks. To explain these shortcomings and how the Transformer addresses them, we start the unit by looking at machine translation.

Lectures

We begin this unit with the sequence-to-sequence or encoder–decoder architecture for neural machine translation. We then delve into the concept of attention, followed by the Transformer architecture itself. At the end of the unit we discuss two important families of Transformer-based models: the GPT family, which builds on the Transformer’s decoder, and BERT, which builds on its encoder.
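As a preview of the attention mechanism covered in sections 3.3 and 3.4, the sketch below shows scaled dot-product attention, the core operation of the Transformer. It is a minimal illustration in a PyTorch style; the function name, tensor shapes, and dimensions are chosen for the example and are not taken from the course materials.

```python
import math
import torch

def scaled_dot_product_attention(queries, keys, values):
    """Scaled dot-product attention (illustrative sketch).

    queries: (batch, n_queries, d_k)
    keys:    (batch, n_keys, d_k)
    values:  (batch, n_keys, d_v)
    """
    d_k = queries.size(-1)
    # Similarity between every query and every key, scaled by sqrt(d_k).
    scores = queries @ keys.transpose(-2, -1) / math.sqrt(d_k)
    # Normalise the scores into attention weights over the keys.
    weights = torch.softmax(scores, dim=-1)
    # Each output vector is a weighted average of the values.
    return weights @ values

# Example: a batch of 2 sequences, 5 queries attending over 7 key/value pairs.
q = torch.randn(2, 5, 64)
k = torch.randn(2, 7, 64)
v = torch.randn(2, 7, 64)
out = scaled_dot_product_attention(q, k, v)  # shape (2, 5, 64)
```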

| Section | Title | Video | Slides | Quiz |
|---------|-------|-------|--------|------|
| 3.1 | Introduction to machine translation | video | slides | quiz |
| 3.2 | Neural machine translation | video | slides | quiz |
| 3.3 | Attention | video | slides | quiz |
| 3.4 | The Transformer architecture | video | slides | quiz |
| 3.5 | Decoder-based language models (GPT) | video | slides | quiz |
| 3.6 | Encoder-based language models (BERT) | video | slides | quiz |

Lab

In this lab, you will implement the encoder–decoder architecture of Sutskever et al., 2014, including the attention-based extension proposed by Bahdanau et al., 2015, and evaluate this architecture on a machine translation task.
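To give a flavour of the attention-based extension, here is a minimal sketch of Bahdanau-style additive attention in PyTorch. The class name, parameter layout, and dimensions are assumptions made for illustration; they are not the lab's actual code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (illustrative sketch, not the lab code)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, hidden_dim) -- current decoder hidden state
        # encoder_outputs: (batch, src_len, hidden_dim) -- one vector per source token
        scores = self.v(torch.tanh(
            self.w_dec(decoder_state).unsqueeze(1) + self.w_enc(encoder_outputs)
        )).squeeze(-1)                                 # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)        # attention over source positions
        context = (weights.unsqueeze(-1) * encoder_outputs).sum(dim=1)  # (batch, hidden_dim)
        return context, weights
```

At each decoding step, the context vector returned here would be combined with the decoder state to predict the next target word; the attention weights indicate which source positions the model is attending to.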

Link to the lab (course repo)