Transformer-based models

Published October 17, 2024

In this unit, you will learn about the Transformer, the neural architecture at the core of modern large language models. This architecture evolved in response to shortcomings of recurrent neural networks. To explain these shortcomings and how the Transformer addresses them, we start the unit by looking at machine translation.

Lectures

We begin this unit with the sequence-to-sequence or encoder–decoder architecture for neural machine translation. We then delve into the concept of attention, followed by the Transformer architecture itself. At the end of the unit we discuss two important families of Transformer-based models: the GPT family, which builds on the Transformer’s decoder, and BERT, which builds on its encoder.
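As a preview of the attention mechanism covered in sections 3.3 and 3.4, the sketch below shows scaled dot-product attention, the core operation of the Transformer. It is a minimal illustration in a PyTorch style; the function name, tensor shapes, and dimensions are chosen for the example and are not taken from the course materials.

```python
import math
import torch

def scaled_dot_product_attention(queries, keys, values):
    """Scaled dot-product attention (illustrative sketch).

    queries: (batch, n_queries, d_k)
    keys:    (batch, n_keys, d_k)
    values:  (batch, n_keys, d_v)
    """
    d_k = queries.size(-1)
    # Similarity between every query and every key, scaled by sqrt(d_k).
    scores = queries @ keys.transpose(-2, -1) / math.sqrt(d_k)
    # Normalise the scores into attention weights over the keys.
    weights = torch.softmax(scores, dim=-1)
    # Each output vector is a weighted average of the values.
    return weights @ values

# Example: a batch of 2 sequences, 5 queries attending over 7 key/value pairs.
q = torch.randn(2, 5, 64)
k = torch.randn(2, 7, 64)
v = torch.randn(2, 7, 64)
out = scaled_dot_product_attention(q, k, v)  # shape (2, 5, 64)
```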

| Section | Title | Video | Slides | Quiz |
|---------|-------|-------|--------|------|
| 3.1 | Introduction to machine translation | video | slides | quiz |
| 3.2 | Neural machine translation | video | slides | quiz |
| 3.3 | Attention | video | slides | quiz |
| 3.4 | The Transformer architecture | video | slides | quiz |
| 3.5 | Decoder-based language models (GPT) | video | slides | quiz |
| 3.6 | Encoder-based language models (BERT) | video | slides | quiz |

Lab

In this lab, you will implement the encoder–decoder architecture of Sutskever et al., 2014, including the attention-based extension proposed by Bahdanau et al., 2015, and evaluate this architecture on a machine translation task.
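To give a flavour of the attention-based extension, here is a minimal sketch of Bahdanau-style additive attention in PyTorch. The class name, parameter layout, and dimensions are assumptions made for illustration; they are not the lab's actual code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (illustrative sketch, not the lab code)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, hidden_dim) -- current decoder hidden state
        # encoder_outputs: (batch, src_len, hidden_dim) -- one vector per source token
        scores = self.v(torch.tanh(
            self.w_dec(decoder_state).unsqueeze(1) + self.w_enc(encoder_outputs)
        )).squeeze(-1)                                 # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)        # attention over source positions
        context = (weights.unsqueeze(-1) * encoder_outputs).sum(dim=1)  # (batch, hidden_dim)
        return context, weights
```

At each decoding step, the context vector returned here would be combined with the decoder state to predict the next target word; the attention weights indicate which source positions the model is attending to.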

Link to the lab (course repo)