Transformer-based models
In this unit, you will learn about the neural architecture at the core of modern large language models: the Transformer. This architecture evolved in response to shortcomings of recurrent neural networks, such as their strictly sequential computation and their difficulty with long-range dependencies. To explain these shortcomings and how the Transformer addresses them, we begin the unit with the classical task of machine translation.
Video lectures
Reading
Eisenstein, J. (2019). Introduction to Natural Language Processing. MIT Press. Chapter 18.