Unit 4: Alignment and current research
In this unit, you will learn more about the alignment stage of LLM training and see examples of current research in this and related areas. The unit features both lecture-style reviews of recent developments and recorded research presentations.
Lectures
The lectures begin by exploring LLM alignment. You will then learn about retrieval-augmented generation, current research on how LLMs store facts, and how tokenisation relates to LLM privacy and security. The series concludes with a look at the environmental cost of chatbot technology.
| Section | Title | Video | Slides |
|---|---|---|---|
| 4.1 | Reinforcement learning with human feedback | video | slides |
| 4.2 | Direct preference optimisation | video | slides |
| 4.3 | Retrieval-augmented generation | video | none |
| 4.4 | LLMs for fact completion | video | slides |
| 4.5 | Adversarial tokenization | video | slides |
| 4.6 | Environmental cost of chatbot technology | video | slides |