Module 2
This module will dive deeper into the techniques used to train modern language models. In addition, we will discuss applications and methodologies in text generation.
Compared with Module 1, a larger share of the material in this module comes from external sources, such as video recordings and research articles. We have selected these to give you an up-to-date overview of this fast-developing field. If you want to know more, feel free to search for more detailed video lectures and articles from other sources.
We will discuss this module during the second course meeting in Gothenburg. Please see the meeting page for details.
Unit 2-1: Modern large language models
This unit reviews some of the central topics related to modern large language models (LLMs), notably those in the GPT family. We examine emergent capabilities such as zero-shot learning and in-context learning, and explore methods for aligning LLMs with human instructions and preferences. Finally, the lectures address the crucial aspect of evaluating general-purpose language models, offering insights into their effectiveness and applicability across tasks and domains.
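To give a concrete feel for in-context learning, here is a minimal sketch contrasting a zero-shot prompt with a few-shot prompt, in the spirit of Brown et al. (2020) from the reading list below. The model checkpoint and the prompts are illustrative assumptions only; any open causal language model can be substituted.

```python
# A minimal sketch of zero-shot vs. few-shot (in-context learning) prompting.
# The model name and prompt wording are illustrative, not part of the lectures.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # substitute any open causal LM

# Zero-shot: the task is described, but no examples are given.
zero_shot = "Translate English to French:\nsea otter =>"

# Few-shot: a handful of demonstrations are placed in the context window,
# and the model is expected to continue the pattern.
few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

for prompt in (zero_shot, few_shot):
    out = generator(prompt, max_new_tokens=10, do_sample=False)
    print(out[0]["generated_text"])
```

Note that no parameters are updated in either case: the demonstrations appear only in the model's context window.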
Title | Slides | Video |
---|---|---|
Introduction to modern language models | [slides] | [video] |
Emergent abilities of LLMs | [slides] | [video] |
LLM alignment | [slides] | [video] |
Evaluating LLMs | [slides] | [video] |
Reading
- Brown et al. (2020): Language Models are Few-Shot Learners
- Ouyang et al. (2022): Aligning language models to follow instructions
Surveys and other optional material
- Kaddour et al. (2023): Challenges and Applications of Large Language Models
- Liu et al. (2023): Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- Minaee et al. (2024): Large Language Models: A Survey
- Zheng et al. (2023): Secrets of RLHF in Large Language Models Part I: PPO
Unit 2-2: Working with open large language models
The lectures in this unit present techniques that help users get the most out of open large language models in a variety of applications and scenarios. They explore efficient fine-tuning methods and quantization techniques that make training and deployment more efficient. The final lecture discusses retrieval augmentation, a strategy for enriching LLMs’ responses by incorporating additional information from retrieval systems.
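To make the fine-tuning and quantization topics more concrete, the following is a minimal sketch of loading a model in 4-bit precision and attaching LoRA adapters, roughly in the spirit of QLoRA (Dettmers et al., 2023). It assumes the Hugging Face transformers, bitsandbytes and peft libraries listed under the software resources below; the model id, target modules and hyperparameters are illustrative choices, not course defaults.

```python
# A minimal sketch of 4-bit quantized loading plus LoRA adapters,
# roughly in the spirit of QLoRA. Model id and hyperparameters are
# illustrative assumptions, not recommendations from the course.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mistral-7B-v0.1"  # one of the open models listed below

# Quantization: load the base weights in 4-bit NF4 via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Parameter-efficient fine-tuning: freeze the quantized base model and
# train small low-rank adapter matrices on the attention projections.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The resulting model can then be trained with a standard training loop; only the small adapter matrices receive gradients, while the quantized base weights stay frozen.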
Title | Slides | Video |
---|---|---|
Open LLMs | | [video] |
Efficient fine-tuning techniques | [slides] | [video] |
Quantization | | [video] |
Quantized fine-tuning | | [video] |
Retrieval augmentation | | [video] |
Reading
- Dettmers et al. (2023): QLoRA: Efficient Finetuning of Quantized LLMs
- Ram et al. (2023): In-Context Retrieval-Augmented Language Models
Surveys and other optional material
- Chen et al. (2023): ChatGPT’s One-year Anniversary: Are Open-Source Large Language Models Catching up?
- Lialin et al. (2023): Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
- Wan et al. (2023): Efficient Large Language Models: A Survey
- Gao et al. (2023): Retrieval-Augmented Generation for Large Language Models: A Survey
- Retrieval augmentation video by Douwe Kiela, part of a Stanford course
Software resources
- bitsandbytes
- peft
- accelerate
- Ollama
- LlamaIndex
- Some accessible models: Llama 3, Mistral, Falcon
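As an illustration of how retrieval augmentation can look in practice with the tools above, here is a minimal sketch using LlamaIndex. The directory name and query are placeholders, and the import paths are an assumption that may differ between LlamaIndex versions.

```python
# A minimal retrieval augmentation sketch using LlamaIndex (listed above).
# Directory name and question are placeholders; import paths may differ
# slightly between LlamaIndex versions.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Index a small collection of documents into a vector store.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 2. At query time, retrieve the most relevant passages and pass them to the
#    LLM as additional context, so the answer is grounded in the documents.
query_engine = index.as_query_engine()
response = query_engine.query("What does the course say about quantization?")
print(response)
```

By default LlamaIndex delegates embedding and answer generation to an external LLM backend; a locally served model (for example via Ollama) can be configured instead.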
Unit 2-3: Generating text: Applications and methodology
The third unit explores applications of large language models in various generation tasks. Specific tasks include summarization, which condenses information into a shorter text, and dialogue generation, which aims at natural and engaging conversations. The unit also introduces evaluation methods for assessing the quality of generation systems.
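As a preview of the evaluation lecture, the sketch below computes a simple ROUGE-1 style unigram overlap between a system output and a reference summary. Real evaluations would rely on an established implementation and on several complementary metrics; this toy version only illustrates the idea behind n-gram overlap scores.

```python
# A minimal sketch of n-gram overlap evaluation (ROUGE-1 style), the kind of
# automatic metric discussed in the evaluation lecture. For illustration only.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram overlap (F1) between a system summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1(
    "the model summarizes the article in two sentences",
    "the system produces a two sentence summary of the article",
))
```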
Title | Slides | Video |
---|---|---|
Introduction to generation tasks | [slides] | [video] |
Evaluation of generation systems | [slides] | [video] |
Summarization | [slides] | [video] |
Dialogue | [slides] | [video] |
Reading
- Eisenstein, chapter 19
- Goyal et al. (2022): News Summarization and Evaluation in the Era of GPT-3