Week 1
I hope that you had a good start to the course!
We have reached the end of “intro week,” and I hope you all feel ready to dive into the core content. Here is a quick summary of what we have covered so far, what is coming up, and what you need to do to stay on track.
Have a good weekend, and I will see you on Monday!
This week: Introduction and review
The session on Monday introduced you to natural language processing and walked you through the course logistics. You also learned how to build a simple character-based language model based on probabilities.
Tuesday’s lab session focused on setting up the lab environment and reviewing some text processing basics.
Review continued on Wednesday with mini-lectures about \(n\)-gram language models and linear neural networks. At the end of the session, we went through the code for the neural version of the character-based language model. In the lab, you had a chance to get familiar with PyTorch.
Finally, today’s session introduced the course project.
How do these wildcards work?
I have received a couple of questions about the quizzes and in particular about how “wildcards” work. It’s actually very simple. 🙂
The purpose of the quizzes is to help you check your understanding of the video lectures. You should do the quizzes for each unit before the next teaching session. This will prepare you for the in-class assignment, and help me identify concepts that are difficult, so that I can discuss them in more detail in class.
By completing all quizzes from a given unit on time, you also earn a wildcard for that unit. At the end of the first half of the course, you will take a written test with questions sampled from the quizzes. The test will have 40 questions in total, 10 from each unit. A wildcard will turn one wrong answer from its unit into a correct one on the test.
As an example, assume that you submitted the quizzes from Units 1–3 on time, but were sick during Unit 4. You thus earned wildcards for the first three units. Assume further that in the test you score 10 points on the questions from Unit 1, 8 points on Unit 2, and 9 points each on Units 3–4. Then your final score on the test will be
\[ \underbrace{(10 + 8 + 9 + 9)}_{\text{raw scores}} + \underbrace{(0 + 1 + 1 + 0)}_{\text{wildcards}} = 38 \]
Your wildcard for Unit 1 is not relevant because you correctly answered all questions from that unit. Your wildcard for Unit 2 turns one of your incorrect answers into a correct one, and the same for Unit 3.
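If you prefer to see the rule as code, here is a minimal sketch of the calculation above. The function name `final_test_score` and its parameters are made up for this illustration; this is not the actual grading script, just the same arithmetic under the assumption of one wildcard per unit and 10 questions per unit.

```python
def final_test_score(raw_scores, wildcards, max_per_unit=10):
    """Add up per-unit scores, applying at most one wildcard per unit.

    raw_scores   -- points scored on each unit's questions
    wildcards    -- True for each unit whose quizzes were completed on time
    max_per_unit -- number of questions per unit (10 in the test described above)
    """
    total = 0
    for raw, has_wildcard in zip(raw_scores, wildcards):
        # A wildcard only helps if there is a wrong answer left to convert.
        bonus = 1 if has_wildcard and raw < max_per_unit else 0
        total += raw + bonus
    return total


# The example from the text: wildcards for Units 1-3, none for Unit 4.
raw_scores = [10, 8, 9, 9]
wildcards = [True, True, True, False]
print(final_test_score(raw_scores, wildcards))  # 38
```

Note that the Unit 1 wildcard contributes nothing here because the raw score for that unit is already at the maximum.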
To-do this week
Here is a list of to-do items from this week:
Todo 1: Register for the course
LiU requires you to register for your courses in Ladok no later than one week after they begin. Please do this as soon as possible to keep access to the mailing list.
Todo 2: Find a lab partner
In addition to the Ladok registration, you also need to register your lab groups in Webreg. Please do so no later than 30 January. I hope that most of you have found a lab partner by now; if not, do not worry: register as a one-person group, and we will pair you up with someone else in the same situation.
Todo 3: Prepare for next week
Before Monday’s class, you should watch the video lectures for Unit 1 on tokenisation and embeddings and complete the quizzes. On Monday, we will review the quiz answers and work on the first in-class assignment.
Next week: Tokenisation and embeddings
The video lectures cover two key concepts of NLP: tokenisation and embeddings. They start with traditional word-based tokenisation and then present the Byte Pair Encoding (BPE) algorithm, which is widely used in modern language models. In the second half of the unit, you learn about embeddings, focusing on word embeddings, which capture the meaning of words in numerical form. In the lab, you will implement the BPE algorithm and explore how embeddings work.