Introduction to PyTorch
The purpose of this notebook is to introduce you to the basics of PyTorch, the deep learning framework that we will be using for the labs.
Many good introductions to PyTorch are available online, including the 60 Minute Blitz on the official PyTorch website. This notebook is designed to put focus on those basics that you will encounter in the labs. Beyond the notebook, you will also need to get comfortable with the PyTorch documentation.
We start by importing the PyTorch module:
import torch
The following code prints the current version of the module:
print(torch.__version__)
The version of PyTorch at the time of writing this notebook was 1.10.1.
Tensors
The fundamental data structure in PyTorch is the tensor, a multi-dimensional matrix containing elements of a single numerical data type. Tensors are similar to arrays as you may know them from NumPy or MATLAB.
Creating tensors
One way to create a tensor is to call the function torch.tensor()
on a Python list or NumPy array.
The code in the following cell creates a 2-dimensional tensor with 4 elements.
x = torch.tensor([[0, 1], [2, 3]])
x
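As mentioned above, torch.tensor() also accepts NumPy arrays. Here is a minimal sketch, assuming NumPy is available as np:
import numpy as np
a = np.array([[0, 1], [2, 3]])
torch.tensor(a)  # copies the data from the array into a new tensor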
Each tensor has a shape, which specifies the number and sizes of its dimensions:
x.shape
Each tensor also has a data type for its elements. More information about data types is available in the PyTorch documentation.
x.dtype
When creating a tensor, you can explicitly pass the intended data type as a keyword argument:
y = torch.tensor([[0, 1], [2, 3]], dtype=torch.float)
y.dtype
For many data types, there also exists a specialised constructor:
z = torch.FloatTensor([[0, 1], [2, 3]])
z.dtype
More creation operations
Create a 3D-tensor of the specified shape filled with the scalar value zero:
x = torch.zeros(2, 3, 5)
x
Create a 3D-tensor filled with random values:
x = torch.rand(2, 3, 5)
x
Create a tensor with the same shape as another one, but filled with ones:
y = torch.ones_like(x)  # shape: [2, 3, 5]
y
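Similar functions exist for other fill values, for example torch.zeros_like() and torch.full_like(). A small sketch:
torch.full_like(x, 7.0)  # shape: [2, 3, 5], every element is 7.0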
For a complete list of tensor-creating operations, see Creation ops.
Embrace vectorisation!
Iteration is one of the most useful techniques for processing data in Python. However, you should not loop over tensors. Instead, you should look into vectorising your operations. This is because looping over tensors is slow, while vectorised operations on tensors are fast (and can be made even faster when the code is run on a GPU). To illustrate this point, let us create a 1D-tensor containing the first 1M integers:
x = torch.arange(1000000)
x
Summing up the elements of the tensor using a loop is relatively slow:
sum(i for i in x)
Doing the same thing using a tensor operation is much faster:
x.sum()
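If you want to see the difference on your own machine, you can time the two variants with Python's time module. This is just a rough sketch; the exact numbers will vary:
import time
t0 = time.perf_counter()
sum(i for i in x)  # Python-level loop
t1 = time.perf_counter()
x.sum()  # vectorised tensor operation
t2 = time.perf_counter()
print(f'loop: {t1 - t0:.3f} s, vectorised: {t2 - t1:.5f} s')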
Indexing and slicing
To access the contents of a tensor, you can use an extended version of Python’s syntax for indexing and slicing. Essentially the same syntax is used by NumPy. For more information, see Indexing on ndarrays.
To illustrate this, we create a 3D-tensor with random numbers:
x = torch.rand(2, 3, 5)
x
Index an element by a 3D-coordinate; this gives a 0D-tensor:
x[0, 1, 2]
(If you want the result as a non-tensor, use the method item().)
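For example, the following returns the element as a plain Python number (a minimal sketch):
x[0, 1, 2].item()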
Index the second element; this gives a 2D-tensor:
x[1]
Index the second-to-last element:
x[-2]
Slice out the sub-tensor with elements from index 1 onwards; this gives a 3D-tensor:
x[1:]
Here is a more complex example of slicing. As in Python, the colon :
selects all indices of a dimension.
x[:, :, 2:4]
The syntax for indexing and slicing is very powerful. For example, the same effect as in the previous cell can be obtained with the following code, which uses the ellipsis (...
) to match all dimensions but the ones explicitly mentioned:
x[..., 2:4]
Creating views
You will sometimes want to use a tensor with a different shape than its initial shape. In these situations, you can re-shape the tensor or create a view of the tensor. The latter is preferable because views can share the same data as their base tensors and thus do not require copying.
We create a 3D-tensor of 12 random values:
x = torch.rand(2, 3, 2)
x
Create a view of this tensor as a 2D-tensor:
x.view(3, 4)
When creating a view, the special size -1
is inferred from the other sizes:
x.view(3, -1)
Modifying a view affects the data in the base tensor:
y = torch.rand(2, 3, 2)
z = y.view(3, 4)
z[2, 3] = 42
y
More viewing operations
There are a few other useful methods that create views. More information about views is available in the PyTorch documentation.
x = torch.rand(2, 3, 5)
x
The permute()
method returns a view of the base tensor with some of its dimensions permuted. In the example, we maintain the first dimension but swap the second and the third dimension:
y = x.permute(0, 2, 1)
print(y)
y.shape
The unsqueeze()
method returns a tensor with a dimension of size one inserted at the specified position. This is useful e.g. in the training of neural networks when you want to create a batch with just one example.
y = x.unsqueeze(0)
print(y)
y.shape
The inverse operation to unsqueeze() is squeeze():
y = y.squeeze(0)
print(y)
y.shape
Re-shaping a tensor
There are some cases where you cannot create a view and need to explicitly re-shape a tensor. In particular, this happens when the data in the base tensor and the view are not in contiguous regions of memory.
x = torch.rand(2, 3, 5)
x
We permute the tensor x
to create a new tensor y
in which the data is no longer consecutive in memory:
y = x.permute(0, 2, 1)
# y = y.view(-1)  # raises a runtime error
y
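You can inspect the memory layout with the is_contiguous() method; after the permutation above it returns False (a small sketch):
y.is_contiguous()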
In such a case, you can explicitly re-shape the tensor, which will copy the data if necessary:
y = x.permute(0, 2, 1)
y = y.reshape(-1)
y
Modifying a reshaped tensor will not necessarily change the data in the base tensor. This depends on whether the reshaped tensor is able to share the data with the base tensor.
y = torch.rand(2, 3, 2)
y = y.permute(0, 2, 1)  # if commented out, data can be shared
z = y.reshape(-1)
z[0] = 42
y
Computing with tensors
Element-wise operations
Unary mathematical operations defined on numbers can be ‘lifted’ to tensors by applying them element-wise. This includes multiplication by a constant, exponentiation (**
), taking roots (torch.sqrt()
), and the logarithm (torch.log()
).
x = torch.rand(2, 3)
print(x)
x * 2  # element-wise multiplication with 2
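The other element-wise operations mentioned above work in the same way. A small sketch, reusing the tensor x from the previous cell:
print(x ** 2)  # element-wise exponentiation
print(torch.sqrt(x))  # element-wise square root
print(torch.log(x))  # element-wise logarithm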
Similarly, we can do binary mathematical operations on tensors with the same shape. For example, the Hadamard product of two tensors \(X\) and \(Y\) is the tensor \(X \odot Y\) obtained by the element-wise multiplication of the elements of \(X\) and \(Y\).
x = torch.rand(2, 3)
y = torch.rand(2, 3)
torch.mul(x, y)  # shape: [2, 3]
The Hadamard product can be written more succinctly as follows:
x * y
Matrix product
When computing the matrix product between two tensors \(X\) and \(Y\), the sizes of the last dimension of \(X\) and the first dimension of \(Y\) must match. The shape of the resulting tensor is the concatenation of the shapes of \(X\) and \(Y\), with the last dimension of \(X\) and the first dimension of \(Y\) removed.
x = torch.rand(2, 3)
y = torch.rand(3, 5)
torch.matmul(x, y)  # shape: [2, 5]
The matrix product can be written more succinctly as follows:
x @ y
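If the relevant dimensions do not match, PyTorch raises a runtime error. A small sketch:
# torch.rand(2, 3) @ torch.rand(4, 5)  # raises a runtime error: mismatched shapes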
Sum and argmax
Let us define a tensor of random numbers:
x = torch.rand(2, 3, 5)
x
You have already seen that we can compute the sum of a tensor:
torch.sum(x)
There is a second form of the sum operation where we can specify the dimension along which the sum should be computed. This will return a tensor with the specified dimension removed.
torch.sum(x, dim=0)  # shape: [3, 5]
torch.sum(x, dim=1)  # shape: [2, 5]
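If you want to keep the summed-out dimension as a dimension of size one, you can pass keepdim=True (a small sketch):
torch.sum(x, dim=1, keepdim=True)  # shape: [2, 1, 5]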
The same idea also applies to the operation argmax()
which returns the index of the component with the maximal value along the specified dimension.
torch.argmax(x)  # index of the highest component across all dimensions, numbered in consecutive order
torch.argmax(x, dim=0)  # index of the highest component across the first dimension
Concatenating tensors
A list of tensors can be combined into one long tensor by concatenation.
x = torch.rand(2, 3)
y = torch.rand(3, 3)
z = torch.cat([x, y])
print(z)
z.shape
You can also concatenate along a specific dimension:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)
print(torch.cat([x, y], dim=0))
print(torch.cat([x, y], dim=1))
Broadcasting
The term broadcasting describes how PyTorch treats tensors with different shapes. Subject to certain constraints, the ‘smaller’ tensor is ‘broadcast’ across the larger tensor so that they have compatible shapes. Broadcasting is a way to avoid looping. In short, if a PyTorch operation supports broadcasting, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data).
In the simplest case, two tensors have the same shapes. This is the case for the matrix x @ W
and the bias vector b
in the linear model below:
x = torch.rand(1, 2)
W = torch.rand(2, 3)
b = torch.rand(1, 3)
z = x @ W  # shape: [1, 3]
z = z + b  # shape: [1, 3]
print(z)
z.shape
Now suppose that we have a whole batch of inputs. Watch what happens when adding the bias vector b:
X = torch.rand(5, 2)
Z = X @ W  # shape: [5, 3]
Z = Z + b  # shape: [5, 3] Broadcasting happens here!
print(Z)
Z.shape
In the example, broadcasting expands the shape of b from \([1, 3]\) into \([5, 3]\). The matrix Z is formed by effectively adding b to each row of the product X @ W. However, this is not implemented by a Python loop but happens implicitly through broadcasting.
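To convince yourself that broadcasting really does the same thing as an explicit loop, you can compare the two. A minimal sketch, reusing the tensors from the cells above:
Z_loop = torch.stack([X[i] @ W + b[0] for i in range(X.shape[0])])
print(torch.allclose(Z, Z_loop))  # True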
PyTorch uses the same broadcasting semantics as NumPy. More information about broadcasting is available in the PyTorch documentation.
To be expanded!