This post explains why we sometimes get runtime errors when creating views of tensors.
Author
Marco Kuhlmann
Published
January 19, 2024
Why does the following code raise a runtime error, and what does that error mean?
torch.randn(2, 3, 2).permute(0, 2, 1).view(-1)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 torch.randn(2, 3, 2).permute(0, 2, 1).view(-1)

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Shape and stride
A tensor is a type of multi-dimensional array. Like an array, it points to data, such as floating-point numbers or integers. Its dimensions are specified by its shape. For example, the following code creates a three-dimensional tensor containing 12 random numbers:
x = torch.randn(2, 3, 2)
x.shape
torch.Size([2, 3, 2])
Tensor data is stored in memory as a linear sequence. To map between positions in this one-dimensional storage and indexes in the multi-dimensional array, each tensor comes with a recipe called its stride. The stride specifies how many steps we need to go in the linear storage to get from one element to the next one in each dimension.
To make this concrete, let’s inspect the stride for x:
x.stride()
(6, 2, 1)
This stride specifies that we need to take 6 steps to go from one element to the next one in dimension 0, 2 steps to do the same in dimension 1, and 1 step to proceed in dimension 2.
For example, suppose we want to retrieve the element of x at the multi-dimensional index \((1, 2, 0)\):
x[1, 2, 0].item()
-2.065706253051758
Looking at the stride of x, we can locate this element in the linear storage as follows. We need to take \(1 \times 6 = 6\) steps to go to index \(1\) in dimension 0, then \(2 \times 2 = 4\) steps to go to index \(2\) in dimension 1, and finally another \(0 \times 1 = 0\) steps to go to index \(0\) in dimension 2. The total number of steps is \(10\), so we find our element at position 10 in the linear storage. Let’s check that this is actually correct:
assert x[1, 2, 0] == x.storage()[1*6+2*2+0*1]
More generally, we can translate a multi-dimensional index into a storage position using the following code:
def idx_to_pos(a, idx):
    return sum(i * s for i, s in zip(idx, a.stride()))
With this code, we can rewrite our check as follows:
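assert x[1, 2, 0] == x.storage()[idx_to_pos(x, (1, 2, 0))]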
There are situations where we want to change the shape of a tensor. For example, suppose that x is a batch consisting of two three-word sentences, where each word is represented by a 2-dimensional embedding vector. Some model may require us to represent each sentence as a single 6-dimensional vector, formed by concatenating the constituent word vectors. We can achieve this using a view:
y = x.view(2, 6)
y.shape
torch.Size([2, 6])
A view of a tensor x is a tensor y that points to the same data as x but has a different shape. Because x and y share the same storage, creating a view is cheap: the only information that needs to be updated is meta-data in the form of the shape and the stride. The data in the storage is left untouched.
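If we want to convince ourselves that the two tensors really do share the same data, one quick check is to compare their data pointers: Tensor.data_ptr() returns the memory address of a tensor’s first element, and for x and its view y these addresses coincide.
x.data_ptr() == y.data_ptr()
True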
How can we translate indexes in x into indexes in y? For example, how do we find the element at index \((1, 2, 0)\) from x in y? Because the view took us from three 2-dimensional vectors to one 6-dimensional vector, we should expect it at index \((1, 4)\). Let’s check:
y[1, 4].item()
-2.065706253051758
Now, while the multi-dimensional index of an element changes when creating views, its position in the linear storage does not change. Indeed, our code for translating an index to a linear position tells us that the element has not moved:
idx_to_pos(y, (1, 4))
10
The explanation for this is that reshaping a tensor does not only change its shape; it also changes its stride:
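y.stride()
(6, 1)
The same thing happens when we permute the dimensions of x, as in the code from the beginning of the post:
x_permuted = x.permute(0, 2, 1)
x_permuted.stride()
(6, 1, 2)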
In the permuted version of x, we need to take 2 steps to go from one element to the next one in dimension 2. This means that, to collect the elements along dimension 2 of x_permuted (for example, the first coordinate of each of the three word vectors in a sentence), we need to skip every other element in the linear storage. The tensor x_permuted is not contiguous:
x_permuted.is_contiguous()
False
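To make the skipping visible, we can map the indexes along dimension 2 of x_permuted to storage positions using the idx_to_pos() function from above; the positions are 2 apart:
[idx_to_pos(x_permuted, (0, 0, j)) for j in range(3)]
[0, 2, 4]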
A contiguous tensor is one in which striding past all elements in a given dimension always brings us to the same position in the linear storage that we would have reached, had we instead taken one step according to the stride for the next-lower dimension. The following code tests whether this property holds for a given tensor a:
def is_contiguous(a):
    expected_stride = 1
    for x, y in reversed(tuple(zip(a.shape, a.stride()))):
        if y != expected_stride:
            return False
        expected_stride *= x
    return True
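As a quick sanity check, we can compare this function with PyTorch’s built-in test on our two tensors:
is_contiguous(x), is_contiguous(x_permuted)
(True, False)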
For x, stepping past the 2 elements of dimension 2 (stride 1) brings us to linear position 2, which is the same position we get to when taking one step along dimension 1 (stride 2).
For x_permuted, stepping past the 3 elements of dimension 2 (stride 2) brings us to the linear position 6. Had we instead taken one step along dimension 1 (stride 1), we would have landed at position 1.
We can now explain why we got the runtime error at the beginning of the post. Here is the code again:
torch.randn(2, 3, 2).permute(0, 2, 1).view(-1)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 1
----> 1 torch.randn(2, 3, 2).permute(0, 2, 1).view(-1)

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
When we create a one-dimensional view by calling .view(-1), we get a one-dimensional tensor whose stride is the singleton tuple (1,). This stride tells us that we can go from one element to the next one by taking 1 step in the linear storage. However, we just saw that this property does not hold after the permutation .permute(0, 2, 1): adjacent elements of the permuted tensor can be 2 steps apart in storage. Asking for the two views in this sequence therefore leads to an inconsistency between shape and stride, and this inconsistency is what the runtime error reports.
The runtime error advises us to use the reshape() method instead of view() to avoid the inconsistency between tensor shape and stride.
z = torch.randn(2, 3, 2).permute(0, 2, 1).reshape(-1)
This does indeed get rid of the runtime error; but in this case, calling reshape() makes a copy of the old tensor, including its data. For large tensors, this can be expensive.
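One way to see that the data really has been copied is to keep a reference to the original tensor and compare data pointers, just as we did for x and y above (a and b here are fresh example tensors):
a = torch.randn(2, 3, 2)
b = a.permute(0, 2, 1).reshape(-1)  # not viewable, so reshape() copies
a.data_ptr() == b.data_ptr()
False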
Note that we can also guard against stride inconsistencies by explicitly converting a potentially non-contiguous tensor into a contiguous one using the contiguous() method:
z = torch.randn(2, 3, 2).permute(0, 2, 1).contiguous().view(-1)
The reshape() method is essentially a convenience method: It first checks whether a tensor is contiguous and then either returns a view (if it is) or converts the tensor into a contiguous copy by calling contiguous().
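For an already-contiguous tensor, reshape() should therefore be as cheap as view() and return a view rather than a copy; comparing data pointers once more confirms this:
a = torch.randn(2, 3, 2)  # contiguous, so no copy is needed
a.data_ptr() == a.reshape(-1).data_ptr()
True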