Difference between view, reshape and permute

Rafael_R · August 23, 2019, 4:55pm

Are these operations fundamentally different?

ptrblck · August 23, 2019, 5:49pm

reshape tries to return a view if possible; otherwise it copies the data to a contiguous tensor and returns a view on it. From the docs:

Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
See torch.Tensor.view() on when it is possible to return a view.
A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.

Have a look at this example to demonstrate this behavior:

import torch

x = torch.arange(4*10*2).view(4, 10, 2)
y = x.permute(2, 0, 1)

# View works on contiguous tensors
print(x.is_contiguous())
print(x.view(-1))

# Reshape works on non-contiguous tensors (contiguous() + view)
print(y.is_contiguous())
try:
    print(y.view(-1))
except RuntimeError as e:
    print(e)
print(y.reshape(-1))
print(y.contiguous().view(-1))
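
To make the copy vs. view behavior visible, here is a minimal sketch that compares the data pointers of the inputs and outputs. It reuses the same x and y as above; the names flat_x, flat_y, shares_x and shares_y are only illustrative. As the docs say, you should not rely on this behavior in real code, it is just a way to inspect what happened.

```python
import torch

x = torch.arange(4 * 10 * 2).view(4, 10, 2)
y = x.permute(2, 0, 1)  # non-contiguous view on the same memory as x

# Contiguous input: reshape can return a view, so the memory is shared
flat_x = x.reshape(-1)
shares_x = flat_x.data_ptr() == x.data_ptr()

# Permuted input: flattening cannot be expressed with strides alone,
# so reshape has to copy the data into new memory
flat_y = y.reshape(-1)
shares_y = flat_y.data_ptr() == y.data_ptr()

print(shares_x)  # True
print(shares_y)  # False
```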

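permute is quite different from view and reshape. view and reshape keep the order of the elements and only change the shape, while permute reorders the dimensions, so the elements end up in different positions: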

# View vs. permute
x = torch.arange(2*4).view(2, 4)
print(x.view(4, 2))
# tensor([[0, 1],
#         [2, 3],
#         [4, 5],
#         [6, 7]])
print(x.permute(1, 0))
# tensor([[0, 4],
#         [1, 5],
#         [2, 6],
#         [3, 7]])
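
One way to see why the two results differ is to look at the strides. The following sketch (the variable names v and p are just for illustration) shows that view keeps a contiguous layout for the new shape, while permute only swaps the strides of the original tensor and does not move any data:

```python
import torch

x = torch.arange(2 * 4).view(2, 4)
v = x.view(4, 2)      # same element order, new shape
p = x.permute(1, 0)   # dimensions swapped, equivalent to a transpose here

print(x.stride())  # (4, 1)
print(v.stride())  # (2, 1)
print(p.stride())  # (1, 4)

# Both are views on the memory of x, but only v is contiguous
print(v.is_contiguous(), p.is_contiguous())  # True False
print(torch.equal(p, x.t()))  # True
```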

Rafael_R · August 28, 2019, 5:00am

So when training a model, is it best to use view?
Is using reshape a source of possible bugs in terms of gradient flow?

Is there anything else to be careful about?

Thanks!

ptrblck · August 28, 2019, 10:28am

Not necessarily. The usage of view and reshape does not depend on training / not-training.
I personally use view whenever possible and add a contiguous call to it if necessary. This makes sure I see where a copy is done in my code. reshape, on the other hand, does this automatically, so your code might look cleaner.
No, it should not be a bug regarding the gradient flow. However, as shown with the view vs. permute example, the wrong operator might of course cause problems in your training. E.g. if you would like to swap some axes of an image tensor from NCHW to NHWC, you should use permute.
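
Here is a small sketch of both points. The tensor sizes and the names nhwc, wrong and w are made up for illustration. The first part shows that view gives the right shape but the wrong element layout for NCHW to NHWC, and the second part checks that gradients flow through reshape even when it has to copy:

```python
import torch

# Toy "image" batch in NCHW layout: N=1, C=3, H=2, W=2
x = torch.arange(1 * 3 * 2 * 2, dtype=torch.float32).view(1, 3, 2, 2)

nhwc = x.permute(0, 2, 3, 1)  # correct: moves the channel axis to the end
wrong = x.view(1, 2, 2, 3)    # same shape, but the values are scrambled

print(nhwc.shape == wrong.shape)  # True
print(torch.equal(nhwc, wrong))   # False

# Channel values of the pixel at (h=0, w=0)
print(nhwc[0, 0, 0].tolist())   # [0.0, 4.0, 8.0]
print(wrong[0, 0, 0].tolist())  # [0.0, 1.0, 2.0]

# Gradient flow through reshape on a non-contiguous tensor
w = torch.randn(2, 3, requires_grad=True)
out = w.t().reshape(-1).sum()  # w.t() is non-contiguous, so reshape copies
out.backward()
print(torch.equal(w.grad, torch.ones(2, 3)))  # True
```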