Are these operations fundamentally different?
reshape tries to return a view if possible; otherwise it copies the data to a contiguous tensor and returns a view of that copy. From the docs:
Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
See torch.Tensor.view() on when it is possible to return a view.
A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.
Have a look at this example to demonstrate this behavior:
import torch

x = torch.arange(4*10*2).view(4, 10, 2)
y = x.permute(2, 0, 1)

# view works on contiguous tensors
print(x.is_contiguous())
print(x.view(-1))

# reshape also works on non-contiguous tensors (contiguous() + view)
print(y.is_contiguous())
try:
    print(y.view(-1))  # fails: the strides of y are not compatible with a flat view
except RuntimeError as e:
    print(e)
print(y.reshape(-1))
print(y.contiguous().view(-1))
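If you want to check whether reshape gave you a view or a copy in a particular case, comparing data_ptr() is one way to do it. This is just a small sketch for inspection (the names x_flat, y_flat, shares_x and shares_y are mine), and as the docs say, you should not rely on this behavior in real code:

```python
import torch

x = torch.arange(4 * 10 * 2).view(4, 10, 2)  # contiguous
y = x.permute(2, 0, 1)                       # non-contiguous view of x

# Contiguous input: reshape can return a view, so the memory is shared.
x_flat = x.reshape(-1)
shares_x = x_flat.data_ptr() == x.data_ptr()

# Non-contiguous input with incompatible strides: reshape has to copy.
y_flat = y.reshape(-1)
shares_y = y_flat.data_ptr() == y.data_ptr()

print(shares_x)  # True
print(shares_y)  # False

# Writing through the view changes x; writing to the copy does not.
x_flat[0] = -1
y_flat[1] = -2   # y_flat[1] was copied from x[0, 1, 0]
print(x[0, 0, 0].item())  # -1
print(x[0, 1, 0].item())  # 2
```
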
permute is quite different from view and reshape, since it reorders the axes instead of just reinterpreting the shape:
# View vs. permute
x = torch.arange(2*4).view(2, 4)
print(x.view(4, 2))
> tensor([[0, 1],
          [2, 3],
          [4, 5],
          [6, 7]])
print(x.permute(1, 0))
> tensor([[0, 4],
          [1, 5],
          [2, 6],
          [3, 7]])
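Looking at the strides might make the difference clearer. In this sketch (v and p are just illustrative names) view keeps the elements in their memory order and only changes the shape, while permute keeps the same memory but swaps the strides, which is why its result is no longer contiguous:

```python
import torch

x = torch.arange(2 * 4).view(2, 4)

v = x.view(4, 2)     # same memory order, new shape
p = x.permute(1, 0)  # same memory, axes swapped

print(x.stride())  # (4, 1)
print(v.stride())  # (2, 1)
print(p.stride())  # (1, 4)

print(v.is_contiguous())  # True
print(p.is_contiguous())  # False

# For a 2D tensor, permute(1, 0) is a transpose.
print(torch.equal(p, x.t()))  # True

# Flattening shows that view kept the element order and permute did not.
print(torch.equal(v.reshape(-1), x.reshape(-1)))  # True
print(torch.equal(p.reshape(-1), x.reshape(-1)))  # False
```
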
- So when training a model, is it best to use view?
- Is using reshape a source of possible bugs in terms of gradient flow?
Is there anything else to be careful about?
Thanks!
- Not necessarily. The usage of view and reshape does not depend on training / not-training. I personally use view whenever possible and add a contiguous call to it, if necessary. This makes sure I see where a copy is made in my code. reshape, on the other hand, does this automatically, so your code might look cleaner.
- No, it should not be a bug regarding the gradient flow. However, as shown with the reshape vs. permute example, the wrong operator might of course cause problems in your training. E.g. if you would like to swap some axes of an image tensor from NCHW to NHWC, you should use permute.
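To illustrate both points, here is a minimal sketch with a made-up image tensor (the sizes and the names nhwc, wrong, same_pixel, wrong_pixel and grad_ok are only for illustration). reshape gives the right NHWC shape but scrambles the pixels, while permute moves the channel axis correctly, and gradients still flow back even though the later reshape has to copy:

```python
import torch

n, c, h, w = 2, 3, 4, 5
# Deterministic NCHW "image" so the comparison below is reproducible.
img = torch.arange(n * c * h * w, dtype=torch.float32).reshape(n, c, h, w)
img.requires_grad_(True)

nhwc = img.permute(0, 2, 3, 1)   # axes actually swapped: NCHW -> NHWC
wrong = img.reshape(n, h, w, c)  # same shape, but the data is just re-chunked

# The channel vector of pixel (1, 2) in the first image.
same_pixel = torch.equal(nhwc[0, 1, 2], img[0, :, 1, 2])
wrong_pixel = torch.equal(wrong[0, 1, 2], img[0, :, 1, 2])
print(nhwc.shape == wrong.shape)  # True
print(same_pixel)                 # True
print(wrong_pixel)                # False

# Autograd tracks permute and reshape (copy or not), so the gradient
# reaches every input element.
loss = nhwc.reshape(n, -1).sum()
loss.backward()
grad_ok = torch.equal(img.grad, torch.ones_like(img))
print(grad_ok)  # True
```

So the gradient flow itself is fine in both cases; the risk with reshape is only that it silently gives you a tensor with the expected shape but the wrong layout.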