Are these operations fundamentally different?
reshape tries to return a view if possible; otherwise it copies the data to a contiguous tensor and returns a view on it. From the docs:
Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
See torch.Tensor.view() on when it is possible to return a view.
A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.
Have a look at this example to demonstrate this behavior:
import torch

x = torch.arange(4*10*2).view(4, 10, 2)
y = x.permute(2, 0, 1)

# View works on contiguous tensors
print(x.is_contiguous())
print(x.view(-1))

# Reshape works on non-contiguous tensors (contiguous() + view)
print(y.is_contiguous())
try:
    print(y.view(-1))
except RuntimeError as e:
    print(e)
print(y.reshape(-1))
print(y.contiguous().view(-1))
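One way to see whether reshape returned a view or a copy is to compare data pointers. This is only a small sketch for inspection; as the docs say, code should not rely on which of the two happens. The variable names (x_flat, shares_x, and so on) are made up for illustration.

```python
import torch

x = torch.arange(4 * 10 * 2).view(4, 10, 2)  # contiguous
y = x.permute(2, 0, 1)                        # non-contiguous view of x

# Contiguous input: reshape can return a view, so the memory is shared.
x_flat = x.reshape(-1)
shares_x = x_flat.data_ptr() == x.data_ptr()

# Permuted input: flattening is not possible with strides alone, so reshape copies.
y_flat = y.reshape(-1)
shares_y = y_flat.data_ptr() == y.data_ptr()

print(shares_x)  # True
print(shares_y)  # False
```

Because x_flat shares memory with x, an in-place write to x_flat would also change x, while a write to y_flat would leave y untouched.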
permute is quite different to view and reshape:
# View vs. permute
x = torch.arange(2*4).view(2, 4)
print(x.view(4, 2))
> tensor([[0, 1],
          [2, 3],
          [4, 5],
          [6, 7]])
print(x.permute(1, 0))
> tensor([[0, 4],
          [1, 5],
          [2, 6],
          [3, 7]])
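Looking at the strides may help explain the difference: view keeps the elements in their original memory order and only regroups them, while permute reorders the axes by swapping strides. A minimal sketch, with v and p as illustrative names:

```python
import torch

x = torch.arange(2 * 4).view(2, 4)
v = x.view(4, 2)       # same memory order, regrouped into 4 rows
p = x.permute(1, 0)    # axes swapped, i.e. the transpose

print(x.stride())  # (4, 1)
print(v.stride())  # (2, 1)
print(p.stride())  # (1, 4)

print(v.is_contiguous(), p.is_contiguous())  # True False
print(torch.equal(p, x.t()))                 # True
print(torch.equal(v, p))                     # False
```

Both results have shape (4, 2), which is why mixing them up does not raise an error.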
- So when training a model, is it best to use view?
- Is using reshape a source of possible bugs in terms of gradient flow?
Is there anything else to be careful about?
Thanks!
- Not necessarily. The usage of view and reshape does not depend on training / not-training. I personally use view whenever possible and add a contiguous call to it, if necessary. This makes sure I see where a copy is done in my code. reshape, on the other hand, does this automatically, so your code might look cleaner.
- No, it should not be a bug regarding the gradient flow. However, as shown with the reshape vs. permute example, the wrong operator might of course cause problems in your training. E.g. if you would like to swap some axes of an image tensor from NCHW to NHWC, you should use permute.
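Both points can be checked on a small tensor. The sketch below uses a made-up image batch (2 images, 3 channels, 4x5 pixels) and illustrative names; it shows that reshape to the NHWC shape scrambles the pixels while permute does not, and that gradients pass through reshape even when it has to copy.

```python
import torch

# Hypothetical NCHW batch: N=2, C=3, H=4, W=5.
img = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).view(2, 3, 4, 5)
img.requires_grad_()

nhwc = img.permute(0, 2, 3, 1)    # correct: the channel axis is moved last
wrong = img.reshape(2, 4, 5, 3)   # same shape, but values end up in the wrong places

print(nhwc.shape == wrong.shape)                     # True
print(torch.equal(nhwc, wrong))                      # False
# The pixel at (h=1, w=2) of image 0 holds that position's three channel values.
print(torch.equal(nhwc[0, 1, 2], img[0, :, 1, 2]))   # True

# nhwc is non-contiguous, so this reshape copies; gradients still flow back.
loss = nhwc.reshape(2, -1).sum()
loss.backward()
print(torch.equal(img.grad, torch.ones_like(img)))   # True
```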