You get the error in the title if you run the code below, which I believe should not happen. The step that “causes” the error is y.transpose(1, 2) – for some reason PyTorch doesn’t like the 1st and 2nd dimensions being transposed here – but the underlying issue appears to be with how the bias is added (if I set bias=False, the error goes away).
import torch
import torch.nn as nn
k = 2
w,h,d = 1, 1, 1
x = torch.zeros((10, 1, w, h, d))
c = torch.nn.ConvTranspose3d(1, 2, k)
y = c(x).contiguous()
y = y.reshape(1, 10, 2, k*w, k*h, k*d).contiguous()
y = y.transpose(1, 2).contiguous()
hub = nn.SmoothL1Loss()
l = hub(y, torch.zeros(y.shape))
l.backward()
You found a bug, and one that has been around for a long time (credit to @ptrblck, who tested it all the way back to PyTorch 1.4). Thank you so much for reporting it here with a minimal example – these are gold!
The CPU ConvTranspose3d backward expects its incoming gradient to be contiguous, but since transpose’s backward applies the same transpose to the gradient, the gradient it receives here is not.
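You can see this directly, independent of the conv layer: register a tensor hook and check the contiguity of the gradient that transpose’s backward hands back (a minimal sketch; the shapes here are arbitrary):

```python
import torch

x = torch.zeros(2, 3, 4, requires_grad=True)
y = x.transpose(1, 2)  # the backward applies the same transpose to the gradient

got = {}
# the hook fires with the gradient that arrives at x
x.register_hook(lambda g: got.update(contig=g.is_contiguous()))
y.sum().backward()

print(got["contig"])  # False: the gradient is a transposed, non-contiguous view
```

The gradient of sum() is a contiguous tensor of ones, but transposing it back to x’s layout permutes the strides, so the tensor that reaches x (and, in your case, the conv backward) is no longer contiguous.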
Here is a workaround using an autograd function that makes the gradient contiguous:
import torch
import torch.nn as nn
class ContiguousGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out.contiguous()
k = 2
w,h,d = 1, 1, 1
x = torch.zeros((10, 1, w, h, d))
c = torch.nn.ConvTranspose3d(1, 2, k)
y = c(x)
y = ContiguousGrad.apply(y)
y = y.reshape(1, 10, 2, k*w, k*h, k*d)
y = y.transpose(1, 2)
hub = nn.SmoothL1Loss()
l = hub(y, torch.zeros(y.shape))
l.backward()
The real fix will be adding a contiguous call to the backward itself, but that will likely only land in PyTorch 1.11.
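Until then, if you prefer not to define a custom autograd Function, the same effect can be had with a one-line tensor hook (my sketch, equivalent to ContiguousGrad above – a hook that returns a tensor replaces the incoming gradient):

```python
import torch
import torch.nn as nn

k = 2
w, h, d = 1, 1, 1
x = torch.zeros((10, 1, w, h, d))
c = nn.ConvTranspose3d(1, 2, k)
y = c(x)
# make the gradient contiguous before it reaches the conv backward
y.register_hook(lambda g: g.contiguous())
y = y.reshape(1, 10, 2, k * w, k * h, k * d)
y = y.transpose(1, 2)
l = nn.SmoothL1Loss()(y, torch.zeros(y.shape))
l.backward()
```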