How is a Conv1d with groups=1 different from a Linear layer?

If I have:

self.layer1 = torch.nn.Conv1d(in_channels=512, out_channels=512, kernel_size=1)

isn’t that equivalent to

self.layer1 = torch.nn.Linear(512, 512)

?

Yes, that should be the case:

import torch
import torch.nn as nn

# Setup
conv = nn.Conv1d(in_channels=512, out_channels=512, kernel_size=1).double()
lin = nn.Linear(512, 512).double()

# use the same parameter values
# (the conv weight has shape [512, 512, 1], the linear weight [512, 512])
with torch.no_grad():
    lin.weight = nn.Parameter(conv.weight.squeeze(2))
    lin.bias = nn.Parameter(conv.bias)

# forward
x = torch.randn(2, 512, 20).double()
out_conv = conv(x)

# permute to (batch, length, channels) for the linear layer
x_lin = x.permute(0, 2, 1)
out_lin = lin(x_lin)

# check forward output
print(torch.allclose(out_lin.permute(0, 2, 1), out_conv))
> True

print((out_lin.permute(0, 2, 1) - out_conv).abs().max())
> tensor(1.2212e-15, dtype=torch.float64, grad_fn=<MaxBackward1>)

# check backward
out_conv.mean().backward()
out_lin.mean().backward()
print(torch.allclose(conv.weight.grad.squeeze(2), lin.weight.grad))
> True
print(torch.allclose(conv.bias.grad, lin.bias.grad))
> True
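
As a complementary sketch (not part of the original snippet), the functional API makes the relationship explicit: the Conv1d weight is simply the Linear weight with a trailing kernel dimension of size 1, and both layers apply the same matrix multiply at every position along the length dimension. The tensor names below are only for illustration:

import torch
import torch.nn.functional as F

w = torch.randn(512, 512, dtype=torch.float64)    # (out_features, in_features)
b = torch.randn(512, dtype=torch.float64)
x = torch.randn(2, 512, 20, dtype=torch.float64)  # (batch, channels, length)

# conv weight: (out_channels, in_channels, kernel_size=1)
out_conv = F.conv1d(x, w.unsqueeze(2), b)
# linear expects (..., in_features), so move channels last and back again
out_lin = F.linear(x.transpose(1, 2), w, b).transpose(1, 2)

print(torch.allclose(out_conv, out_lin))
> True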

Thanks so much. So there’s literally no difference, not even in terms of computation?

There is most likely a difference in the actual computation, in particular if you are using CUDA operations. E.g. on an NVIDIA GPU, convolutions would be dispatched to cuDNN, which could internally call into cuBLAS (as the linear layer does), but that isn't guaranteed.
I don't know exactly which methods are called on the CPU.

For my code snippet the convolution would use cudnn::cnn::implicit_convolve_dgemm, while the linear layer would call into volta_dgemm_128x64_tn.
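
If you want to check this on your own setup, one option (a sketch, assuming a recent PyTorch version and an NVIDIA GPU) is to profile both calls and look at which CUDA kernels are launched; the exact kernel names will vary with the GPU, CUDA, and cuDNN versions:

import torch

conv = torch.nn.Conv1d(512, 512, kernel_size=1).cuda()
lin = torch.nn.Linear(512, 512).cuda()
x = torch.randn(2, 512, 20, device='cuda')

with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CUDA]) as prof:
    conv(x)
    lin(x.permute(0, 2, 1))

# the table lists the CUDA kernels that were actually launched
print(prof.key_averages().table(sort_by="cuda_time_total"))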
