I have implemented a convolutional autoencoder that works perfectly without weight sharing between encoder and decoder. I guess you all know how a conv. autoencoder works.
When tying the weights of the decoder to the encoder, I noticed a weird behaviour of the weights of a standard nn.Conv2d:
In my case the encoder layer is self.conv1 = nn.Conv2d(1,100,(16,5),stride=(16,5),padding=0), and the auto-initialized weights for this layer have size [100,1,16,5].
For the deconv I should use the functional library with the transpose of these weights, right? As far as I understand, this is the mathematically correct way to share weights. So what I would do looks like this:
F.conv_transpose2d(out, weight=self.conv1.weight.transpose(0,1), bias=None, stride=(16,5),padding=0)
This throws an error. If I don’t transpose the weights in conv_transpose2d, it doesn’t throw an error.
So, this one works: F.conv_transpose2d(out, weight=self.conv1.weight, bias=None,stride=(16,5),padding=0)
This seems like weird behaviour (and may lead to errors in the future), especially because for fully connected (linear) layers it works exactly the way I would expect.
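To make the shapes concrete, here is a minimal sketch of both cases. The input size (64 x 20) and the linear layer sizes are assumptions picked only so the shapes divide evenly; they are not from my actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Encoder layer from the post: weight is stored as [out_channels, in_channels, kH, kW].
conv1 = nn.Conv2d(1, 100, (16, 5), stride=(16, 5), padding=0)
x = torch.randn(2, 1, 64, 20)  # hypothetical input, divisible by the stride
h = conv1(x)                   # shape [2, 100, 4, 4]

# conv_transpose2d reads its weight as [in_channels, out_channels, kH, kW].
# Its in_channels (100) equals conv1's out_channels, so the untransposed weight fits.
x_hat = F.conv_transpose2d(h, weight=conv1.weight, bias=None, stride=(16, 5), padding=0)

# nn.Linear stores [out_features, in_features] and F.linear computes input @ weight.T,
# so the tied linear decoder does need an explicit .t().
fc = nn.Linear(20, 8)          # hypothetical sizes
v = torch.randn(2, 20)
v_hat = F.linear(fc(v), fc.weight.t())

print(conv1.weight.shape, h.shape, x_hat.shape, v_hat.shape)
```

So the asymmetry seems to come from how the two functional calls interpret the weight layout, not from the weights themselves.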
I was interested in a similar question: which dimensions should be transposed when sharing weights for deconvolution? Is transposing the filters redundant?
I get the same error as you when using nn.functional.conv_transpose2d():
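import torch
import torch.nn as nn

class AE_tied_weights(nn.Module):
    def __init__(self, input_dim=28*28, hidden_layers=(), output_dim=2000):
        super(AE_tied_weights, self).__init__()
        # Single conv encoder layer; its weight has shape [64, 1, 4, 4]
        self.encoder = nn.ModuleList([nn.Conv2d(1, 64, 4, 2, 1)])
        # Separate decoder bias (one output channel)
        self.bias = nn.ParameterList([nn.Parameter(torch.randn(1))])

    def forward(self, x):
        h = self.encoder[0](x)
        h = torch.sigmoid(h)
        # Decoder reuses the encoder weight, transposed on dims 0 and 1 (this is the call that fails)
        y = nn.functional.conv_transpose2d(h, weight=self.encoder[0].weight.transpose(0, 1), bias=self.bias[0], stride=2, padding=1)
        return x, h, y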
Transposing the weight with weight=self.encoder[0].weight.transpose(0, 1) gives the following error:
y = nn.functional.conv_transpose2d(h, weight=self.encoder[0].weight.transpose(0, 1), bias=self.bias[0], stride=2, padding=1)
RuntimeError: Given transposed=1, weight[1, 64, 4, 4], so expected input[60, 64, 14, 14] to have 1 channels, but got 64 channels instead
Which I don’t find very consistent with my previous feedback…
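For comparison, a minimal sketch of the same model with the transpose dropped, assuming that passing the encoder weight unchanged is the intended fix. The class name TiedConvAE and the zero-initialized decoder bias are my own choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedConvAE(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 64, 4, 2, 1)          # weight: [64, 1, 4, 4]
        self.dec_bias = nn.Parameter(torch.zeros(1))      # one decoder output channel

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))
        # Same weight tensor, no transpose: conv_transpose2d reads it as
        # [in_channels=64, out_channels=1, 4, 4], which matches h's 64 channels.
        y = F.conv_transpose2d(h, weight=self.encoder.weight, bias=self.dec_bias,
                               stride=2, padding=1)
        return h, y

model = TiedConvAE()
x = torch.randn(60, 1, 28, 28)
h, y = model(x)
print(h.shape, y.shape)

# The transposed weight [1, 64, 4, 4] is rejected, as in the traceback above.
try:
    F.conv_transpose2d(h, weight=model.encoder.weight.transpose(0, 1), stride=2, padding=1)
    transposed_failed = False
except RuntimeError:
    transposed_failed = True
```

With this version the reconstruction has the same shape as the input, and the encoder weight receives gradients from both the encoding and the decoding pass.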
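I am working on similar problems associated with disentangled representations, and ran across this thread. If I am not mistaken, you are effectively transposing the weights when you swap the in and out channel counts. nn.Conv2d stores its weight as [out_channels, in_channels, kH, kW], while conv_transpose2d reads the same tensor as [in_channels, out_channels, kH, kW], so no explicit transpose is needed.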
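One way to check this numerically is the adjoint identity <conv(x), y> = <x, conv_transpose(y)>, which should hold if conv_transpose2d with the untransposed weight really is the transpose of conv2d. A small sketch, with tensor sizes borrowed from the example above and float64 used only to keep rounding error small:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

w = torch.randn(64, 1, 4, 4, dtype=torch.float64)     # Conv2d-style weight [out, in, kH, kW]
x = torch.randn(3, 1, 28, 28, dtype=torch.float64)    # arbitrary input
y = torch.randn(3, 64, 14, 14, dtype=torch.float64)   # arbitrary tensor in conv output space

# Inner product taken in the output space of the convolution.
lhs = (F.conv2d(x, w, stride=2, padding=1) * y).sum()
# Inner product taken in the input space, using the SAME weight without transpose.
rhs = (x * F.conv_transpose2d(y, w, stride=2, padding=1)).sum()

is_adjoint = torch.allclose(lhs, rhs)
print(is_adjoint)
```

If the two inner products agree, the functional call is already applying the transposed operator, and an extra .transpose(0, 1) on the weight would only break the channel count.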