I have a CNN-based autoencoder setup that looks like this (showing only the Conv/ConvTranspose layers; 1d layers since I work with text):
Encoder(
(conv_layers): ModuleList(
(0): Conv1d(300, 300, kernel_size=(5,), stride=(2,))
(1): Conv1d(300, 600, kernel_size=(5,), stride=(2,))
(2): Conv1d(600, 500, kernel_size=(10,), stride=(2,))
)
...
)
Decoder(
(deconv_layers): ModuleList(
(0): ConvTranspose1d(500, 600, kernel_size=(10,), stride=(2,))
(1): ConvTranspose1d(600, 300, kernel_size=(5,), stride=(2,))
(2): ConvTranspose1d(300, 300, kernel_size=(5,), stride=(2,))
)
)
This already works quite well. Now I have read about tying the weights of the respective Conv/ConvTranspose layers. To this end, I've added the following to the decoder:
deconv_layers[0].weight = conv_layers[2].weight
deconv_layers[1].weight = conv_layers[1].weight
deconv_layers[2].weight = conv_layers[0].weight
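For context, this assignment shares the actual Parameter object rather than copying values, and the shapes happen to line up because Conv1d stores its weight as (out_channels, in_channels, kernel) while ConvTranspose1d stores it as (in_channels, out_channels, kernel). A one-layer-pair sketch of what the assignment does (layer sizes taken from my model above):

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(600, 500, kernel_size=10, stride=2)             # encoder layer (2)
deconv = nn.ConvTranspose1d(500, 600, kernel_size=10, stride=2)  # decoder layer (0)

# Conv1d weight:          (out_ch, in_ch, k) = (500, 600, 10)
# ConvTranspose1d weight: (in_ch, out_ch, k) = (500, 600, 10)
# -> identical shapes, so the Parameter object can be assigned directly.
deconv.weight = conv.weight
print(deconv.weight is conv.weight)  # True: same object, not a copy
```

Given that, an identity check like `decoder.deconv_layers[2].weight is encoder.conv_layers[0].weight` would be an even stricter test than elementwise equality.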
This seems to work: it trains fine, and the respective weight matrices stay identical. I check this during training using, e.g.:
torch.all(torch.eq(encoder.conv_layers[0].weight, decoder.deconv_layers[2].weight))
which always returns True. However, I wonder if this is the best or even the right way to do it:
- My current solution does not reduce the number of parameters. I assume the correct way would be to define the 3 weight matrices first and then give them to both the encoder and the decoder. I just don't know what this would look like in proper code so that backprop works correctly.
- Even if I don't reduce the number of parameters, is my approach at least correct in principle? Technically, the encoder and decoder update conv_layers and deconv_layers independently. But given that the respective weights are identical, I assume this is not a problem.
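To make both points concrete, here is my best guess at what the "define once, share everywhere" variant would look like (a toy sketch with made-up layer sizes, not my real model: deduplicating by object identity with a plain dict keyed by id() so the shared weight reaches the optimizer only once, and checking that autograd accumulates gradients from both uses onto the single shared tensor):

```python
import itertools
import torch
import torch.nn as nn

# Toy layer pair, smaller than the real model, just for illustration.
conv = nn.Conv1d(2, 3, kernel_size=3)
deconv = nn.ConvTranspose1d(3, 2, kernel_size=3)

# Both weights have shape (3, 2, 3), so the Parameter can be shared directly.
deconv.weight = conv.weight

# Deduplicate by object identity so the shared tensor is handed to the
# optimizer only once (conv.weight, conv.bias, deconv.bias -> 3 entries).
params = {id(p): p for p in itertools.chain(conv.parameters(), deconv.parameters())}
opt = torch.optim.SGD(params.values(), lr=0.1)

# Gradients from both uses of the weight accumulate on the one shared tensor,
# so the "independent" updates actually see the combined gradient.
x = torch.randn(1, 2, 8)
deconv(conv(x)).sum().backward()
print(conv.weight.grad is deconv.weight.grad)  # True: same Parameter, same .grad
```

If that is the right idea, then the encoder/decoder updates are not really independent at all, since there is only one tensor being updated.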