I have a CNN-based autoencoder setup that looks like this (showing only the Conv/ConvTranspose layers; 1d layers since I work with text):
Encoder(
(conv_layers): ModuleList(
(0): Conv1d(300, 300, kernel_size=(5,), stride=(2,))
(1): Conv1d(300, 600, kernel_size=(5,), stride=(2,))
(2): Conv1d(600, 500, kernel_size=(10,), stride=(2,))
)
...
)
Decoder(
(deconv_layers): ModuleList(
(0): ConvTranspose1d(500, 600, kernel_size=(10,), stride=(2,))
(1): ConvTranspose1d(600, 300, kernel_size=(5,), stride=(2,))
(2): ConvTranspose1d(300, 300, kernel_size=(5,), stride=(2,))
)
)
This already works quite well. Now I have read about tying the weights of the respective Conv/ConvTranspose layers. To this end, I've added the following to the decoder:
deconv_layers[0].weight = conv_layers[2].weight
deconv_layers[1].weight = conv_layers[1].weight
deconv_layers[2].weight = conv_layers[0].weight
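For context, this assignment shares the actual Parameter object rather than copying values, and the shapes happen to line up because Conv1d stores its weight as (out_channels, in_channels, kernel) while ConvTranspose1d stores it as (in_channels, out_channels, kernel). A one-layer-pair sketch of what the assignment does (layer sizes taken from my model above):

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(600, 500, kernel_size=10, stride=2)             # encoder layer (2)
deconv = nn.ConvTranspose1d(500, 600, kernel_size=10, stride=2)  # decoder layer (0)

# Conv1d weight:          (out_ch, in_ch, k) = (500, 600, 10)
# ConvTranspose1d weight: (in_ch, out_ch, k) = (500, 600, 10)
# -> identical shapes, so the Parameter object can be assigned directly.
deconv.weight = conv.weight
print(deconv.weight is conv.weight)  # True: same object, not a copy
```

Given that, an identity check like `decoder.deconv_layers[2].weight is encoder.conv_layers[0].weight` would be an even stricter test than elementwise equality.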
This seems to work: it trains fine, and the respective weight matrices stay identical. I check this during training using, e.g.:
torch.all(torch.eq(encoder.conv_layers[0].weight, decoder.deconv_layers[2].weight))
which always returns True. However, I wonder if this is the best or even the right way to do it:
- My current solution does not reduce the number of parameters. I assume the correct way would be to define the 3 weight matrices first and then give them to both the encoder and the decoder. I just don't know what this would look like in proper code so that backprop works correctly.
- Even if I don't reduce the number of parameters, is my approach at least correct in principle? Technically, the encoder and decoder update conv_layers and deconv_layers independently. But given that the respective weights are identical, I assume this is not a problem.
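To make both points concrete, here is my best guess at what the "define once, share everywhere" variant would look like (a toy sketch with made-up layer sizes, not my real model: deduplicating by object identity with a plain dict keyed by id() so the shared weight reaches the optimizer only once, and checking that autograd accumulates gradients from both uses onto the single shared tensor):

```python
import itertools
import torch
import torch.nn as nn

# Toy layer pair, smaller than the real model, just for illustration.
conv = nn.Conv1d(2, 3, kernel_size=3)
deconv = nn.ConvTranspose1d(3, 2, kernel_size=3)

# Both weights have shape (3, 2, 3), so the Parameter can be shared directly.
deconv.weight = conv.weight

# Deduplicate by object identity so the shared tensor is handed to the
# optimizer only once (conv.weight, conv.bias, deconv.bias -> 3 entries).
params = {id(p): p for p in itertools.chain(conv.parameters(), deconv.parameters())}
opt = torch.optim.SGD(params.values(), lr=0.1)

# Gradients from both uses of the weight accumulate on the one shared tensor,
# so the "independent" updates actually see the combined gradient.
x = torch.randn(1, 2, 8)
deconv(conv(x)).sum().backward()
print(conv.weight.grad is deconv.weight.grad)  # True: same Parameter, same .grad
```

If that is the right idea, then the encoder/decoder updates are not really independent at all, since there is only one tensor being updated.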