I have a quick question about weight sharing/tying. Suppose I have two Linear modules in an encoder-decoder framework:
layer_e = torch.nn.Linear(20, 50)
layer_d = torch.nn.Linear(50, 20)
And I wish for the weights of the two modules to be tied. How would I go about doing this? Specifically, the weights of layer_e and layer_d must be tied for both initialization and backpropagation, so that after training the entire framework they are still the same (up to the transpose).
Previous posts proposing solutions to this problem seem to have some flaws. For example:
-
layer_d.weight = layer_e.weight.T
This does not work: every variant of transpose (.T, .t(), .transpose(0, 1)) returns a plain Tensor rather than a Parameter, so the assignment raises an error.
-
layer_d.weight = torch.nn.parameter.Parameter(layer_e.weight.T)
This creates an entirely new Parameter for layer_d. Although its initial value matches layer_e.weight, it is not tied during backpropagation, so layer_d.weight and layer_e.weight will differ after training.
-
layer_d = torch.nn.functional.linear(input, layer_e.weight.T)
This replaces layer_d entirely. It may work, provided layer_e.weight.T refers to the original weights. However, it changes layer_d from a Module to a function, which is really inconvenient with respect to the existing codebase.
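To make the two requirements concrete, here is a minimal sketch of one direction that might satisfy both of them. It is not taken from any of the previous posts: `TiedLinear` is a hypothetical helper name, and giving the decoder its own zero-initialized bias is an assumption. The idea is to register the same Parameter object on both modules and apply the transpose inside forward, so layer_d stays a Module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiedLinear(nn.Module):
    """Hypothetical decoder layer that reuses another Linear's weight, transposed."""

    def __init__(self, source: nn.Linear):
        super().__init__()
        # Assigning the existing Parameter registers the very same object
        # on this module; no copy is made.
        self.weight = source.weight
        # Assumption: the decoder keeps its own bias, sized to the
        # encoder's input width.
        self.bias = nn.Parameter(torch.zeros(source.in_features))

    def forward(self, x):
        # Transposing here (not at assignment time) lets autograd send
        # the decoder's gradient back to the single shared Parameter.
        return F.linear(x, self.weight.t(), self.bias)


layer_e = nn.Linear(20, 50)
layer_d = TiedLinear(layer_e)
model = nn.Sequential(layer_e, nn.ReLU(), layer_d)

# One illustrative training step on random data.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 20)
loss = F.mse_loss(model(x), x)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because both modules hold the same Parameter, model.parameters() yields it only once, and a single optimizer update changes what both layers see.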
Any help is appreciated.