How to share weights between two layers?

Hello everyone, hope you are having a great time.
I wanted to create an autoencoder, a simple one. If my memory serves me correctly, back in the day one way to create an autoencoder was to share weights between the encoder and the decoder; that is, the decoder simply used the transpose of the encoder's weight matrix. Setting aside the practicality of this, and whether it was for the best or the worst, can you please help me do it?
Based on this discussion, I tried doing:
self.decoder[0].weight = self.encoder[0].weight.t()
and this won't work; I get:

TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)

So I ended up doing:

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, embeddingsize=40):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28*28, embeddingsize),
                                     nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(embeddingsize, 28*28),
                                     nn.Sigmoid())
        # re-wrap the transposed encoder weight so the assignment is accepted
        self.decoder[0].weight = nn.Parameter(self.encoder[0].weight.t())

    def forward(self, input):
        # encode then decode; input is expected to be (batch, 28*28)
        output = self.decoder(self.encoder(input))
        return output

The network trains and I get no errors, but I'm not sure whether it really uses the very same weights for both of them, or whether the encoder's weights are merely used as initial values and nn.Parameter() creates a brand-new weight tensor for the decoder!
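One way to check this directly (a minimal sketch, assuming the AutoEncoder class defined above) is to compare the two parameters' identity and storage, and to see whether an in-place change to one shows up in the other:

import torch

model = AutoEncoder()
enc_w = model.encoder[0].weight
dec_w = model.decoder[0].weight

# Truly tied parameters would be the same object, or at least share storage.
print(enc_w is dec_w)
print(enc_w.data_ptr() == dec_w.data_ptr())

# Nudge the encoder weight in place and see whether the decoder follows.
with torch.no_grad():
    before = dec_w.clone()
    enc_w.add_(1.0)
print(torch.equal(dec_w, before))  # True would mean the decoder did not move

If the two parameters turn out to be independent, the optimizer will update them separately and they will drift apart during training.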

Any help in this regard is greatly appreciated. Thanks a lot in advance!

I guess you have already read the answer.
I am linking the post just for completeness!


Thanks a lot! It would be much better if you copy/pasted that code here as well; it's kind of hard to be redirected to another website.
Anyway, I have some questions as well.
Why can't we simply do:
self.decoder[0].weight = self.encoder[0].weight.t()
and instead must do:
self.decoder[0].weight.data = self.encoder[0].weight.data.transpose(0,1)
Doesn't .data only copy the raw values from the source to the destination (i.e., encoder to decoder)?
So basically these would be two different weight tensors that happen to have the same initial values, and in backprop they would just get tuned independently. (Please see the images below.)
My second question is: why can't we use .t() instead of transpose(0, 1)? Aren't they interchangeable? (See the snippet below.)
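On the second question: for a 2-D tensor, t() is documented as transposing dimensions 0 and 1, so the two should be interchangeable there. A quick standalone check backs that up:

import torch

x = torch.randn(3, 5)
# Both calls return the same view of x for a 2-D tensor.
print(torch.equal(x.t(), x.transpose(0, 1)))              # True
print(x.t().data_ptr() == x.transpose(0, 1).data_ptr())   # True: views of one storage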

I also noticed that simply doing:

# fresh random weights, copied into the encoder; the decoder then gets
# a transposed view of the encoder's data
weights = nn.Parameter(torch.randn_like(self.encoder[0].weight))
self.encoder[0].weight.data = weights.clone()
self.decoder[0].weight.data = self.encoder[0].weight.data.transpose(0, 1)

results in different-looking weight visualizations: when I tried to visualize both weight matrices, they just looked different!

Update:
After transposing the decoder's weights and visualizing them, it turns out they are identical (the decoder's weight visualization is a bit washed out, but they do indeed look alike).
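A numeric version of that visual check (a sketch, assuming the encoder/decoder layers from the snippets above) avoids any ambiguity from the plotting:

import torch

enc_w = model.encoder[0].weight.data   # shape (embeddingsize, 28*28)
dec_w = model.decoder[0].weight.data   # shape (28*28, embeddingsize)

# If the values really match, the transposed decoder weight equals the encoder weight.
print(torch.allclose(dec_w.t(), enc_w))

The washed-out look may simply come from each image being normalized to its own value range when plotted.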


I have been working on a similar problem. After following @InnovArul's code in the thread already linked, I was able to get it to work, though I am unsure how the different methods affect the end result; they all seem to work, so is it just a matter of speed? I also have another related question: what if I wish to tie weights between layers that live in different classes or modules? Would I return the weight data and then pass it into the other classes? Would that still tie the weights?

Edit: Of course, after staring at @InnovArul's code for an hour, it was straight after posting this question that I figured out what he is doing regarding passing weight data through classes, and so answered my own question. I still wonder about the difference between the three methods, though.
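For anyone landing here with the same cross-module question, here is a minimal sketch of the idea (class names are hypothetical; this is not the code from the linked thread): create one nn.Parameter and hand the same object to both modules, using the functional API in forward so gradients from both paths accumulate in that single tensor.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, weight):
        super().__init__()
        self.weight = weight  # the shared nn.Parameter

    def forward(self, x):
        return torch.tanh(F.linear(x, self.weight))

class Decoder(nn.Module):
    def __init__(self, weight):
        super().__init__()
        self.weight = weight  # the very same Parameter object

    def forward(self, h):
        # F.linear(h, W.t()) computes h @ W, i.e. the transposed map
        return torch.sigmoid(F.linear(h, self.weight.t()))

shared = nn.Parameter(torch.randn(40, 28 * 28) * 0.01)
enc, dec = Encoder(shared), Decoder(shared)
out = dec(enc(torch.randn(8, 28 * 28)))  # both directions use one weight tensor

Because both modules hold a reference to the same Parameter, backprop accumulates gradients from the encoder and decoder paths into a single .grad; and if the two modules are registered inside one parent nn.Module, parameters() yields the shared weight only once.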