Implementing a deep autoencoder with tied weights

I’m trying to implement a deep autoencoder in PyTorch where the encoder’s weights are tied to the decoder’s. Following the idea given in [Autoencoder with tied weights using sequential() - #3 by TheOraware] on this forum, I came up with this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Hidden-layer weights, shared between encoder and decoder
        self.w1 = nn.Parameter(torch.randn(100, 100))
        self.w2 = nn.Parameter(torch.randn(100, 100))
        self.w3 = nn.Parameter(torch.randn(100, 100))
        self.w4 = nn.Parameter(torch.randn(100, 100))
        self.w5 = nn.Parameter(torch.randn(100, 100))
        self.w6 = nn.Parameter(torch.randn(100, 100))
        self.w7 = nn.Parameter(torch.randn(100, 100))
        self.w8 = nn.Parameter(torch.randn(100, 100))

    def forward(self, input):
        ### INPUT
        x = torch.tanh(F.linear(input, nn.Parameter(torch.randn(100, 39))))

        ### ENCODER
        x = torch.tanh(F.linear(x, self.w1))
        x = torch.tanh(F.linear(x, self.w2))
        x = torch.tanh(F.linear(x, self.w3))
        x = torch.tanh(F.linear(x, self.w4))
        x = torch.tanh(F.linear(x, self.w5))
        x = torch.tanh(F.linear(x, self.w6))
        x = torch.tanh(F.linear(x, self.w7))
        x = torch.tanh(F.linear(x, self.w8))

        ### FEATURE EXTRACTION LAYER
        fe = torch.tanh(F.linear(x, nn.Parameter(torch.randn(39, 100))))

        ### DECODER
        x = torch.tanh(F.linear(fe, nn.Parameter(torch.randn(100, 39))))
        x = torch.tanh(F.linear(x, self.w8.T))
        x = torch.tanh(F.linear(x, self.w7.T))
        x = torch.tanh(F.linear(x, self.w6.T))
        x = torch.tanh(F.linear(x, self.w5.T))
        x = torch.tanh(F.linear(x, self.w4.T))
        x = torch.tanh(F.linear(x, self.w3.T))
        x = torch.tanh(F.linear(x, self.w2.T))
        x = torch.tanh(F.linear(x, self.w1.T))

        ### OUTPUT
        out = F.linear(x, nn.Parameter(torch.randn(39, 100)))
        return out

I plan to train this autoencoder and then use the bottleneck layer (labelled "feature extraction" in the code) to extract features from my data; basically, I’ll remove the decoder part after training.

The problem is that this model does not train; the loss stays stuck. I’ve tried adding more data as well as using different activation functions. I suspect the problem is in how I’ve written the weight sharing, since the same model without tying does train.

Any ideas on what I could be doing wrong will be appreciated. Thank you!


Three thoughts:

  • Creating an nn.Parameter inside forward is almost certainly a mistake: a fresh, randomly initialized tensor is built on every call, and it is never registered with the module, so the optimizer never sees or updates it. Parameters should be defined in __init__ and then used in the forward.
  • I’m not sure that unit variance (plain torch.randn) is a good initialization for layers this wide; with tanh it will tend to saturate. A scaled scheme such as Xavier/Glorot is more usual.
  • tanh is a classic activation function, but I’m not sure it is widely used in hidden layers these days; something ReLU-like may train more easily.
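For what it’s worth, here is a rough sketch of how the tied version might look with every weight registered in __init__ and a scaled init. The layer sizes match your post, but the Xavier init, the encode helper, and the choice to also tie the input and bottleneck projections through their transposes are just assumptions on my part, not a drop-in fix:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedAutoEncoder(nn.Module):
    def __init__(self, in_dim=39, hidden_dim=100):
        super().__init__()
        # Every weight is created here, so it shows up in model.parameters()
        # and gets updated by the optimizer.
        self.w_in = nn.Parameter(torch.empty(hidden_dim, in_dim))
        self.w_hidden = nn.ParameterList(
            [nn.Parameter(torch.empty(hidden_dim, hidden_dim)) for _ in range(8)]
        )
        self.w_fe = nn.Parameter(torch.empty(in_dim, hidden_dim))
        for w in [self.w_in, *self.w_hidden, self.w_fe]:
            nn.init.xavier_uniform_(w)  # scaled init instead of unit variance

    def encode(self, x):
        x = torch.tanh(F.linear(x, self.w_in))
        for w in self.w_hidden:
            x = torch.tanh(F.linear(x, w))
        # Bottleneck / feature-extraction layer
        return torch.tanh(F.linear(x, self.w_fe))

    def forward(self, x):
        fe = self.encode(x)
        # Decoder reuses the encoder weights, transposed
        x = torch.tanh(F.linear(fe, self.w_fe.t()))
        for w in reversed(self.w_hidden):
            x = torch.tanh(F.linear(x, w.t()))
        return F.linear(x, self.w_in.t())

With something like this, after training you can call model.encode(batch) to get the 39-dimensional features directly, so there is nothing to strip out.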