LSTM autoencoder implementation

I am implementing an LSTM autoencoder similar to the one in the paper by Srivastava et al., 'Unsupervised Learning of Video Representations using LSTMs'.

In the figure from that paper, the weights of the LSTM encoder are copied to those of the LSTM decoder.

To implement this, should the encoder weights be cloned into the decoder?

More specifically, is the snippet below correct?

import torch
import torch.nn as nn
from torch.autograd import Variable

hidden_size = 64  # not specified in the original post; any reasonable value

class Sequence(nn.Module):
    def __init__(self):
        super(Sequence, self).__init__()
        self.lstm_enc = nn.LSTMCell(1, hidden_size)
        self.fc_enc = nn.Linear(hidden_size, 1)
        self.lstm_dec = nn.LSTMCell(1, hidden_size)
        self.fc_dec = nn.Linear(hidden_size, 1)

    def forward(self, input, input_r):
        outputs = []

        h_t_enc = Variable(torch.zeros(input.size(0), hidden_size).cuda(), requires_grad=False)
        c_t_enc = Variable(torch.zeros(input.size(0), hidden_size).cuda(), requires_grad=False)

        # encoder: consume the input one timestep at a time
        for i, input_t in enumerate(input.chunk(input.size(1), dim=1)):
            h_t_enc, c_t_enc = self.lstm_enc(input_t, (h_t_enc, c_t_enc))
            output = self.fc_enc(c_t_enc)

        # decoder: initialise its state by cloning the encoder's final state
        h_t_dec = h_t_enc.clone()
        c_t_dec = c_t_enc.clone()

        outputs += [output]

        # note that input_r is the time-reversed version of input
        for i, input_t in enumerate(input_r.chunk(input_r.size(1), dim=1)):
            if i != input_r.size(1) - 1:
                h_t_dec, c_t_dec = self.lstm_dec(input_t, (h_t_dec, c_t_dec))
                output = self.fc_dec(c_t_dec)
                outputs += [output]

        outputs = torch.stack(outputs, 1).squeeze(2)

        return outputs

Do you mean the hidden activations?

Do you really have to clone them? It seems to me that you could directly pass your outputs to the decoder.

Yes, I mean the hidden activations.

Do you mean

h_t_dec = h_t_enc
c_t_dec = c_t_enc

?

Yes, or even directly:

h_t_dec, c_t_dec = self.lstm_dec(input_t, (h_t_enc, c_t_enc))
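
As a minimal sketch (reusing the lstm_dec and fc_dec modules from the snippet above), the decoder loop could then look like this, with no clone calls at all:

    # seed the decoder recurrence with the encoder's final states directly
    h_t_dec, c_t_dec = h_t_enc, c_t_enc
    for i, input_t in enumerate(input_r.chunk(input_r.size(1), dim=1)):
        if i != input_r.size(1) - 1:
            h_t_dec, c_t_dec = self.lstm_dec(input_t, (h_t_dec, c_t_dec))
            outputs += [self.fc_dec(c_t_dec)]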

Thanks for your reply.

But it seems to me that W1 at the encoder and W2 at the decoder are different.
If they are not, why do the authors use different notation?

I think that in the paper W1 and W2 represent the operations applied to the (hidden, cell, and input) activations, using the weight parameters of LSTM_enc and LSTM_dec respectively. These parameters are different.
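
As a quick sanity check (assuming the Sequence module defined above), the two LSTMCell modules own separate parameter tensors; only the activations (h_t, c_t) are handed from the encoder to the decoder:

    model = Sequence()
    # each cell has its own weights; these play the role of W1 and W2 in the paper
    print(model.lstm_enc.weight_ih is model.lstm_dec.weight_ih)   # False
    print(model.lstm_enc.weight_hh is model.lstm_dec.weight_hh)   # False
    # only the state tuple (h_t, c_t) is passed from lstm_enc to lstm_dec at run time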

Do you mean that I should use

h_t_dec = h_t_enc
c_t_dec = c_t_enc

even if these parameters are different?

Yes, this example could be interpreted as an autoencoder. W1, or in this example c_t, is passed through lstm1, and W2, or in this example c_t2, is passed through lstm2 across timesteps.

How you want to set this up, though, depends on what type of data you are looking to use the autoencoder model with.
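
For what it's worth, here is a rough usage sketch under the assumptions of the snippet above (2-D input of shape (batch, seq_len), a GPU since the states are created with .cuda(), and an illustrative batch size, sequence length, and learning rate that are not from the original post):

    model = Sequence().cuda()
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(32, 50).cuda()      # 32 sequences of 50 scalar values (illustrative)
    x_r = torch.flip(x, dims=[1])       # time-reversed copy fed to the decoder

    optimizer.zero_grad()
    out = model(x, x_r)                 # shape (32, 50): one encoder output + 49 decoder outputs
    loss = criterion(out, x_r)          # reconstruct the reversed sequence
    loss.backward()
    optimizer.step()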

@Seungyoung_Park Hey! Did you get a chance to finish the implementation of this? Do you mind sharing the source code? I'm also working on a similar problem. Thanks a lot.