LSTM autoencoder architecture

I am trying to create a simple LSTM autoencoder.

More precisely, I want to take a sequence of vectors, each of size input_dim, and produce an embedded representation of size latent_dim via an LSTM. I would then like to decode that embedded representation via another LSTM, (hopefully) reproducing the input sequence of vectors.

Here is my definition of the encoder and decoder:

self.encoder = nn.LSTM(input_dim, latent_dim)
self.encoder_hidden = (autograd.Variable(torch.zeros(1, 1, self.latent_dim)),
                       autograd.Variable(torch.zeros(1, 1, self.latent_dim)))

self.decoder = nn.LSTM(latent_dim, input_dim)
self.decoder_hidden = (autograd.Variable(torch.zeros(1, 1, self.input_dim)),
                       autograd.Variable(torch.zeros(1, 1, self.input_dim)))

The encoding step seems to make sense: for each vector in the input sequence, compute the output and hidden state, passing the hidden state along to the next call to the LSTM.

Here is the encode step:

def encode(self, word_vectors):
    out = None
    for word_vec in word_vectors:
        out, self.encoder_hidden = self.encoder(word_vec.view(1, 1, -1),
                                                self.encoder_hidden)
    return out

The decoding step, however, I do not understand. I have seen references in various posts/docs saying that the last hidden state of the encoding LSTM should be the first hidden state of the decoding LSTM. However, this cannot be the case here, as the hidden states have different dimensions (for an LSTM, the hidden and cell states apparently have to be the same size as the output).

Therefore I have been trying to use the output of the encoding LSTM as the first input into the decoding LSTM. This works for the first step, but after that there is nothing of size latent_dim left to feed in.

Here is my decode step:

def decode(self, encoded, target_length):
    outputs = []
    out = encoded
    for i in range(target_length):
        out, self.decoder_hidden = self.decoder(out.view(1, 1, -1),
                                                self.decoder_hidden)
        outputs.append(out)
    return outputs

This will fail at i=1 because out has the wrong dimension: the decoder produces outputs of size input_dim, but expects inputs of size latent_dim.
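One idea I have been playing with (no idea if it is the right approach) is to not feed the decoder its own output at all, and instead feed the same encoded vector in at every step, so that the input always has size latent_dim:

def decode(self, encoded, target_length):
    # Sketch only: always feed the encoded vector (size latent_dim) as the input,
    # so the decoder LSTM never sees its own input_dim-sized output.
    outputs = []
    for i in range(target_length):
        out, self.decoder_hidden = self.decoder(encoded.view(1, 1, -1),
                                                self.decoder_hidden)
        outputs.append(out)
    return outputs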

Anybody have any ideas how I could go about this?

Hi,

I’ve just created an LSTM autoencoder. I don’t know whether it’s correct, but it seems to work.

You can take a look.

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable


class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, isCuda):
        super(EncoderRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.isCuda = isCuda
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.relu = nn.ReLU()
        
        #initialize weights
        nn.init.xavier_uniform(self.lstm.weight_ih_l0, gain=np.sqrt(2))
        nn.init.xavier_uniform(self.lstm.weight_hh_l0, gain=np.sqrt(2))

    def forward(self, input):
        tt = torch.cuda if self.isCuda else torch
        # zero-initialize the hidden and cell states (a bare FloatTensor is uninitialized memory)
        h0 = Variable(tt.FloatTensor(self.num_layers, input.size(0), self.hidden_size).zero_())
        c0 = Variable(tt.FloatTensor(self.num_layers, input.size(0), self.hidden_size).zero_())
        encoded_input, hidden = self.lstm(input, (h0, c0))
        encoded_input = self.relu(encoded_input)
        return encoded_input

class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, num_layers, isCuda):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        
        self.isCuda = isCuda
        self.lstm = nn.LSTM(hidden_size, output_size, num_layers, batch_first=True)
        #self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
        #initialize weights
        nn.init.xavier_uniform(self.lstm.weight_ih_l0, gain=np.sqrt(2))
        nn.init.xavier_uniform(self.lstm.weight_hh_l0, gain=np.sqrt(2))
        
    def forward(self, encoded_input):
        tt = torch.cuda if self.isCuda else torch
        # zero-initialize the hidden and cell states (a bare FloatTensor is uninitialized memory)
        h0 = Variable(tt.FloatTensor(self.num_layers, encoded_input.size(0), self.output_size).zero_())
        c0 = Variable(tt.FloatTensor(self.num_layers, encoded_input.size(0), self.output_size).zero_())
        decoded_output, hidden = self.lstm(encoded_input, (h0, c0))
        decoded_output = self.sigmoid(decoded_output)
        return decoded_output

class LSTMAE(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, isCuda):
        super(LSTMAE, self).__init__()
        self.encoder = EncoderRNN(input_size, hidden_size, num_layers, isCuda)
        self.decoder = DecoderRNN(hidden_size, input_size, num_layers, isCuda)
        
    def forward(self, input):
        encoded_input = self.encoder(input)
        decoded_output = self.decoder(encoded_input)
        return decoded_output
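If it helps, a rough usage sketch (the sizes below are just made-up examples): the input should be a batch of sequences of shape (batch, seq_len, input_size), since the LSTMs are created with batch_first=True.

# Example usage with made-up sizes: 4 sequences of length 25, each step a 20-dim vector.
model = LSTMAE(input_size=20, hidden_size=10, num_layers=2, isCuda=False)
x = Variable(torch.rand(4, 25, 20))        # (batch, seq_len, input_size)
reconstruction = model(x)                  # same shape as x: (4, 25, 20)
loss = nn.MSELoss()(reconstruction, x)     # e.g. train to reconstruct the input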

Hi, this architecture seems slightly off. Don’t you have to use just the last hidden state of the encoder as your context vector? Or is this a different type of autoencoder where multiple hidden states are given to the decoder?

I used the above code and failed to get successful results. Shouldn’t we be using just one vector as the final embedding for the LSTM? The above code feeds all of the encoder outputs at the different time steps to the decoder.

The last hidden state of the encoder (which is the latent representation of the input) should be the input to the decoder network.
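For example (just a sketch with made-up sizes, assuming batch_first tensors and a single-layer LSTM), you could take the encoder’s final hidden state as the latent code and repeat it along the time axis as the decoder’s input:

input_size, hidden_size, seq_len = 20, 10, 25      # made-up sizes
encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
decoder = nn.LSTM(hidden_size, input_size, batch_first=True)

x = Variable(torch.rand(4, seq_len, input_size))   # (batch, seq_len, input_size)
_, (h_n, c_n) = encoder(x)                         # h_n: (num_layers, batch, hidden_size)
latent = h_n[-1]                                   # final hidden state = latent code
decoder_input = latent.unsqueeze(1).repeat(1, seq_len, 1)  # repeat along the time axis
reconstruction, _ = decoder(decoder_input)         # (batch, seq_len, input_size)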


Hi, please can you give me an example of how to initialize a new LSTMAE? I’m quite new to this field and I can’t work out what types of parameters your script requires. I tried
encoder = LSTMAE((19, 21), (10, 10), 2, True)
but got an error, I think because of the hidden_size parameter.
Thank you