Problems Training a Seq2Seq autoencoder

I’ve been trying to build a simple Seq2Seq autoencoder with GRUs. For some reason, the loss goes down, but when I test it on new input, or even on input from the training set, it outputs a different part of the training set instead of the input. I think it’s memorizing the training set, but I’m not sure why, since it only has to pass the input through as the output. Here is the code I’m using; I’ve tried to slim it down as much as possible.

Model File:
Training File (The one that gets run):
Data Preparation File (Helper file to handle data):

That the output is another sentence from the training data is very odd. Something garbled would just mean that the training fails somewhere. But you really get proper sentences as output, just not the input one.

Did you try to iteratively build the model? Any Seq2Seq model for machine translation can be used as an autoencoder; the only difference is that input and target are the same sequence. So one can start with the basic PyTorch Seq2Seq tutorial. If this is training fine, one can introduce additional bottleneck layers between the encoder and decoder to see how they affect the training – or even extend it to a Variational Autoencoder.
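To illustrate the bottleneck idea, here is a minimal sketch (with made-up sizes, not from the code in this thread) of compressing the encoder’s final hidden state to a small latent vector and expanding it back for the decoder:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
hidden_size, latent_size, batch_size = 256, 32, 4

# Bottleneck: compress the encoder's final hidden state to a small latent
# vector, then expand it back to initialize the decoder's hidden state.
to_latent = nn.Linear(hidden_size, latent_size)
from_latent = nn.Linear(latent_size, hidden_size)

encoder_hidden = torch.randn(1, batch_size, hidden_size)  # (layers*dirs, batch, hidden)
z = torch.tanh(to_latent(encoder_hidden))                 # (1, batch, latent)
decoder_hidden = torch.tanh(from_latent(z))               # (1, batch, hidden)
print(decoder_hidden.shape)  # torch.Size([1, 4, 256])
```

If the model still reconstructs its input with `latent_size` much smaller than `hidden_size`, it is learning a real compressed representation rather than just copying.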

Can you overtrain the model on a small training dataset?

Well, I had a quick look at your code, but it’s far from a minimal example :). Here are just some things I’ve noticed:

  • In the line hidden = hidden.view(self.n_layers, 1, input_seq.shape[1], -1)[-1], why is the second dimension a 1? It should be 2, since it represents the number of directions and you use a bidirectional GRU.

  • Why do you need the encoder output encoded for the decoder? What is the line decoder_input = torch.LongTensor([1 for i in range(encoded.shape[1])])... doing? Usually you initialize the target sequence with a special start token, e.g., input = torch.LongTensor([[Token.SOS]] * batch_size).to(self.device)
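On the first point: PyTorch returns the GRU hidden state with layers and directions packed into one dimension, so a sketch of the correct reshape (with illustrative sizes, not the ones from the thread) looks like this:

```python
import torch
import torch.nn as nn

n_layers, num_directions, batch_size, hidden_size = 2, 2, 4, 128
gru = nn.GRU(16, hidden_size, num_layers=n_layers, bidirectional=True)

x = torch.randn(10, batch_size, 16)  # (seq_len, batch, input_size)
_, hidden = gru(x)                   # (n_layers * num_directions, batch, hidden)

# Separate layers and directions; the second dimension must be the number
# of directions (2 for a bidirectional GRU), not 1.
hidden = hidden.view(n_layers, num_directions, batch_size, hidden_size)
last_layer = hidden[-1]  # (2, batch, hidden): forward and backward states
print(last_layer.shape)  # torch.Size([2, 4, 128])
```

With a 1 in that dimension, the view silently mixes layer and direction states together, so the decoder is initialized with the wrong vectors.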

You may want to have a look at my code for an Autoencoder and Variational Autoencoder. I started with the PyTorch Seq2Seq tutorial, extended it to the AE, and then to the VAE.


First off, thank you for the great answer. I am very confused as well, because not only are the results coherent rather than garbled, they are exact sentences from other examples. I tried to overtrain the model, and it works on a tiny dataset of 5 examples, but when I bring it up to 10 examples the same effect starts to occur.

As for your first question, I want to concatenate the hidden states from both directions and pass them through a linear layer to scale them down to a single hidden state to pass to the decoder.
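That intent, concatenating the two directions and projecting back down, could be sketched like this (illustrative sizes; the layer name combine is my own, not from the thread’s code):

```python
import torch
import torch.nn as nn

batch_size, hidden_size = 4, 128

# Final hidden states of a bidirectional encoder, one per direction
h_fwd = torch.randn(batch_size, hidden_size)
h_bwd = torch.randn(batch_size, hidden_size)

# Concatenate along the feature dimension and project back down so the
# unidirectional decoder receives a hidden state of the size it expects.
combine = nn.Linear(2 * hidden_size, hidden_size)
h_dec = torch.tanh(combine(torch.cat([h_fwd, h_bwd], dim=1)))
h_dec = h_dec.unsqueeze(0)  # (1, batch, hidden) for the decoder's GRU
print(h_dec.shape)  # torch.Size([1, 4, 128])
```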

For the second question, I don’t need the encoder output and I don’t use it. The decoder_input is initially set to the start token repeated batch-size times; your way is more elegant, but I’m pretty sure it does the same thing.
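A quick check suggests the two initializations do hold the same token ids and differ only in shape (assuming, as in my code, that the start token id is 1):

```python
import torch

SOS = 1  # assumed start-of-sequence token id
batch_size = 4

a = torch.LongTensor([SOS for _ in range(batch_size)])  # shape (batch,)
b = torch.LongTensor([[SOS]] * batch_size)              # shape (batch, 1)

# Same contents; only the trailing dimension differs
print(torch.equal(a, b.squeeze(1)))  # True
```

The shape difference can still matter downstream, e.g. when the decoder expects a (batch, 1) input per step.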

I am in the process of going through your example, will let you know how it goes.

Thanks for all the help!

I’ve finally got it working, and all I needed was to change the encoder’s forward function. The problem was in fact the line where I was trying to concatenate the hidden states of the encoder. I’ve replaced it with your flatten function and it works great. Thanks again!
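For anyone landing here later: the linked flatten function isn’t shown in this thread, but a sketch of the general idea, concatenating forward and backward states per layer instead of selecting only one slice, might look like this (function name and sizes are my own):

```python
import torch

def flatten_hidden(hidden, n_layers, num_directions, batch_size, hidden_size):
    # (n_layers * dirs, batch, hidden) -> (n_layers, dirs, batch, hidden)
    hidden = hidden.view(n_layers, num_directions, batch_size, hidden_size)
    # Concatenate the two directions along the feature dimension:
    # result is (n_layers, batch, dirs * hidden)
    return torch.cat([hidden[:, 0], hidden[:, 1]], dim=2)

h = torch.randn(2 * 2, 4, 128)  # 2 layers, 2 directions, batch 4
flat = flatten_hidden(h, n_layers=2, num_directions=2, batch_size=4, hidden_size=128)
print(flat.shape)  # torch.Size([2, 4, 256])
```

The key difference from the buggy view is that both directions survive the reshape, so no hidden state information is silently discarded.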