Scaling an RNN encoder-decoder to multiple layers

So I’m building an autoencoder structure using GRUs, and it currently works with 1 layer, but I am trying to make it work with many layers.

Currently, my input is a tensor of size (number of batches * sequence length * number of features). The first two are variable, but my data has only 3 features (I know this is small). So for the sake of an example, let's say my input is (100 * 250 * 3).

Looking at the seq2seq translation tutorial, the common step in NLP is to create an embedding layer in the encoder, which maps the input size to the hidden size; the result can then be fed through the GRU for any number of layers, since the input size always matches the output size. However, I'm not sure an embedding is right for my application, as my inputs are just 3 numeric time series. So I'm left in a situation where my initial input to the GRU is (100 * 250 * 3); it goes through the GRU and comes out as a (100 * 250 * hidden_size) tensor, which I can no longer put back through the GRU, because it expects an (i * j * 3) tensor.
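Here's a minimal sketch of the mismatch (hidden_size = 64 is just a placeholder value for illustration):

import torch
import torch.nn as nn

gru = nn.GRU(input_size=3, hidden_size=64, batch_first=True)

x = torch.randn(100, 250, 3)   # (batch, seq_len, features)
out, h = gru(x)                # out is (100, 250, 64)

# Feeding the output straight back in fails, because the GRU
# still expects 3 features in the last dimension:
# out2, h2 = gru(out)          # raises a size-mismatch RuntimeError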

Any ideas on how I should approach this? (Code below.)

def forward(self, input_value, hidden):
    recursive_input = input_value
    for i in range(self.n_layers):
        # first pass: (batch, seq_len, 3) -> (batch, seq_len, hidden_size);
        # the second pass fails because self.enc still expects 3 input features
        recursive_input, hidden = self.enc(recursive_input, hidden)

    for i in range(self.n_layers):
        recursive_input = F.relu(recursive_input)
        recursive_input, hidden = self.dec(recursive_input, hidden)

    seq_output = self.out(recursive_input)
    return (seq_output, recursive_input, hidden, None)

where

self.enc = nn.GRU(input_size, hidden_size, batch_first=True)
self.dec = nn.GRU(hidden_size, hidden_size, batch_first=True)
self.out = nn.Linear(hidden_size, output_size)

You could do a reprojection: either add a Linear layer that maps hidden_size back to 3, or do AvgPooling, for example.
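For the Linear option, something along these lines (a rough sketch; the class name and the reproject layer are made up, and the reprojection is skipped after the last encoder pass so the decoder still receives hidden_size inputs):

import torch.nn as nn
import torch.nn.functional as F

class GRUAutoencoder(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super().__init__()
        self.n_layers = n_layers
        self.enc = nn.GRU(input_size, hidden_size, batch_first=True)
        self.reproject = nn.Linear(hidden_size, input_size)  # hidden_size -> 3
        self.dec = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input_value, hidden=None):
        recursive_input = input_value
        for i in range(self.n_layers):
            recursive_input, hidden = self.enc(recursive_input, hidden)
            if i < self.n_layers - 1:
                # back to (batch, seq_len, input_size) for the next encoder pass
                recursive_input = self.reproject(recursive_input)

        for i in range(self.n_layers):
            recursive_input = F.relu(recursive_input)
            recursive_input, hidden = self.dec(recursive_input, hidden)

        seq_output = self.out(recursive_input)
        return seq_output, recursive_input, hidden

The AvgPooling alternative would slot into the same place, but the Linear layer keeps the projection learnable.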


I toyed with the idea of reprojection for a while. I'm trying it by setting num_layers in GRU() for the encoder first, and if that fails, I'll just use an additional linear layer after each encoder GRU layer. Thanks!

def forward(self, input_value, hidden):
    recursive_input = input_value
    # one call now covers all encoder layers (self.enc has num_layers=n_layers)
    recursive_input, hidden = self.enc(recursive_input, hidden)

    # keep only the top layer's hidden state, reshaped to (1, batch, hidden_size),
    # so it can initialise the single-layer decoder
    hidden = hidden[-1].view(1, hidden.size(1), hidden.size(2))
    for i in range(self.n_layers):
        recursive_input = F.relu(recursive_input)
        recursive_input, hidden = self.dec(recursive_input, hidden)

    seq_output = self.out(recursive_input)
    return (seq_output, recursive_input, hidden, None)
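
For reference, this is the module setup that forward assumes (a sketch; only the encoder changes from before, and any initial hidden state passed in now needs shape (n_layers, batch, hidden_size)):

self.enc = nn.GRU(input_size, hidden_size, num_layers=self.n_layers, batch_first=True)
self.dec = nn.GRU(hidden_size, hidden_size, batch_first=True)
self.out = nn.Linear(hidden_size, output_size)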