I’m building an autoencoder using GRUs. It currently works with one layer, but I am trying to make it work with multiple layers.

Currently, my input is a tensor of size (batch size * sequence length * no. of features). The first two are variable, but my data has only 3 features (I know this is small). So for the sake of an example, let’s say my input is (100 * 250 * 3).

Looking at the seq2seq translation tutorial, the common step in NLP is to create an embedding layer in the encoder, which takes the input size and outputs the hidden size. Its output can then be fed through the GRU for any number of layers, because the GRU's input size then equals its output size.

However, I’m not sure an embedding is right for my application, as my inputs are just 3 numeric time series. So I’m left in a situation where my initial input to the GRU is (100 * 250 * 3). It goes through the GRU and comes out as a (100 * 250 * hidden size) tensor, which I can no longer put back through the same GRU because it is expecting an (i * j * 3) tensor.
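To make the shape problem concrete, here is a minimal reproduction. The sizes are the example values above; `hidden_size = 16` is an arbitrary value assumed purely for illustration, and the variable names are not from my real code.

```python
import torch
import torch.nn as nn

batch_size, seq_len, n_features = 100, 250, 3
hidden_size = 16  # assumed value, chosen only for this illustration

gru = nn.GRU(n_features, hidden_size, batch_first=True)
x = torch.randn(batch_size, seq_len, n_features)

# First pass works: the feature dimension goes from 3 to hidden_size.
out, h = gru(x)
first_pass_shape = tuple(out.shape)  # (batch, seq, hidden_size)
hidden_shape = tuple(h.shape)        # (1, batch, hidden_size)

# Second pass through the same GRU fails: it still expects 3 features.
try:
    gru(out, h)
    second_pass_error = None
except RuntimeError as err:
    second_pass_error = str(err)

print(first_pass_shape, hidden_shape)
print(second_pass_error)
```

The second call is the one that raises, since the module was constructed with `input_size=3` but now receives `hidden_size` features.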

Any ideas on how I should approach this? (Code below.)

```
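def forward(self, input_value, hidden):
    recursive_input = input_value
    # Encoder: the first pass works, but the second fails because the output
    # has hidden_size features while self.enc still expects input_size.
    for i in range(self.n_layers):
        recursive_input, hidden = self.enc(recursive_input, hidden)
    # Decoder: input and output are both hidden_size, so looping is fine here.
    for i in range(self.n_layers):
        recursive_input = F.relu(recursive_input)
        recursive_input, hidden = self.dec(recursive_input, hidden)
    seq_output = self.out(recursive_input)
    return (seq_output, recursive_input, hidden, None)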
```

where the layers are defined as

```
self.enc = nn.GRU(input_size, hidden_size, batch_first=True)
self.dec = nn.GRU(hidden_size, hidden_size, batch_first=True)
self.out = nn.Linear(hidden_size, output_size)
```