Hello,

I’ve written previously about how I was having trouble converting a TF/Keras model to PyTorch and getting the same results.

I didn’t get a response, so in order to simplify things I built this notebook:

https://www.kaggle.com/sdoria/rnn-with-toy-data?scriptVersionId=11719264

I have a dataset of 10 randomly generated inputs (150 timesteps × 12 features) and their corresponding float outputs (this is a regression problem).
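For reference, the toy data is generated roughly like this (the target scaling here is illustrative; the notebook has the exact code):

```python
import torch

torch.manual_seed(0)  # arbitrary seed, just for reproducibility
n_samples, timesteps, features = 10, 150, 12
X = torch.randn(n_samples, timesteps, features)  # random inputs
y = torch.rand(n_samples, 1) * 12                # illustrative float regression targets
```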

The RNN model is defined as follows:

```
class RNN(nn.Module):
    def __init__(self, input_size=12, hidden_size=48, num_layers=1, bidirectional=False):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bidirectional, self.num_layers = bidirectional, num_layers
        if bidirectional:
            self.num_directions = 2
        else:
            self.num_directions = 1
        self.rnn = nn.GRU(input_size, hidden_size, bidirectional=self.bidirectional,
                          batch_first=True, num_layers=num_layers)
        self.final_layers = nn.Sequential(
            nn.Linear(self.num_directions * hidden_size, 10),
            nn.ReLU(),
            nn.Linear(10, 1),
        )

    def forward(self, input_seq):
        output, h_n = self.rnn(input_seq)
        output = output[:, -1, :]  # keep only the last timestep
        output = self.final_layers(output)
        return output
```

When trying to overfit my 10 datapoints, my training loss eventually gets stuck.
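For concreteness, the overfitting setup is essentially this (a plain GRU plus a small head standing in for the class above; the optimizer, learning rate, and step count are what I used, not carefully tuned):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(10, 150, 12)          # toy inputs, same shapes as above
y = torch.rand(10, 1) * 12            # illustrative float targets

# stand-in for the RNN class: GRU followed by the small MLP head
gru = nn.GRU(12, 48, batch_first=True)
head = nn.Sequential(nn.Linear(48, 10), nn.ReLU(), nn.Linear(10, 1))
params = list(gru.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(200):
    opt.zero_grad()
    out, _ = gru(X)
    pred = head(out[:, -1, :])        # last timestep only, as in forward()
    loss = nn.functional.mse_loss(pred, y)
    loss.backward()
    opt.step()
```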

The model outputs are all very close to each other:

```
tensor([[5.2029],
        [5.2068],
        [5.2099],
        [5.2129],
        [5.2188],
        [5.2141],
        [5.2111],
        [5.2120],
        [5.2156],
        [5.2176]])
```

whereas the targets are the following:

```
tensor([ 4.3582,  9.1221,  0.4407,  0.3569,  2.3914,  5.2743,  5.6834, 12.2206,
         8.0923,  3.9258])
```

I’ve seen signs that some of the hidden activations are saturating (stuck near +1/−1), which probably explains why all the outputs look the same.
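This is roughly how I checked for saturation: the GRU output is tanh-bounded in [−1, 1], so I looked at how much of the last-timestep slice sits near the boundary (the 0.99 threshold is arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(12, 48, batch_first=True)
x = torch.randn(10, 150, 12)
out, h_n = gru(x)                       # out: (10, 150, 48), values in [-1, 1]
last = out[:, -1, :]                    # the slice fed into final_layers
frac_saturated = (last.abs() > 0.99).float().mean().item()
print(f"{frac_saturated:.2%} of last-step activations are near +/-1")
```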

I tried normalizing the data and clipping gradients, with no success.
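What I tried looked roughly like this inside the training step (global standardization and `clip_grad_norm_`; the `max_norm` value and per-feature vs. global normalization are details I experimented with):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(10, 150, 12)
X = (X - X.mean()) / X.std()            # global standardization (per-feature is another option)
y = torch.rand(10, 1) * 12

gru = nn.GRU(12, 48, batch_first=True)  # stand-in for the full model
head = nn.Linear(48, 1)
params = list(gru.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

out, _ = gru(X)
loss = nn.functional.mse_loss(head(out[:, -1, :]), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # max_norm value is illustrative
opt.step()
```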

As mentioned in my original post, the same model in Tensorflow does not have the same issue.

Thanks for reading and if you have any suggestions or need more information let me know!