I’ve written previously about how I was having trouble converting a TF/Keras model to PyTorch and getting the same results.
I didn’t get a response, so in order to simplify things I built this notebook:
I have a dataset of 10 randomly generated inputs, each of shape (150 timesteps × 12 features), and their corresponding float outputs (this is a regression task).
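To make the setup concrete, here is a sketch of the data; the shapes follow the description above, but the random generation and target range are my assumptions:

```python
import torch

# 10 samples, each 150 timesteps of 12 features, with one float target each.
# The seed and the target range (roughly matching the targets shown below)
# are assumptions, not from the original notebook.
torch.manual_seed(0)
X = torch.randn(10, 150, 12)  # (batch, timesteps, features)
y = torch.rand(10) * 13       # positive float regression targets
```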
The RNN model is defined as below:
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size=12, hidden_size=48, num_layers=1, bidirectional=False):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bidirectional, self.num_layers = bidirectional, num_layers
        if bidirectional:
            self.num_directions = 2
        else:
            self.num_directions = 1
        self.rnn = nn.GRU(input_size, hidden_size, bidirectional=self.bidirectional,
                          batch_first=True, num_layers=num_layers)
        self.final_layers = nn.Sequential(
            nn.Linear(self.num_directions * hidden_size, 10),
            nn.ReLU(),
            nn.Linear(10, 1),
        )

    def forward(self, input_seq):
        output, h_n = self.rnn(input_seq)
        output = output[:, -1, :]  # keep only the last timestep
        output = self.final_layers(output)
        return output
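For reference, my overfitting attempt looks roughly like the sketch below. The optimizer, learning rate, and step count are assumptions, and the model is a compact stand-in with the same GRU-plus-MLP-head structure as the class above, repeated here only so the snippet runs on its own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyRNN(nn.Module):
    """Compact stand-in: unidirectional GRU followed by the same small MLP head."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(12, 48, batch_first=True)
        self.head = nn.Sequential(nn.Linear(48, 10), nn.ReLU(), nn.Linear(10, 1))

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(out[:, -1, :])  # last timestep only

X = torch.randn(10, 150, 12)
y = torch.rand(10, 1) * 13

model = TinyRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```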
When I try to overfit my 10 data points, the training loss eventually gets stuck.
The model's outputs are all very close to each other:
When our targets are the following:
tensor([ 4.3582, 9.1221, 0.4407, 0.3569, 2.3914, 5.2743, 5.6834, 12.2206,
I’ve seen signs that some of the hidden activations are saturating (at +1/−1), which probably explains why all the outputs look the same.
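The saturation can be measured directly: the GRU hidden state is tanh-bounded, so values piling up near ±1 indicate saturation. A quick sketch of the check (the 0.99 threshold is my own choice, and in practice you would run this on the trained model rather than the freshly initialized one used here):

```python
import torch
import torch.nn as nn

# Measure what fraction of GRU hidden activations sit near +/-1.
# Shapes follow the post's setup; weights here are untrained, so this
# only demonstrates the check itself.
torch.manual_seed(0)
gru = nn.GRU(12, 48, batch_first=True)
x = torch.randn(10, 150, 12)
out, h_n = gru(x)
saturated = (out.abs() > 0.99).float().mean().item()
print(f"fraction of saturated GRU activations: {saturated:.3f}")
```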
I tried normalizing the data and clipping gradients, with no success.
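Concretely, the two mitigations I tried look like this; the normalization axes and the `max_norm` value are my assumptions, not necessarily what matters for the bug:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(10, 150, 12)

# Per-feature normalization over the batch and time dimensions.
X_norm = (X - X.mean(dim=(0, 1))) / (X.std(dim=(0, 1)) + 1e-8)

# Gradient-norm clipping inside a training step (placed between
# loss.backward() and optimizer.step()); the loss here is a dummy.
model = nn.GRU(12, 48, batch_first=True)
out, _ = model(X_norm)
loss = out.pow(2).mean()
loss.backward()
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```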
As mentioned in my original post, the same model in TensorFlow does not have this issue.
Thanks for reading, and if you have any suggestions or need more information, let me know!