Hi, I am using a simple RNN/GRU for my problem.
My inputs are continuous, on a scale of 0-1000, and the variables to be predicted are also continuous, on a scale of 0-800.
I’ve observed that with different random seeds, set using torch.manual_seed(seed), I get very different training performance with everything else kept constant (learning rate, num_layers, hidden_size, bidirectional). For some seeds, the loss doesn’t change at all, while for other seeds the loss decreases with iterations as expected.
For the seeds where the loss doesn’t change, the predictions are all 0s. Note that I have the following two lines in forward, which one might not have in a vanilla RNN. These are dictated by the model.
# Prediction should be non-negative
pred = torch.clamp(pred, min=0.)
# Prediction should always be less than or equal to the input
pred = torch.min(pred, x)
I was wondering why the loss would not change at all, and why it would be so sensitive to the random seed. I have a hunch that it’s related to the clamping at 0 that I’m doing, but I don’t know how to avoid that!
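A minimal sketch of how that hunch could be checked, assuming the pre-clamp outputs happen to be negative for every element (the tensor values below are made up for illustration, not taken from my data). torch.clamp passes no gradient to elements below min, so in that situation nothing upstream of the clamp would receive a gradient:

```python
import torch

# Hypothetical pre-clamp outputs, all negative (illustrative values only).
raw = torch.tensor([-0.5, -2.0, -0.1], requires_grad=True)
# Hypothetical positive targets.
target = torch.tensor([3.0, 1.0, 2.0])

pred = torch.clamp(raw, min=0.0)        # every element is clamped to 0
loss = ((pred - target) ** 2).mean()    # loss is non-zero, but...
loss.backward()

# ...the gradient with respect to the pre-clamp values is zero everywhere.
grad_is_zero = bool((raw.grad == 0).all())
print(pred.detach(), raw.grad, grad_is_zero)
```

If the same thing happens inside the model for every time step and every sample, the parameters would get no gradient signal at all, which would be consistent with a flat loss and all-zero predictions. This is only one possible explanation.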
Here’s the RNN block I’m using:
import torch
import torch.nn as nn


class CustomRNN(nn.Module):
    def __init__(self, cell_type, hidden_size, num_layers, bidirectional):
        super(CustomRNN, self).__init__()
        torch.manual_seed(1)
        if bidirectional:
            self.num_directions = 2
        else:
            self.num_directions = 1
        # Pick the recurrent cell; anything other than "RNN" or "GRU" falls back to LSTM
        if cell_type == "RNN":
            self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size,
                              num_layers=num_layers, batch_first=True,
                              bidirectional=bidirectional)
        elif cell_type == "GRU":
            self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size,
                              num_layers=num_layers, batch_first=True,
                              bidirectional=bidirectional)
        else:
            self.rnn = nn.LSTM(input_size=1, hidden_size=hidden_size,
                               num_layers=num_layers, batch_first=True,
                               bidirectional=bidirectional)
        # Map the hidden state at each time step to a single output value
        self.linear = nn.Linear(hidden_size * self.num_directions, 1)

    def forward(self, x):
        # x has shape (batch, seq_len, 1) because batch_first=True
        pred, hidden = self.rnn(x, None)
        pred = self.linear(pred).view(pred.shape[0], -1, 1)
        # Prediction should be non-negative
        pred = torch.clamp(pred, min=0.)
        # Prediction should always be less than or equal to the input
        pred = torch.min(pred, x)
        return pred
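One way to see whether the seed sensitivity comes from initialisation is to measure, for several seeds, what fraction of the pre-clamp outputs is already negative before any training. The sketch below is a stand-in for the model above, not the model itself: the helper name fraction_clamped_at_init, the hidden size, the batch shape, and the random inputs on a 0-1000 scale are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


def fraction_clamped_at_init(seed, hidden_size=8, seq_len=20, batch=4):
    """Fraction of pre-clamp outputs that are negative right after initialisation."""
    torch.manual_seed(seed)  # seed set before the layers are created
    rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
    linear = nn.Linear(hidden_size, 1)
    # Made-up inputs on the 0-1000 scale, shape (batch, seq_len, 1)
    x = torch.rand(batch, seq_len, 1) * 1000.0
    with torch.no_grad():
        out, _ = rnn(x)
        raw = linear(out)  # output before clamp(min=0) would be applied
    return (raw < 0).float().mean().item()


fractions = {seed: fraction_clamped_at_init(seed) for seed in range(10)}
for seed, frac in fractions.items():
    print(f"seed {seed}: {frac:.2f} of pre-clamp outputs are negative")
```

A seed for which the fraction is close to 1.0 would start with nearly every prediction clamped to 0, so it would be a candidate for the stuck behaviour. Seeds with a lower fraction would still have some elements passing gradient through the clamp.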