RNN: Loss not reducing for some random seeds when prediction is clamped to min. of zero

Hi, I am using a simple RNN/GRU for my problem.

My inputs are continuous, on a scale of 0-1000, and the variables to be predicted are also continuous, on a scale of 0-800.

I’ve observed that with different random seeds, set using torch.manual_seed(seed), I get very different training performance, with everything else kept constant (learning rate, num_layers, hidden_size, bidirectional). For some seeds, the loss does not change at all, while for others it decreases with iterations as expected.

In the failing cases, the predictions are all 0s. Note that I have the following two lines in forward(), which one might not have in a vanilla RNN. These constraints are dictated by the model.

# Prediction should be non-negative
pred = torch.clamp(pred, min=0.)
# Prediction should always be less than or equal to the input
pred = torch.min(pred, x)
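A likely mechanism behind the flat loss: `torch.clamp(pred, min=0.)` passes no gradient to elements below the bound, and `torch.min(pred, x)` passes no gradient to `pred` wherever `x` is smaller. If every raw output is negative at initialization, no gradient reaches the weights and the loss never moves. A minimal sketch with made-up values (not taken from the actual model):

```python
import torch

# Made-up raw outputs of the linear layer; two of them are negative
raw = torch.tensor([-2.0, -0.5, 0.3, 1.5], requires_grad=True)
x = torch.tensor([10.0, 10.0, 10.0, 1.0])  # made-up inputs acting as upper bounds

pred = torch.clamp(raw, min=0.)  # non-negativity constraint
pred = torch.min(pred, x)        # pred <= x constraint
pred.sum().backward()

# Elements clamped at 0 (raw < 0) or capped by x receive zero gradient
print(raw.grad)  # tensor([0., 0., 1., 0.])
```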

I was wondering why the loss would not change at all, and why this would be so sensitive to the random seed. I have a hunch that it’s related to the clamping at 0, but I don’t know how to avoid it!
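One way to test this hunch is to measure, for several seeds, what fraction of the raw outputs (before the clamp) are negative at initialization; a fraction of 1.0 means no gradient can flow through the clamp. With inputs as large as 0-1000, the GRU gates are plausibly saturated, so the sign of the output may be nearly the same for every sample and mostly determined by the initial weights, which would explain the seed sensitivity. A sketch assuming a single-layer GRU, a hypothetical hidden size of 8, and random inputs on the 0-1000 scale (not the real data); `clamped_fraction` is a hypothetical helper:

```python
import torch
import torch.nn as nn


def clamped_fraction(seed, hidden_size=8):
    """Fraction of raw outputs that clamp(min=0) would zero out at initialization."""
    torch.manual_seed(seed)
    rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
    linear = nn.Linear(hidden_size, 1)
    # Hypothetical inputs: 32 sequences of length 20 on the 0-1000 scale
    x = torch.rand(32, 20, 1) * 1000
    with torch.no_grad():
        out, _ = rnn(x)
        raw = linear(out)
    return (raw < 0).float().mean().item()


for seed in range(5):
    print(f"seed {seed}: {clamped_fraction(seed):.2f} of outputs clamped to 0")
```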

Here’s the RNN block I’m using:

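import torch
import torch.nn as nn


class CustomRNN(nn.Module):
    def __init__(self, cell_type, hidden_size, num_layers, bidirectional):
        super(CustomRNN, self).__init__()

        # A bidirectional RNN concatenates the forward and backward outputs
        if bidirectional:
            self.num_directions = 2
        else:
            self.num_directions = 1

        if cell_type == "RNN":
            self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size,
                              num_layers=num_layers, batch_first=True,
                              bidirectional=bidirectional)
        elif cell_type == "GRU":
            self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size,
                              num_layers=num_layers, batch_first=True,
                              bidirectional=bidirectional)
        else:  # any other cell_type falls back to an LSTM
            self.rnn = nn.LSTM(input_size=1, hidden_size=hidden_size,
                               num_layers=num_layers, batch_first=True,
                               bidirectional=bidirectional)

        self.linear = nn.Linear(hidden_size * self.num_directions, 1)

    def forward(self, x):
        # x: (batch, seq_len, 1) -> pred: (batch, seq_len, hidden_size * num_directions)
        pred, hidden = self.rnn(x, None)
        pred = self.linear(pred).view(pred.shape[0], -1, 1)
        # Prediction should be non-negative
        pred = torch.clamp(pred, min=0.)
        # Prediction should always be less than or equal to the input
        pred = torch.min(pred, x)
        return pred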
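If the hard clamps turn out to be the problem, one commonly used alternative (a swapped-in technique, not part of the original model) is to enforce the constraints smoothly: for non-negative `x`, `x * torch.sigmoid(z)` always lies in [0, x], and its gradient with respect to `z` is never exactly zero unless `x` is 0. A sketch assuming the inputs are non-negative, as described above; `BoundedHead` is a hypothetical name:

```python
import torch
import torch.nn as nn


class BoundedHead(nn.Module):
    """Hypothetical replacement for linear + clamp + min: output lies in [0, x]."""

    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)

    def forward(self, features, x):
        # sigmoid maps to (0, 1), so for x >= 0 the product lies in [0, x];
        # unlike clamp/min, the gradient is not cut off (except where x == 0)
        return x * torch.sigmoid(self.linear(features))


# Usage with made-up tensors standing in for the RNN output and the input
features = torch.randn(4, 10, 16)  # (batch, seq_len, hidden_size * num_directions)
x = torch.rand(4, 10, 1) * 1000    # non-negative inputs on the 0-1000 scale
head = BoundedHead(in_features=16)
pred = head(features, x)
pred.sum().backward()
print(pred.shape, bool((pred >= 0).all()), bool((pred <= x).all()))
```

The trade-off is that the bounds 0 and `x` are reached only asymptotically, and a strongly saturated sigmoid can still give very small (though non-zero) gradients.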

Before training, I set the params to be non-negative.

for param in r.parameters():
    param.data = param.data.abs()
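A plausible explanation for why this helps: with non-negative inputs and every weight and bias non-negative, each sigmoid and tanh inside the GRU receives a non-negative argument, so all hidden states are non-negative and so is the linear layer's raw output. `clamp(min=0)` then no longer blocks the gradient at initialization (training can still push weights negative later). A sketch assuming a single-layer GRU, a hypothetical hidden size of 8, and random inputs on the 0-1000 scale; the abs() is applied in place here rather than through `param.data`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.GRU(input_size=1, hidden_size=8, batch_first=True)
linear = nn.Linear(8, 1)

# Make every weight and bias non-negative (in place, without tracking gradients)
with torch.no_grad():
    for module in (rnn, linear):
        for param in module.parameters():
            param.abs_()

x = torch.rand(16, 20, 1) * 1000  # hypothetical non-negative inputs
with torch.no_grad():
    out, _ = rnn(x)
    raw = linear(out)

# Fraction of raw outputs that clamp(min=0) would zero out
print((raw < 0).float().mean().item())
```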

I get the following warning:

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greately increasing memory usage. To compact weights again call flatten_parameters().
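The warning is most likely caused by `param.data = param.data.abs()`, which rebinds each parameter to a new tensor, so on the GPU the RNN weights no longer live in the single flat buffer cuDNN expects. Two options, sketched under that assumption: modify the values in place so each parameter keeps its storage, or call `flatten_parameters()` on the RNN submodule (`r.rnn` in the post) after the loop, as the warning suggests. On the CPU, `flatten_parameters()` does nothing.

```python
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=1, hidden_size=8, num_layers=2, batch_first=True)
ptrs_before = [p.data_ptr() for p in rnn.parameters()]

# Option 1: take abs() in place, so every parameter keeps its original storage
# (on the GPU, that storage is part of the flat weight buffer)
with torch.no_grad():
    for param in rnn.parameters():
        param.abs_()

ptrs_after = [p.data_ptr() for p in rnn.parameters()]
print(ptrs_before == ptrs_after)  # True: storage unchanged by the in-place update

# Option 2: if the original param.data = ... loop is kept, re-compact the
# weights afterwards as the warning suggests (a no-op on the CPU)
rnn.flatten_parameters()
```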

However, my problem now seems to be solved. I’m happy to close this thread if someone can verify whether non-negative weight initialization is the right approach.