RNN predicting a constant output

abn · March 20, 2019, 12:37pm

Hi, PyTorch community!

I am new to PyTorch and deep learning in general. I am using an Elman RNN (Ref) for a regression problem. However, the RNN always predicts a constant output. I have tried:

Changing batch size
Scaling the input and output by a constant factor

but still, the issue persists.

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()
        
        self.hidden_dim=hidden_dim
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x (batch_size, seq_length, input_size)
        # hidden (n_layers, batch_size, hidden_dim)
        # r_out (batch_size, time_step, hidden_size)
        batch_size = x.size(0)
    
        r_out, hidden = self.rnn(x, hidden)
        output = self.fc(r_out)
        return output, hidden

# hyperparameters
input_size=3
output_size=3
hidden_dim=14
n_layers=1
lr=0.1

# instantiate an RNN
rnn = RNN(input_size, output_size, hidden_dim, n_layers)
print(rnn)

# MSE loss and Adam optimizer (learning rate is lr, set to 0.1 above)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=lr)

final_hidden = None
loss_val = []  # per-step training losses, appended to inside train()
# train the RNN
def train(rnn, n_steps, print_every):
    
    hidden = None      
    
    for batch_i, step in enumerate(range(n_steps)):
        prediction, hidden = rnn(x_tensor, hidden)
        final_hidden = hidden

        ## Representing Memory ##
        # make a new variable for hidden and detach the hidden state from its history
        # this way, we don't backpropagate through the entire history
        hidden = hidden.detach()

        loss = criterion(prediction, y_tensor)
        loss_val.append(loss.item())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch_i%print_every == 0 or batch_i == n_steps-1:
            print("Epoch: {0}/{1}".format(batch_i+1, n_steps))
            print('Loss: ', loss.item())
            print(prediction.data)
    
    return rnn

Thanks for any advice in advance!

ptrblck · March 20, 2019, 4:18pm

Could you print the shapes of prediction and y_tensor before passing them to your criterion?
Your model seems to train well using random inputs and targets.
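A note on why the shapes matter (this is my reading of the question, not something stated in the thread): nn.MSELoss broadcasts its inputs when their shapes differ, and a silent broadcast can push a regression model toward predicting the mean. A minimal sketch with made-up tensors, not the poster's data:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

# Hypothetical tensors for illustration: the prediction matches the target exactly.
prediction = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # shape (4, 1)
target = torch.tensor([1.0, 2.0, 3.0, 4.0])              # shape (4,)

# Mismatched shapes broadcast to (4, 4), so every prediction is compared
# against every target (PyTorch emits a UserWarning about this).
loss_broadcast = criterion(prediction, target)

# With matching shapes the loss is element-wise, as intended.
loss_matched = criterion(prediction, target.unsqueeze(1))

print(prediction.shape, target.shape)
print(loss_broadcast.item(), loss_matched.item())
```

If the two shapes printed before the criterion are identical, this particular pitfall can be ruled out.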

abn · March 20, 2019, 5:20pm

@ptrblck , I am getting

prediction shape torch.Size([25, 20, 3])
y_tensor shape torch.Size([25, 20, 3])

ptrblck · March 20, 2019, 8:38pm

Thanks for the information!
In that case, could you try lowering the learning rate and see if the output stays constant?
If that doesn’t help, you should try to overfit a small data sample (e.g. just 10 samples) and see if your model is able to learn this sample at all.
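A minimal sketch of such an overfitting check, under some assumptions: TinyRNN is a hypothetical stand-in for the RNN class in the question, the 10 samples are random placeholders rather than the real data, and lr=0.01 with 300 steps is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyRNN(nn.Module):
    """Hypothetical stand-in for the RNN class posted in the question."""
    def __init__(self, input_size=3, hidden_dim=14, output_size=3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        r_out, _ = self.rnn(x)   # r_out: (batch, seq_len, hidden_dim)
        return self.fc(r_out)    # (batch, seq_len, output_size)

# One sequence of 10 time steps; random placeholder values, not the real data.
x_small = torch.randn(1, 10, 3)
y_small = torch.randn(1, 10, 3)

model = TinyRNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(300):
    optimizer.zero_grad()
    loss = criterion(model(x_small), y_small)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

If the loss keeps dropping on such a tiny sample, the model and training loop are probably fine and the data itself becomes the main suspect.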

abn · March 21, 2019, 1:59pm

@ptrblck Thanks for replying!
It is still giving constant output. I tried lr=0.001 and 20000 epochs with the following x_tensor

[[0.5 3.  9.5]
 [0.6 3.  9.5]
 [0.7 3.  9.5]
 [0.8 3.  9.5]
 [0.5 3.5 9.5]
 [0.6 3.5 9.5]
 [0.7 3.5 9.5]
 [0.8 3.5 9.5]
 [0.5 4.  9.5]
 [0.6 4.  9.5]]

and the following y_tensor

[[1.18310e+02 1.00000e-01 5.35000e-01]
 [1.18290e+02 7.00000e-02 5.85000e-01]
 [1.18253e+02 3.00000e-02 6.35000e-01]
 [1.18148e+02 3.00000e-02 6.35000e-01]
 [1.18228e+02 3.00000e-02 5.60000e-01]
 [1.18148e+02 3.00000e-02 5.78000e-01]
 [1.18083e+02 8.00000e-02 6.68000e-01]
 [1.18060e+02 7.00000e-02 6.88000e-01]
 [1.18033e+02 9.00000e-02 5.58000e-01]
 [1.18048e+02 6.00000e-02 6.38000e-01]]

and final predicted output was:

tensor([[[1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01],
         [1.1816e+02, 5.9165e-02, 6.0802e-01]]])

ptrblck · March 23, 2019, 11:01am

I think the value ranges might be a problem in your use case.
Since your model trains fine using random (normalized) data, I would suggest normalizing the input data and running the training again.
If that doesn’t help, you could additionally normalize the targets and de-normalize the predictions afterwards.
The actual target signal seems to be a small signal on top of a mean value.
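A minimal sketch of that normalization, using the x_tensor and y_tensor values posted earlier in the thread. The helper names (fit_scaler, normalize, denormalize) are made up for illustration, and per-feature standardization is one assumed choice among several. The third input column is constant, so its standard deviation is zero and needs a guard.

```python
import torch

# Values copied from the thread: 10 time steps, 3 features each.
x = torch.tensor([[0.5, 3.0, 9.5], [0.6, 3.0, 9.5], [0.7, 3.0, 9.5],
                  [0.8, 3.0, 9.5], [0.5, 3.5, 9.5], [0.6, 3.5, 9.5],
                  [0.7, 3.5, 9.5], [0.8, 3.5, 9.5], [0.5, 4.0, 9.5],
                  [0.6, 4.0, 9.5]])
y = torch.tensor([[118.310, 0.10, 0.535], [118.290, 0.07, 0.585],
                  [118.253, 0.03, 0.635], [118.148, 0.03, 0.635],
                  [118.228, 0.03, 0.560], [118.148, 0.03, 0.578],
                  [118.083, 0.08, 0.668], [118.060, 0.07, 0.688],
                  [118.033, 0.09, 0.558], [118.048, 0.06, 0.638]])

def fit_scaler(t, eps=1e-8):
    """Per-feature mean and std; a (near-)zero std is replaced by 1."""
    mean = t.mean(dim=0, keepdim=True)
    std = t.std(dim=0, keepdim=True)
    std = torch.where(std < eps, torch.ones_like(std), std)
    return mean, std

def normalize(t, mean, std):
    return (t - mean) / std

def denormalize(t, mean, std):
    return t * std + mean

x_mean, x_std = fit_scaler(x)
y_mean, y_std = fit_scaler(y)

x_norm = normalize(x, x_mean, x_std)   # feed x_norm.unsqueeze(0) to the RNN
y_norm = normalize(y, y_mean, y_std)   # train against y_norm.unsqueeze(0)

# After training, map predictions back to the original units.
y_restored = denormalize(y_norm, y_mean, y_std)
print(y_norm.mean(dim=0), y_norm.std(dim=0))
```

After this step the first target column no longer sits at roughly 118 with tiny variations; all three columns are centered at zero with unit spread, which is the "small signal on top of a mean value" separated from its mean.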

abn · March 23, 2019, 12:10pm

Thanks! Will try the same.

robd2 · March 23, 2019, 3:59pm

Yes, these real-valued (as opposed to classification) prediction models can be tricky. Since you're new to PyTorch, I would further reduce the complexity of this data set and try a univariate input/output test instead of the multivariate one you currently have. Try to get the model to memorize a sine wave shape. Feel free to post more code for us to review.
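A rough sketch of what such a univariate test could look like. The setup is an assumption on my part: next-step prediction on one sine period, a hidden size of 16, and 500 Adam steps at lr=0.01. The comparison against a constant-mean baseline mirrors the symptom in this thread.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# One sine period sampled at 21 points: input is sin(t), target is the next sample.
t = torch.linspace(0, 2 * math.pi, 21)
wave = torch.sin(t)
x = wave[:-1].reshape(1, -1, 1)   # (batch=1, seq_len=20, input_size=1)
y = wave[1:].reshape(1, -1, 1)    # same shape, shifted by one step

rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
fc = nn.Linear(16, 1)
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(fc.parameters()), lr=0.01)
criterion = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()
    r_out, _ = rnn(x)
    loss = criterion(fc(r_out), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    pred = fc(rnn(x)[0])

final_loss = criterion(pred, y).item()
# MSE of always predicting the target mean, i.e. the "constant output" failure mode.
baseline = ((y - y.mean()) ** 2).mean().item()
print(f"final loss {final_loss:.5f} vs constant baseline {baseline:.5f}")
```

A final loss well below the constant baseline means the network is tracking the wave rather than collapsing to its mean. The current value alone does not say whether the wave is rising or falling, so the task needs the recurrent state.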