Backward runtime error when using the hidden state of an LSTM cell while training

Hello, I’m trying to implement deep Q-learning with an LSTM cell, similar to this paper: https://arxiv.org/pdf/1509.03044.pdf. The LSTM cell + linear layer network learns a hidden-state representation from rewards, and that hidden state is then passed into a separate DQN network that predicts the Q value. But I’m having trouble training the two networks because of a backward runtime error.

Here is a snippet of my code:

# train the RNN state model on the reward-prediction loss
predict_reward, hidden_state = self.rnn_model(Variable(torch.from_numpy(state_batch).float()))
reward_var = Variable(torch.from_numpy(reward_batch).float(), requires_grad=False)
mse = nn.MSELoss()
state_loss = mse(predict_reward, reward_var)
self.rnn_optimizer.zero_grad()
state_loss.backward()
self.rnn_optimizer.step()

# generate target Q values
target_q_output = self._generate_target_q_values(next_state_batch, reward_batch)
# pass the hidden state from the RNN model into the Q network
q_output = self.qnet(hidden_state)
q_output = q_output[range(self._mini_batch_size), action_indexs]
loss = F.smooth_l1_loss(q_output, target_q_output)
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()

I get an error on the second-to-last line (loss.backward()):

  line 370, in train_minibatch
    loss.backward()
  File "/usr/local/lib/python3.6/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/usr/local/lib/python3.6/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/usr/local/lib/python3.6/site-packages/torch/autograd/_functions/basic_ops.py", line 52, in backward
    a, b = ctx.saved_variables
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I think the error is caused by passing the hidden state into the Q network after calling backward on rnn_model (the LSTM cell + linear layer). How can I fix this issue? Thanks!
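
For reference, the two workarounds I’m aware of would change the training snippet above roughly like this (only a sketch, same variable names as above, not tested):

# Workaround A: keep the graph alive through the first backward, so the
# Q-loss backward can still walk through the LSTM part of the graph.
# Note that Q-loss gradients will then also flow into rnn_model's parameters.
state_loss.backward(retain_graph=True)

# Workaround B: detach the hidden state before the Q network, so the Q loss
# never tries to backprop through the (already freed) LSTM graph.
q_output = self.qnet(hidden_state.detach())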

I imagine you have parameters that are optimized by both rnn_optimizer and optimizer. I think you probably want to ensure that the parameters optimized by each are disjoint?
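
One quick sanity check (just a sketch) would be to compare the parameter identities handed to the two optimizers:

rnn_params = set(id(p) for p in self.rnn_model.parameters())
q_params = set(id(p) for p in self.qnet.parameters())
# a non-zero count here means the two optimizers share parameters
print(len(rnn_params & q_params))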

Yeah. My RNN model uses RMSprop and my DQN model uses Adamax.
Earlier I declared:

self.rnn_optimizer = optim.RMSprop(self.rnn_model.parameters())
self.optimizer = optim.Adamax(self.qnet.parameters())

I noticed that if I don’t save the hidden state in my RNN model, I don’t get the backward runtime error when calling loss.backward() on the DQN network. But I need to save the hidden states for the RNN model to be useful.

This is how I defined my RNN model:

import torch
import torch.nn as nn
from torch.nn import init
from torch.autograd import Variable


class LSTMState(nn.Module):
    def __init__(self, input_size, mini_batch_size):
        """
        :param input_size: number of features in the state
        :param mini_batch_size: batch size used to initialize the hidden and cell states
        This network has 1 output: the predicted reward.
        The LSTM cell's hidden state is its representation of the state, which is passed into the DQN.
        """
        super(LSTMState, self).__init__()
        # initialize hidden state
        self.h = Variable(torch.zeros(mini_batch_size, input_size))
        # initialize cell state
        self.c = Variable(torch.zeros(mini_batch_size, input_size))
        self.hidden_dim = input_size

        # hidden_size is the number of features in the hidden state / size of the output
        # input_size is the number of features in the incoming input
        self.lstm = nn.LSTMCell(input_size=input_size, hidden_size=input_size)
        self.fc = nn.Linear(input_size, 1)  # the linear layer output is the predicted reward
        init.xavier_uniform(self.fc.weight)
        # no activation function, since rewards can be large in magnitude and negative

    def forward(self, x, train=False):
        if train:  # if training, we don't want to remember the hidden states?
            # re-initialize hidden state
            self.h = Variable(torch.zeros(x.size(0), self.hidden_dim))
            # re-initialize cell state
            self.c = Variable(torch.zeros(x.size(0), self.hidden_dim))
            h, c = self.lstm(x, (self.h, self.c))
            out = self.fc(h)  # use the new hidden state, not the zero-initialized self.h
            return out, h
        else:  # when testing, save the hidden and cell state
            self.h, self.c = self.lstm(x, (self.h, self.c))
            out = self.fc(self.h)
            # out is the predicted reward, and self.h is the learned state representation
            return out, self.h

Never mind. The error only arises if I save the hidden and cell states while training. With the definition above (which re-initializes them when train=True) it runs without the error, although it performs terribly.
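
I suspect the cleaner fix would be to detach the saved states when storing them, so they keep their values across steps without dragging the old (already freed) graph along. Roughly (untested sketch against the class above):

def forward(self, x, train=False):
    # one LSTM step from the previously stored state
    h, c = self.lstm(x, (self.h, self.c))
    # store detached copies: the values are kept for the next step, but the
    # next backward won't try to walk into this step's graph again
    self.h, self.c = h.detach(), c.detach()
    out = self.fc(h)  # predicted reward
    # h is still attached to this step's graph, so the caller still has to
    # either detach it before the Q network or use retain_graph=True
    return out, h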