I am writing a model in PyTorch that is trained with the REINFORCE algorithm. The training procedure involves two loss functions: the first implements the REINFORCE update rule, while the second trains the baseline, i.e. the neural network that predicts the expected reward from the hidden state at each time step. I minimize these two losses with two separate instances of the SGD optimizer. The problem I am encountering is that, for each batch, after the optimization step for the baseline has been performed, the following error is raised when the optimization step for the REINFORCE update rule is attempted:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I suppose specifying retain_graph=True is not the right solution here, since I want to train the baseline independently of the rest of the model. Could you provide me with some suggestions? Thank you!
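For reference, here is a minimal sketch that reproduces the error I am seeing. The networks and names are simplified stand-ins for my actual model (single linear layers instead of the real policy and baseline), but the structure of the two losses and the two backward passes is the same:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for my actual networks (names are illustrative).
policy = nn.Linear(4, 2)     # produces action logits from the hidden state
baseline = nn.Linear(4, 1)   # predicts the expected reward from the hidden state

opt_policy = torch.optim.SGD(policy.parameters(), lr=0.01)
opt_baseline = torch.optim.SGD(baseline.parameters(), lr=0.01)

state = torch.randn(8, 4)    # batch of hidden states
reward = torch.randn(8)      # observed rewards

log_prob = torch.log_softmax(policy(state), dim=-1)[:, 0]
value = baseline(state).squeeze(-1)

# Loss 1: baseline regression towards the observed reward.
baseline_loss = nn.functional.mse_loss(value, reward)

# Loss 2: REINFORCE update rule, using the baseline to reduce variance.
advantage = reward - value   # note: `value` is still part of the graph
reinforce_loss = -(advantage * log_prob).mean()

opt_baseline.zero_grad()
baseline_loss.backward()     # frees the saved tensors of the baseline's graph
opt_baseline.step()

opt_policy.zero_grad()
try:
    reinforce_loss.backward()  # backprops through `value` again -> RuntimeError
    graph_error = None
except RuntimeError as e:
    graph_error = e

print(graph_error)
```

Both losses depend on `value`, so the second backward pass tries to traverse the baseline's part of the graph after the first backward pass has already freed it.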