Hi,
This is most likely happening because value_optimizer.step() modifies the weights of the model in place, while the original values of those weights are still needed to compute action_loss.backward().
Is that the issue?
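If so, here is a minimal sketch of the failure and one possible fix. Your model isn't shown, so the trunk, the two heads, the random input and action_optimizer are all hypothetical stand-ins; only value_optimizer, action_loss and the backward/step calls come from your description. The idea is to run both backward passes before any optimizer step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def build():
    # Hypothetical stand-in for the real model: a shared trunk feeding two heads.
    trunk = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU())
    value_head = nn.Linear(8, 1)
    action_head = nn.Linear(8, 2)
    # Both optimizers own the trunk parameters (an assumption about the setup).
    value_optimizer = torch.optim.SGD(
        list(trunk.parameters()) + list(value_head.parameters()), lr=0.1
    )
    action_optimizer = torch.optim.SGD(
        list(trunk.parameters()) + list(action_head.parameters()), lr=0.1
    )
    features = trunk(torch.randn(16, 4))
    value_loss = value_head(features).pow(2).mean()
    action_loss = action_head(features).pow(2).mean()
    return value_optimizer, action_optimizer, value_loss, action_loss


def step_between_backwards():
    """Problematic order: step() runs before the second backward()."""
    value_optimizer, _, value_loss, action_loss = build()
    value_optimizer.zero_grad()
    value_loss.backward(retain_graph=True)
    value_optimizer.step()  # updates the shared trunk weights in place
    try:
        action_loss.backward()  # the graph still needs the old trunk weights
    except RuntimeError:
        return False
    return True


def step_after_backwards():
    """Possible fix: finish every backward() before any step()."""
    value_optimizer, action_optimizer, value_loss, action_loss = build()
    value_optimizer.zero_grad()
    action_optimizer.zero_grad()
    value_loss.backward(retain_graph=True)
    action_loss.backward()
    value_optimizer.step()
    action_optimizer.step()
    return True


if __name__ == "__main__":
    print("step between backwards succeeded:", step_between_backwards())
    print("step after backwards succeeded:", step_after_backwards())
```

Note that with this ordering the shared weights accumulate gradients from both losses before either step, which may or may not be what you want. Summing the two losses and calling backward() once is another option.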