In Deep Q-Networks we compute the loss from the temporal-difference error:
loss = (Q(s,a) - (r + gamma * maxQ'(s', a')))^2
So in order to calculate Q(s,a) and maxQ'(s',a'), we need to do two forward passes on the model. If I understood correctly, each forward pass builds its own computation graph.
So my question is whether I “have to” detach() the resulting value of maxQ’(s’,a’) before doing the backward pass. Does it lead to errors if I don’t, and why?
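Concretely, the situation I mean looks like this (a minimal sketch; the one-layer network, the batch of 8, and the tensor shapes are made up just for illustration):

```python
import torch
import torch.nn as nn

gamma = 0.99
q_net = nn.Linear(4, 2)          # hypothetical tiny Q-network: 4 state features, 2 actions

s = torch.randn(8, 4)            # batch of states
a = torch.randint(0, 2, (8, 1))  # actions taken
r = torch.randn(8)               # rewards
s_next = torch.randn(8, 4)       # next states

q_sa = q_net(s).gather(1, a).squeeze(1)        # Q(s, a): autograd tracks this
max_q_next = q_net(s_next).max(dim=1).values   # maxQ'(s', a'): also tracked!

loss = ((r + gamma * max_q_next) - q_sa).pow(2).mean()
# Both q_sa and max_q_next carry a grad_fn here, so backward() would push
# gradients through BOTH forward passes.
```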
You can write your code in this way to avoid the unwanted grad calculation:

q_sa = your_model(s, a)
with torch.no_grad():
    max_q_next = your_model(s_next, a_next)

Then the forward calculation of maxQ' will not accumulate grads on the parameters of your model, because autograd does not record the operations inside the no_grad block.
However, my question is mostly about whether I *have* to do that, and whether it produces errors if I don't, and why (i.e. a situation where x and y both result from forward passes, both require gradients, and we now backprop loss(x, y)).
Any help/pointers/references on this topic are still appreciated.
You are welcome. Yes, you have to do that, either with `with torch.no_grad()` or with `.detach()`. It will not raise a runtime error if you don't, but the gradient of the loss will also flow back through the maxQ'(s', a') forward pass, so the gradients accumulated on your parameters are not the ones the DQN update intends (the target is supposed to be treated as a constant).
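To make the difference concrete, here is a minimal sketch (the one-layer network and random data are made up for illustration) comparing the gradients with and without detaching the target:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gamma = 0.99

def td_loss(net, s, a, r, s_next, detach_target):
    q_sa = net(s).gather(1, a).squeeze(1)          # Q(s, a)
    max_q_next = net(s_next).max(dim=1).values     # maxQ'(s', a')
    if detach_target:
        max_q_next = max_q_next.detach()           # cut the graph at the target
    return ((r + gamma * max_q_next) - q_sa).pow(2).mean()

net = nn.Linear(4, 2)
s, s_next = torch.randn(8, 4), torch.randn(8, 4)
a = torch.randint(0, 2, (8, 1))
r = torch.randn(8)

# No error either way: backward() succeeds in both cases...
net.zero_grad()
td_loss(net, s, a, r, s_next, detach_target=True).backward()
grad_detached = net.weight.grad.clone()

net.zero_grad()
td_loss(net, s, a, r, s_next, detach_target=False).backward()
grad_attached = net.weight.grad.clone()

# ...but the gradients differ: without detach, extra gradient flows
# through the maxQ'(s', a') path and silently changes the update.
print(torch.allclose(grad_detached, grad_attached))
```

So the failure mode is silent wrong gradients, not an exception, which is exactly why the detach is easy to forget.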