I am exploring an RL agent and want to keep the gradients of previous iterations, in the following way:
grad_current_iter = grad_from_loss_current_iter + grad_from_loss_iter_-1 + grad_from_loss_iter_-2 + …
with the sum reset every N iterations.
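Would it be enough to simply skip optimizer.zero_grad() except once every N iterations, like this:
```python
loss = obtain_loss()
if self.iteration % 10 == 0:  # True once every 10 iterations
    optimizer.zero_grad()     # otherwise backward() adds to the stored .grad
loss.backward()
```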
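A minimal sketch to check that skipping zero_grad() really produces the sum in the formula above. It assumes a hypothetical toy setup (a single parameter tensor `w`, random inputs `xs`, plain SGD, and a reset period `N = 4` instead of 10) with the loss `(w * x).sum()`, whose gradient with respect to `w` is exactly `x`, so the expected accumulated gradient can be written down directly.

```python
import torch

# Hypothetical toy setup: d(loss)/dw == x, so accumulated grads are easy to predict.
torch.manual_seed(0)
w = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

N = 4  # reset period (the question uses 10)
xs = [torch.randn(3) for _ in range(10)]

recorded = []  # copy of w.grad after each iteration, for inspection
for iteration, x in enumerate(xs):
    if iteration % N == 0:
        # Clears .grad (recent PyTorch versions set it to None by default).
        optimizer.zero_grad()
    loss = (w * x).sum()
    loss.backward()  # adds x to whatever is already stored in w.grad
    recorded.append(w.grad.clone())
    optimizer.step()  # the update uses the accumulated gradient

# The window that started at iteration 4 holds x4 + x5 + x6 at iteration 6.
print(torch.allclose(recorded[6], xs[4] + xs[5] + xs[6]))  # True
```

Because the optimizer steps every iteration, each term in the sum was computed at the parameter values of its own iteration. With this toy loss that does not matter, since its gradient does not depend on `w`.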
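If you do this, the gradient from every loss.backward() call is added to each parameter's .grad until the next zero_grad(), so gradients accumulate across iterations. That may or may not be what you want.
From your description, it sounds like you might want a sliding window. Or is resetting every N iterations enough?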
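For comparison, a sliding window could look roughly like the sketch below. It is one possible approach, not a built-in PyTorch feature: each iteration's own gradient is cloned into a `collections.deque` with `maxlen=window`, and `.grad` is overwritten with the sum of the deque's contents before `optimizer.step()`. The parameter `w`, the inputs `xs`, `window = 3` and the toy loss are hypothetical.

```python
from collections import deque

import torch

# Hypothetical sliding-window variant: .grad holds the sum of the gradients
# from the last `window` iterations only, instead of resetting every N.
torch.manual_seed(0)
w = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

window = 3
history = deque(maxlen=window)  # the oldest gradient drops out automatically
xs = [torch.randn(3) for _ in range(8)]

recorded = []
for x in xs:
    optimizer.zero_grad()           # compute this iteration's gradient alone
    loss = (w * x).sum()            # gradient w.r.t. w is x
    loss.backward()
    history.append(w.grad.clone())  # store this iteration's own gradient
    w.grad = torch.stack(list(history)).sum(dim=0)  # sum over the window
    recorded.append(w.grad.clone())
    optimizer.step()
```

For a real model you would keep one such history per parameter (for example by iterating over `model.parameters()`), and memory grows with the window size because every stored gradient is a full copy.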
I want it to reset every episode (and the episodes always have the same number of steps).
So if my episode has 100 steps, step 50 would have 50 gradients accumulated and step 101 would have just one gradient.
I guess it will work, thanks.
In that case, yes, your example will work just fine!
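A quick sanity check of the episode-based reset, under hypothetical assumptions: 100 steps per episode, steps numbered from 1, and the toy loss `w.sum()`, for which every `backward()` adds exactly 1 to `w.grad`, so `w.grad` counts the accumulated gradients.

```python
import torch

# Hypothetical check: with loss = w.sum(), each backward() adds 1 to w.grad,
# so w.grad equals the number of gradients accumulated so far.
w = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01)

steps_per_episode = 100
grad_at_step = {}  # 1-indexed global step -> accumulated gradient value

for step in range(1, 2 * steps_per_episode + 1):
    if (step - 1) % steps_per_episode == 0:  # first step of an episode
        optimizer.zero_grad()
    loss = w.sum()
    loss.backward()
    grad_at_step[step] = w.grad.item()
    optimizer.step()

print(grad_at_step[50], grad_at_step[100], grad_at_step[101])  # 50.0 100.0 1.0
```

With a 0-indexed iteration counter, `(step - 1) % steps_per_episode == 0` is the same as `iteration % steps_per_episode == 0`, i.e. the pattern from the question with 10 replaced by the episode length.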