What could cause this? The autograd graph grows with every backprop step and is severely slowing down my code.
I found out the reason. In my model, I have this tensor:
self.log_alpha = torch.zeros(1, requires_grad=True)
Using it in my loss is fine:
alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
self.alpha_optimizer.zero_grad()
alpha_loss.backward()
But calling a view operation in my training loop, even with detach(), causes the ViewBackward recursion. Why?
def _do_training(self):
    # self.log_alpha = self.log_alpha.view(1).detach()
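A minimal sketch of the mechanism, under the assumption that the attribute is re-assigned to a view of itself on every training step: each view call appends a fresh ViewBackward node on top of the previous one, so the grad_fn chain (and the graph PyTorch must walk) grows with the number of steps. The names below are illustrative, not taken from the original model.

```python
import torch

# Leaf parameter, as in the original model
log_alpha = torch.zeros(1, requires_grad=True)

# Re-assigning to a view each "step" chains one ViewBackward per call
x = log_alpha
for step in range(3):
    x = x.view(1)

# Walk the grad_fn chain to count autograd nodes:
# 3 ViewBackward nodes plus the leaf's AccumulateGrad node.
depth = 0
fn = x.grad_fn
while fn is not None:
    depth += 1
    fn = fn.next_functions[0][0] if fn.next_functions else None
print(depth)  # -> 4
```

With detach() the chain is cut, but re-assigning the attribute still replaces the leaf the optimizer was tracking, which is why the pattern is problematic either way.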
The graph above seems to be referencing the layer_norms layer, the first module in it, and a parameter called bias in that module. Are you sure it is this tensor?
The recursion occurred in multiple scenarios. In the layer_norms case, I was passing an attention module's final probability distribution around outside of the training loop; log_alpha was just another concrete way to trigger the same behavior.
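One way to avoid the growth, sketched under the assumption that only the per-step loss needs the graph: derive everything fresh from the leaf tensor inside the step instead of re-assigning the attribute, so the leaf itself never picks up a grad_fn and each step's graph is freed by backward(). Names are illustrative.

```python
import torch

log_alpha = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_alpha], lr=1e-3)

for step in range(3):
    # Build a short, fresh graph each step, rooted directly at the leaf
    alpha = log_alpha.exp()
    loss = (alpha * 1.0).mean()  # stand-in for the real alpha loss

    optimizer.zero_grad()
    loss.backward()  # graph is freed here; nothing accumulates
    optimizer.step()

# The leaf stays a leaf: no grad_fn, no growing chain
print(log_alpha.grad_fn)  # -> None
```

The key design point is that intermediate tensors (views, detached copies, distributions) are kept local to the step rather than stored back onto the module, so no reference to an old graph survives into the next iteration.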