What could cause this? The autograd graph grows with every backprop step and is severely slowing down my code.
I found out the reason. In my model, I have this tensor:
self.log_alpha = torch.zeros(1, requires_grad=True)
Using it in my loss is fine:
alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
self.alpha_optimizer.zero_grad()
alpha_loss.backward()
But calling a view operation in my training loop, even with detach(), causes the ViewBackward recursion. Why?
def _do_training(self):
    # self.log_alpha = self.log_alpha.view(1).detach()
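A minimal sketch of the mechanism, under the assumption that the attribute is re-assigned to a view of itself on every training step: each view call appends a fresh ViewBackward node on top of the previous one, so the grad_fn chain (and the graph PyTorch must walk) grows with the number of steps. The names below are illustrative, not taken from the original model.

```python
import torch

# Leaf parameter, as in the original model
log_alpha = torch.zeros(1, requires_grad=True)

# Re-assigning to a view each "step" chains one ViewBackward per call
x = log_alpha
for step in range(3):
    x = x.view(1)

# Walk the grad_fn chain to count autograd nodes:
# 3 ViewBackward nodes plus the leaf's AccumulateGrad node.
depth = 0
fn = x.grad_fn
while fn is not None:
    depth += 1
    fn = fn.next_functions[0][0] if fn.next_functions else None
print(depth)  # -> 4
```

With detach() the chain is cut, but re-assigning the attribute still replaces the leaf the optimizer was tracking, which is why the pattern is problematic either way.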
The graph above seems to be referencing the layer_norms layer, the first module in it, and a parameter called bias in that module. Are you sure it is this tensor?
The recursion occurred in multiple scenarios. In the layer_norms case, I was passing an attention module's final probability distribution around outside of the training loop; log_alpha was just another concrete way to trigger the same behavior.
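One way to avoid the growth, sketched under the assumption that only the per-step loss needs the graph: derive everything fresh from the leaf tensor inside the step instead of re-assigning the attribute, so the leaf itself never picks up a grad_fn and each step's graph is freed by backward(). Names are illustrative.

```python
import torch

log_alpha = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_alpha], lr=1e-3)

for step in range(3):
    # Build a short, fresh graph each step, rooted directly at the leaf
    alpha = log_alpha.exp()
    loss = (alpha * 1.0).mean()  # stand-in for the real alpha loss

    optimizer.zero_grad()
    loss.backward()  # graph is freed here; nothing accumulates
    optimizer.step()

# The leaf stays a leaf: no grad_fn, no growing chain
print(log_alpha.grad_fn)  # -> None
```

The key design point is that intermediate tensors (views, detached copies, distributions) are kept local to the step rather than stored back onto the module, so no reference to an old graph survives into the next iteration.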