You are accumulating the loss via `loss += self.entropyloss(pred_targets, targets)` in combination with `retain_graph=True`, which will keep all computation graphs alive. Changing the parameters inplace via `self.optim.step()` would then create stale forward activations, as described in this post.
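To illustrate the failure mode, here is a minimal sketch of that pattern (the model, optimizer, and data are made up for illustration; your `self.entropyloss` is replaced by `nn.CrossEntropyLoss`):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optim = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

loss = 0.0
for step in range(3):
    x = torch.randn(16, 4)
    targets = torch.randint(0, 2, (16,))

    # each iteration appends a new computation graph to the accumulated loss
    loss += criterion(model(x), targets)

    optim.zero_grad()
    # retain_graph=True keeps all previous graphs (and their saved
    # activations/parameters) alive ...
    loss.backward(retain_graph=True)
    # ... but step() updates the parameters inplace, so the tensors saved
    # in the old graphs no longer match; the second iteration raises:
    # RuntimeError: one of the variables needed for gradient computation
    # has been modified by an inplace operation
    optim.step()
```

If you only want to track the accumulated loss value (rather than backpropagate through all iterations at once), detaching it avoids keeping the graphs around:

```python
running_loss = 0.0
for step in range(3):
    x = torch.randn(16, 4)
    targets = torch.randint(0, 2, (16,))

    loss = criterion(model(x), targets)
    optim.zero_grad()
    loss.backward()                 # no retain_graph; the graph is freed here
    optim.step()
    running_loss += loss.detach()   # keep only the value, not the graph
```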
Could you explain your use case a bit more and especially why you are retaining the graph?