How to fix "RuntimeError: Trying to backward through the graph a second time" in SHOT code?

Hi, I am trying to run the SHOT code from https://github.com/tim-learn/SHOT/blob/master/digit/uda_digit.py#L260 on my own dataset. However, I am getting the error “RuntimeError: Trying to backward through the graph a second time.” I have checked the code multiple times, but I cannot figure out which lines cause the gradient to be computed twice.

    import torch
    import torch.nn.functional as F
    # Entropy() and get_batch() come from the SHOT repo (loss.py / uda_digit.py)

    image, _, idx = get_batch(batch)
    ### get the pseudo-labels (refreshed every self.interval_iter epochs)
    if self.epoch % self.interval_iter == 0:
        model.eval()
        self.mem_label = self.obtain_labels()
        model.train()
    pred = self.mem_label[idx]
    logit, _ = model(image)
    softmax_out = F.softmax(logit, dim=1)
    # information-maximization loss: mean per-sample entropy ...
    entropy_loss = torch.mean(Entropy(softmax_out))
    msoftmax = softmax_out.mean(dim=0)
    # ... minus the entropy of the batch-mean prediction (diversity term)
    entropy_loss -= torch.sum(-msoftmax * torch.log(msoftmax + 1e-5))
    # total loss: weighted pseudo-label cross-entropy plus the IM loss
    loss = 0.1 * F.cross_entropy(logit, pred)
    loss += entropy_loss

Hey, please show the code where you’re calling .backward() on the loss tensor.

This error occurs precisely when you make a second .backward() call on a tensor whose graph has already lost the saved tensors required for gradient computation (they are freed after the first .backward() call), owing to autograd’s aggressive memory-freeing mechanism.
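
For reference, a minimal sketch of the two usual failure modes (this is not your SHOT code, just an illustration): either you call .backward() twice on the same graph, or a tensor that still carries an old graph is reused in a later iteration’s loss.

    import torch

    w = torch.randn(3, requires_grad=True)

    # Pattern 1: an explicit second backward() on the same graph
    y = (w ** 2).sum()
    y.backward()
    # y.backward()  # RuntimeError: Trying to backward through the graph a second time

    # Pattern 2 (common in training loops): a cached tensor keeps an old graph alive
    cached = w * 2                   # built once, outside the loop, with grad history
    for _ in range(2):
        loss = (cached * w).sum()    # every iteration reuses the graph behind `cached`
        loss.backward()              # fails on the second iteration with the same error

    # Typical fixes: rebuild `cached` inside the loop, or detach it
    # (cached = cached.detach()) if no gradient should flow through it;
    # retain_graph=True usually only hides the real problem.

So the thing to look for in your loop is any tensor that is computed once, still has a grad_fn, and is then reused when building the loss in later iterations.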

Enabling anomaly mode can also be helpful here: Automatic differentiation package - torch.autograd — PyTorch 2.0 documentation.
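
For example, something like this at the top of your script:

    import torch

    # With anomaly detection enabled, the error raised in backward() also prints the
    # traceback of the forward call that created the offending graph node, which makes
    # it much easier to see which tensor is being backpropagated through twice.
    torch.autograd.set_detect_anomaly(True)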
See a similar thread here:
Backward twice without retain_graph=true where I shouldn't
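
In a loop shaped like yours, a common culprit is the periodically refreshed tensor that is reused every iteration while still carrying a graph (here, whatever self.obtain_labels() returns, or anything cached alongside it). A hedged sketch of the kind of guard that avoids this — obtain_labels and mem_label are your names, and the exact fix depends on what that method actually computes:

    # make sure the pseudo-label computation records no graph at all
    with torch.no_grad():
        self.mem_label = self.obtain_labels()
    # belt-and-braces: targets for F.cross_entropy should be a plain LongTensor
    # with no grad history
    self.mem_label = self.mem_label.detach().long()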
