RuntimeError: Checkpointing is not compatible with .grad(), please use .backward() if possible

Hello everyone,

I ran into a “RuntimeError: Checkpointing is not compatible with .grad(), please use .backward() if possible” in the following code:

W = model.layers[0].conv.mlp[-1].weight   # shared weights used for the gradient norms
train_loss = [loss1, loss2]               # per-task losses
norms = []

for i, t in enumerate(tasks):
    # gradient of the i-th task loss with respect to the shared weights W
    gygw = torch.autograd.grad(train_loss[i], W, retain_graph=True)
    # weighted gradient norm, using the per-task weights as in GradNorm
    norms.append(torch.norm(torch.mul(Weights.weights[i], gygw[0])))

I was trying to calculate the gradient of the loss with respect to W, as in pytorch-grad-norm/train.py at master · brianlan/pytorch-grad-norm · GitHub.

Thank you very much for any help!

Update: my model uses gradient checkpointing in some layers, which turned out to be the cause of the error. This topic can be closed.
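In case it helps anyone else: the error comes from the reentrant checkpointing implementation, which does not support torch.autograd.grad. Passing use_reentrant=False to torch.utils.checkpoint.checkpoint selects the non-reentrant implementation, which does work with .grad(). Below is a minimal sketch under that assumption; the CheckpointedBlock module and its layer sizes are hypothetical and only illustrate where the flag goes, not my actual model.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(nn.Module):
    # hypothetical block, standing in for a checkpointed layer of the model
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # use_reentrant=False selects the non-reentrant checkpoint implementation,
        # which is compatible with torch.autograd.grad
        return checkpoint(self.body, x, use_reentrant=False)

block = CheckpointedBlock(8)
x = torch.randn(4, 8, requires_grad=True)
loss = block(x).sum()

W = block.body[-1].weight
gygw = torch.autograd.grad(loss, W, retain_graph=True)   # no RuntimeError with the non-reentrant path
print(gygw[0].shape)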