RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3, 1]], which is output 0 of TanhBackward, is at version 1; expected version 0 instead

This is taken from here: GitHub - davda54/sam: SAM: Sharpness-Aware Minimization (PyTorch)
You are right, I fixed the first one, but the second method is still giving an error:

#---------------Definition of LLOSS------------
if method == 'lloss':
    base_optimizer = torch.optim.SGD
    optim_module   = SAM(models['module'].parameters(), base_optimizer, lr=LR,
                         momentum=MOMENTUM, weight_decay=WDECAY)
    sched_module   = lr_scheduler.MultiStepLR(optim_module, milestones=MILESTONES)

    optimizers = {'backbone': optim_backbone, 'module': optim_module}
    schedulers = {'backbone': sched_backbone, 'module': sched_module}
            
# -----------------SAM Optimizer -------------------

criterion(models['backbone'](inputs)[0], labels)
loss.backward(retain_graph=True)
optimizers['backbone'].first_step(zero_grad=True)

criterion(models['backbone'](inputs)[0], labels)
optimizers['backbone'].second_step(zero_grad=True)

# -----------------SAM Optimizer for LLOSS Method -------------------
if method == 'lloss':
    #optimizers['module'].step()
    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['module'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    optimizers['module'].second_step(zero_grad=True)

ERROR

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 3; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Using retain_graph=True will not fix the issue, as it only keeps the intermediate forward activations alive. The main issue is likely still the same: the step() method updates the parameters in place, and the pre-update parameters are needed for the next backward call.
In this case you could either recompute the forward pass, so that the activations are created with the already updated parameters, or update the parameters only after all gradients have been computed.
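For example, the first option applied to your snippet could look roughly like this. This is just a sketch reusing the names from your code (models, criterion, inputs, labels, optimizers, and method are assumed to exist): every backward call gets its own fresh forward pass, and no loss created before a parameter update is reused afterwards.

# First SAM step: forward, backward, then perturb the backbone weights
loss = criterion(models['backbone'](inputs)[0], labels)
loss.backward()
optimizers['backbone'].first_step(zero_grad=True)

# Second SAM step: recompute the forward pass with the perturbed weights,
# so the saved activations match the parameters used in this backward
criterion(models['backbone'](inputs)[0], labels).backward()
optimizers['backbone'].second_step(zero_grad=True)

if method == 'lloss':
    # Same pattern for the module optimizer: never call backward on a loss
    # whose graph was built before a parameter update
    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['module'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['module'].second_step(zero_grad=True)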

Thank you so much for your comments. Can you share any related tutorials with me?

I don’t know if there is a good tutorial, but this code snippet shows why this approach is mathematically wrong:

import torch
import torch.nn as nn

# setup
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# forward pass
x = torch.randn(1, 10)
out = model(x)

# loss calculation
loss = criterion(out, torch.rand_like(out))

# gradient calculation using the intermediate forward activations from the 
# previous forward pass (a0) and the current parameter set (p0)
loss.backward(retain_graph=True)

# update parameters to new set p1
optimizer.step()

# gradient calculation using the stale activations (a0) and the new parameter
# set p1, which will not work as it's mathematically wrong
loss.backward()
# RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 10]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
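For comparison, here is a sketch of the working order: the forward pass is recomputed after the step, so the activations are recreated from the updated parameters before backward is called again. It reuses model, optimizer, criterion, and x from the snippet above; target is just a fixed random tensor added for illustration.

# fixed target so both passes use the same loss definition
target = torch.rand(1, 10)

# first forward/backward pass: activations a0, parameter set p0
out = model(x)
loss = criterion(out, target)
loss.backward()

# update parameters to the new set p1
optimizer.step()
optimizer.zero_grad()

# recompute the forward pass so the activations are created from p1;
# backward now succeeds because activations and parameters are consistent
out = model(x)
loss = criterion(out, target)
loss.backward()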