RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation (1.11.0+cu113):

When trying to train a network for Meta Pseudo Labels, the following code:

    x_i, x_j, x = train_iter
    x_i_t, x_j_t, x_t, target_t = test_iter
    """teacher forward pass"""
    _, _, _, _, _, _, _, t_out = t_model(x_i_t.to(device1), x_j_t.to(device1), x_t.to(device1))
    # labeled teacher loss
    t_loss_l = xent_criterion(t_out, target_t.to(device1))
    # soft pseudo-labels
    _, _, _, _, _, out_i, out_j, _ = t_model(x_i.to(device1), x_j.to(device1), x.to(device1))
    spl = torch.softmax(out_i.detach(), dim=-1)
    # hard pseudo-labels
    max_probs, hpl = torch.max(spl, dim=-1)
    # calculate mask 
    mask = max_probs.ge(0.5).float()
    t_loss_u = torch.mean(-(spl * torch.log_softmax(out_j, dim=-1)).sum(dim=-1) * mask)
    t_loss_uda = t_loss_l + t_loss_u

    """student optimizer step"""
    # these values are re-used by t-network downstream (fix) (compare to 't_logits_us')
    _, _, _, _, _, _, _, t_out_s = s_model(x_i_t.to(device0), x_j_t.to(device0), x_t.to(device0))
    _, _, _, _, _, out_i_s, out_j_s, _ = s_model(x_i.to(device0), x_j.to(device0), x.to(device0))
    s_loss = xent_criterion(out_i_s, hpl.to(device0))
    s_loss_l_old = xent_criterion(t_out_s.detach(), target_t.to(device0))
    s_optimizer.zero_grad()
    s_loss.backward()  # retain_graph=True, inputs=list(s_model.parameters()))
    s_optimizer.step()

    """student forward pass"""
    # TODO: determine relation between t_logits and t_loss_mpl
    _, _, _, _, _, _, _, t_out_s = s_model(x_i_t.to(device0), x_j_t.to(device0), x_t.to(device0))
    # _, _, _, _, _, out_i, out_j, _ = t_model(x_i.to(device1), x_j.to(device1), x.to(device1))
    s_loss_l_new = xent_criterion(t_out_s.detach(), target_t.to(device0))
    # dot_prod = s_loss_l_new.detach() - s_loss_l_old.detach()  # .detach()
    _, hpl = torch.max(out_j.detach(), dim=-1)
    t_loss_mpl = xent_criterion(out_j, hpl.to(device1))  #  dot_prod.to(device1) * 
    # t_loss_mpl_t = t_loss_mpl.clone()  # .to(device1)
    # t_loss_uda_t = t_loss_uda.clone()  # .to(device1)
    t_loss = t_loss_uda.to(device1) + t_loss_mpl.to(device1)

    """teacher optimizer step"""
    t_optimizer.zero_grad()
    t_loss.backward(retain_graph=True)  # , inputs=list(t_model.parameters()))
    t_optimizer.step()

    return t_loss.item(), s_loss.item() 

results in the following error (with set_detect_anomaly(True) enabled):

Gathering Pseudo Labels: 466/466
Step [0/466]	 Loss: 4.100398063659668
/home/adam/.local/lib/python3.10/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 656, in <module>
    main(0, args)
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 603, in main
    loss_epoch = train(args, train_loader, test_loader, t_model, model, xent_criterion, criterion, optimizer, t_optimizer, save_path)  # pseudo_loader
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 183, in train
    t_loss_mpl, s_loss_mpl = MPL((x_i, x_j, x), (x_i_t, x_j_t, x_t, target_t), t_model, model, t_optimizer, optimizer, xent_criterion)
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 93, in MPL
    _, _, _, _, _, out_i, out_j, _ = t_model(x_i.to(device1), x_j.to(device1), x.to(device1))
  File "/home/adam/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/adam/contrastive_learner/SimCLR/vgg19_64.py", line 129, in forward
    out_x_j = self.classifier(z_j)
  File "/home/adam/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/adam/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/adam/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/adam/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
 (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 656, in <module>
    main(0, args)
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 603, in main
    loss_epoch = train(args, train_loader, test_loader, t_model, model, xent_criterion, criterion, optimizer, t_optimizer, save_path)  # pseudo_loader
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 183, in train
    t_loss_mpl, s_loss_mpl = MPL((x_i, x_j, x), (x_i_t, x_j_t, x_t, target_t), t_model, model, t_optimizer, optimizer, xent_criterion)
  File "/home/adam/contrastive_learner/SimCLR/main.py", line 128, in MPL
    t_loss.backward(retain_graph=True)  # , inputs=list(t_model.parameters()))
  File "/home/adam/.local/lib/python3.10/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/adam/.local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 4]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I believe this error is a result of the s_model being updated with respect to the t_model variables; however, I am unsure exactly which part of the code is causing it. Any help is greatly appreciated.
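For reference, here is a small standalone sketch (not my actual models, just an illustration of the failure mode as I understand it): an optimizer step updates parameters in place, and a later backward pass through a graph built before that step then trips the version check.

import torch
import torch.nn as nn

# toy two-layer model, purely illustrative
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
out = model(x)                      # the graph saves the current weights

loss_1 = out.mean()
loss_1.backward(retain_graph=True)  # first backward; keep the graph alive
opt.step()                          # in-place update of the saved weights

loss_2 = out.sum()                  # still depends on the pre-step graph
loss_2.backward()                   # raises the same "modified by an inplace operation" error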

Using retain_graph=True will often yield such an error; it is usually used as a workaround for another issue:

t_loss.backward(retain_graph=True)

Could you explain why you are using this argument and why it’s needed, please?
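(For context: retain_graph=True is usually only needed if you intentionally call backward more than once through the same graph, as in this minimal sketch, which is unrelated to your models:)

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(8, 4)
out = model(x)

loss_a = out.mean()
loss_b = out.pow(2).mean()

loss_a.backward(retain_graph=True)  # keep the graph for a second backward
loss_b.backward()                   # would fail without retain_graph above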

I get the error even when I remove this argument.

It may be worthwhile to note that the error only happens on the second iteration, after completing one iteration successfully.

Could you add some missing pieces to your code to create a minimal, executable code snippet, please?
This would allow us to debug it further, as I cannot see any obvious issues.

def train(args, train_loader, test_loader, t_model, model, xent_criterion, criterion, t_optimizer, optimizer, save_path):  

    loss_epoch = 0
    model.train()
    t_model.train()
    test_iter = iter(test_loader)
    for step, ((x_i, x_j, x, _, target)) in enumerate(train_loader):

        try:
            test_data = next(test_iter)
        except StopIteration:
            # labeled loader exhausted; restart its iterator
            test_iter = iter(test_loader)
            test_data = next(test_iter)

        if step == 0:
            save_image(x_i[:100], os.path.join(save_path, 'input_images.png'), nrow=10)
        optimizer.zero_grad()

        x_i_t, x_j_t, x_t, idx_t, target_t = test_data

        # balanced/labeled dataset 
        # x_i_t = x_i_t.cuda()
        # x_j_t = x_j_t.cuda()
        # x_t = x_t.cuda()
        # target_t = target_t.cuda()

        # unlabeled dataset 
        # x_i = x_i.cuda(non_blocking=True)
        # x_j = x_j.cuda(non_blocking=True)
        # x = x.cuda(non_blocking=True)

        """perform MPL step"""
        t_loss_mpl, s_loss_mpl = MPL((x_i, x_j, x), (x_i_t, x_j_t, x_t, target_t), t_model, model, t_optimizer, optimizer, xent_criterion)

        # positive pair, with encoding
        h_i, h_j, z_i, z_j, _, _, _, _ = model(x_i.to(device0), x_j.to(device0), x.to(device0))

        loss = criterion(z_i, z_j)
        loss.backward()

        optimizer.step()

        if dist.is_available() and dist.is_initialized():
            loss = loss.data.clone()
            dist.all_reduce(loss.div_(dist.get_world_size()))

        if args.nr == 0 and step % 50 == 0:
            print(f"Step [{step}/{len(train_loader)}]\t Loss: {loss.item()}")

        if args.nr == 0:
            # writer.add_scalar("Loss/train_epoch", loss.item(), args.global_step)
            args.global_step += 1

        loss_epoch += loss.item()
    return loss_epoch

def MPL(train_iter, test_iter, t_model, s_model, t_optimizer, s_optimizer, xent_criterion):

    x_i, x_j, x = train_iter
    x_i_t, x_j_t, x_t, target_t = test_iter
    """teacher forward pass"""
    _, _, _, _, _, _, _, t_out = t_model(x_i_t.to(device1), x_j_t.to(device1), x_t.to(device1))
    # labeled teacher loss
    t_loss_l = xent_criterion(t_out, target_t.to(device1))
    # soft pseudo-labels
    _, _, _, _, _, out_i, out_j, _ = t_model(x_i.to(device1), x_j.to(device1), x.to(device1))
    spl = torch.softmax(out_i.detach(), dim=-1)
    # hard pseudo-labels
    max_probs, hpl = torch.max(spl, dim=-1)
    # calculate mask 
    mask = max_probs.ge(0.5).float()
    t_loss_u = torch.mean(-(spl * torch.log_softmax(out_j, dim=-1)).sum(dim=-1) * mask)
    t_loss_uda = t_loss_l + t_loss_u

    """student optimizer step"""
    # these values are re-used by t-network downstream (fix) (compare to 't_logits_us')
    _, _, _, _, _, _, _, t_out_s = s_model(x_i_t.to(device0), x_j_t.to(device0), x_t.to(device0))
    _, _, _, _, _, out_i_s, out_j_s, _ = s_model(x_i.to(device0), x_j.to(device0), x.to(device0))
    s_loss_l_old = xent_criterion(t_out_s.clone().detach(), target_t.to(device0))
    s_loss = xent_criterion(out_i_s, hpl.to(device0))
    # s_optimizer.zero_grad()
    s_loss.backward()  # retain_graph=True, inputs=list(s_model.parameters()))
    s_optimizer.step()

    """student forward pass"""
    _, _, _, _, _, _, _, t_out_s = s_model(x_i_t.to(device0), x_j_t.to(device0), x_t.to(device0))
    # _, _, _, _, _, out_i, out_j, _ = t_model(x_i.to(device1), x_j.to(device1), x.to(device1))
    s_loss_l_new = xent_criterion(t_out_s.clone().detach(), target_t.to(device0))
    dot_prod = s_loss_l_new - s_loss_l_old  # .detach()
    _, hpl = torch.max(out_j.clone().detach(), dim=-1)
    t_loss_mpl = dot_prod.to(device1) * xent_criterion(out_j, hpl.to(device1))  #  dot_prod.to(device1) * 
    # t_loss_mpl_t = t_loss_mpl.clone()  # .to(device1)
    # t_loss_uda_t = t_loss_uda.clone()  # .to(device1)
    t_loss = t_loss_uda.to(device1) + t_loss_mpl.to(device1)

    """teacher optimizer step"""
    # t_optimizer.zero_grad()
    pdb.set_trace()
    t_loss.backward() 
    t_optimizer.step()

    t_model.zero_grad()
    s_model.zero_grad()

    return  t_loss.item(), s_loss.item() 

Please let me know if you need more information.

For reference, I have adapted this code from the following repository:

Thanks for the update! Could you post the tensor shapes etc. so that I could use random tensors for the execution?

Here is the network I'm using, with input tensors of size [batch_size, 1, 64, 64]:

import torch
import torch.nn as nn
import torch.utils.model_zoo as model_zoo
import pdb
import math


class Vgg19(nn.Module):

    def __init__(self, num_classes, projection_dim, init_weights=True):
        super(Vgg19, self).__init__()
        self.features = nn.Sequential(
            # 1
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            # 2
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 64 --> 32
            # 3
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            # 4
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32 --> 16
            # 5
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # 6
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # 7
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # 8
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # 9
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16 --> 8
            # 10
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            # 11
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            # 12
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8 --> 4
            # 13
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            # 14
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            # 15
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            # 16
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 4 --> 2
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 2 --> 1
        )

        self.projection = nn.Sequential(
            nn.Linear(512, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, projection_dim)
        )

        self.classifier = nn.Sequential(
            # 17
            nn.Linear(projection_dim, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            # 18
            nn.Linear(128, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            # 19
            nn.Linear(128, num_classes)
        )

        # self.linear = nn.Sequential(
        #     nn.Linear(num_classes, 2),
        #     )
        if init_weights:
            self._initialize_weights()

    def forward(self, x_i, x_j, x):
        x = self.features(x)
        h_i = self.features(x_i)
        h_j = self.features(x_j)
        #~ print(x.size())
        h_x = x.view(x.size()[0], -1)
        h_i = h_i.view(h_i.size()[0], -1)
        h_j = h_j.view(h_j.size()[0], -1)

        z_x = self.projection(h_x)
        z_i = self.projection(h_i)
        z_j = self.projection(h_j)

        class_out = self.classifier(z_x)
        out_x_i = self.classifier(z_i)
        out_x_j = self.classifier(z_j)
        # y = self.linear(x)
        return h_i, h_j, z_i, z_j, z_x, out_x_i, out_x_j, class_out

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def vgg19(**kwargs):
    model = Vgg19(**kwargs)
    return model

Hi, could you please try removing all inplace=True arguments from the ReLU and other layers?
Please post if the error still occurs after that.
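For example, every activation defined with inplace=True would become the out-of-place version:

import torch.nn as nn

nn.ReLU(inplace=True)  # before
nn.ReLU()              # after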

Hello, the error still occurs after setting inplace=False on all of these operations.

Hi, just wondering if you can see any other issue with this code.

No, I cannot spot an inplace error just by looking.

Also, did you try enabling anomaly detection?
Check this, too.
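(For reference, anomaly detection is enabled globally before the training step, e.g.:)

import torch

# makes autograd report the forward-pass op that produced the failing gradient
torch.autograd.set_detect_anomaly(True)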

I previously provided a code snippet, model, and input tensor size to @ptrblck above. Is there any more information you need to run the code? The original error I posted already contains the traceback from anomaly detection. In the meantime I will look at the link you provided.

Your code is unfortunately still not executable and it’s unclear what exactly is causing the error.
I can properly initialize the model and use the posted input shape to train the model without getting the mentioned error:

import torch
import torch.nn.functional as F

model = Vgg19(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


for epoch in range(10):
    optimizer.zero_grad()
    x = torch.randn(2, 1, 64, 64)
    out = model(x, x, x)

    losses = [F.mse_loss(o, torch.randn_like(o)) for o in out]
    loss = torch.stack(losses).mean()
    loss.backward()
    optimizer.step()
    print("epoch {}, loss {:.3f}".format(epoch, loss.item()))

@srishti-git1110 @ptrblck Thank you for all your replies, I will keep working on it and post if I am able to find some solution.

Sounds good! Feel free to ping me again here once you are able to create an executable code snippet that reproduces the issue, and I could then try to debug it.

Do you have anything that could guide me on how to make the snippet? I am unsure exactly how to do this, thanks.
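Would something along these lines be what you mean: random tensors standing in for my real data loaders, reusing the Vgg19 class and the MPL function posted above (with the pdb.set_trace() removed)? Just a rough sketch; the batch size, class count, and projection_dim below are placeholders.

import torch
import torch.nn as nn

device0 = device1 = torch.device("cpu")  # single device just for the repro

t_model = Vgg19(num_classes=4, projection_dim=64).to(device1)  # teacher
s_model = Vgg19(num_classes=4, projection_dim=64).to(device0)  # student
xent_criterion = nn.CrossEntropyLoss()
t_optimizer = torch.optim.SGD(t_model.parameters(), lr=0.01)
s_optimizer = torch.optim.SGD(s_model.parameters(), lr=0.01)

batch_size = 8
for step in range(3):
    # unlabeled batch (two augmented views plus the original image)
    x_i, x_j, x = (torch.randn(batch_size, 1, 64, 64) for _ in range(3))
    # labeled batch
    x_i_t, x_j_t, x_t = (torch.randn(batch_size, 1, 64, 64) for _ in range(3))
    target_t = torch.randint(0, 4, (batch_size,))

    t_loss, s_loss = MPL((x_i, x_j, x), (x_i_t, x_j_t, x_t, target_t),
                         t_model, s_model, t_optimizer, s_optimizer, xent_criterion)
    print(step, t_loss, s_loss)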