RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 256, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead

Hi,

Frankly speaking, I am a newbie to PyTorch and more familiar with TensorFlow. While reproducing the code available here, using PyCharm as the IDE, I am facing the following error. Can you please help me resolve it? I would really appreciate your help. Thanks in advance.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 256, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I have only edited the dataset file (to load the dataset in an unsupervised way from the directories) and created a new train.py file to run the code in PyCharm. All of the remaining code is exactly the same as in the mentioned repository.

The dataset.py file is edited to this:

import os
import glob
import torch
import random
import torch.utils.data as data
from PIL import Image
import torchvision.transforms as transforms


class Images_with_Names(data.Dataset):
    """ can act both as Supervised or Un-supervised """

    def __init__(self, directory_A, directory_B, unsupervised=True, transform=None):
        self.directory_A = directory_A
        self.directory_B = directory_B
        self.unsupervised = unsupervised
        self.transform = transform

        self.imageList_A = sorted(glob.glob(f"{directory_A}/*.jpg*"))
        self.imageList_B = sorted(glob.glob(f"{directory_B}/*.jpg*"))

    def __getitem__(self, index):
        image_A = Image.open(self.imageList_A[index])
        if self.unsupervised:
            image_B = Image.open(self.imageList_B[random.randint(0, len(self.imageList_B) - 1)])
        else:
            image_B = Image.open(self.imageList_B[index])

        if self.transform is not None:
            image_A = self.transform(image_A)
            image_B = self.transform(image_B)

        return image_A, image_B

    def __len__(self):
        # Note: returning max(len(A), len(B)) raises an IndexError in
        # __getitem__ whenever directory B holds more images than A,
        # because index always runs over imageList_A.
        return len(self.imageList_A)

def preprocessing(x):
    x = (x / 127.5) - 1
    x = torch.reshape(x, (-1, x.shape[0], x.shape[1], x.shape[2]))
    return x

The train.py file is:

import os
import torch
import torchvision.transforms as transforms
from torchsummary import summary

from utils import train_UGAC
from dataset import Images_with_Names
from dataset import preprocessing
from Networks import CasUNet_3head, NLayerDiscriminator


# First instantiate the generators and discriminators
netG_A = CasUNet_3head(3, 3)
netD_A = NLayerDiscriminator(3, n_layers=4)
netG_B = CasUNet_3head(3, 3)
netD_B = NLayerDiscriminator(3, n_layers=4)

data_directory = "../code/UncertaintyAwareCycleConsistency/data/"
directory_A = os.path.join(data_directory, "A")
directory_B = os.path.join(data_directory, "B")

data_transformer = transforms.Compose([transforms.PILToTensor(),
                                       transforms.Lambda(preprocessing)])

train_loader = Images_with_Names(directory_A=directory_A, directory_B=directory_B, unsupervised=True,
                                 transform=data_transformer)

# summary(netG_A.cuda(), input_size=(3, 256, 256))
train_UGAC(netG_A, netG_B, netD_A, netD_B, train_loader, dtype=torch.cuda.FloatTensor, device='cuda',
           num_epochs=10, init_lr=1e-5, ckpt_path='../saved_models/checkpoints/UGAC',
           list_of_hp=[1, 0.015, 0.01, 0.001, 1, 0.015, 0.01, 0.001, 0.05, 0.05, 0.01])

Attempts that I have tried to resolve the issue are:

  1. Setting inplace=False for all ReLU and LeakyReLU activations, following this, but it failed.
  2. Getting a traceback of the forward call that caused the error with torch.autograd.set_detect_anomaly(True); it reports the following:
[W python_anomaly_mode.cpp:104] Warning: Error detected in ReluBackward0. Traceback of forward call that caused the error:

File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/train.py", line 29, in <module>
    netG_A, netG_B, netD_A, netD_B = train_UGAC(netG_A, netG_B, netD_A, netD_B, train_loader, dtype=torch.cuda.FloatTensor,
  File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/utils.py", line 69, in train_UGAC
    t0, t0_alpha, t0_beta = netG_B(xA)
  File "/home/xyz/.conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/Networks.py", line 205, in forward
    y = self.unet_list[i](y + x)
  File "/home/xyz/.conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/Networks.py", line 181, in forward
    y_mean, y_alpha, y_beta = self.out_mean(x), self.out_alpha(x), self.out_beta(x)
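
For reference, enabling the anomaly mode mentioned in the error's hint is a one-liner placed before the training loop (this is the standard PyTorch API, not repository-specific code):

```python
import torch

# Record a forward-pass stack trace for every autograd op, so a failing
# backward() points at the exact forward line that produced the bad tensor.
# Note: this slows training noticeably, so enable it only while debugging.
torch.autograd.set_detect_anomaly(True)
```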

Looking forward to hearing from you soon. Thanks.

@ptrblck

It's hard to tell where the error is coming from without seeing the model definition. Check the forward implementation of your model(s) and remove all inplace operations (e.g. tensor += a), replacing them with their out-of-place versions.
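
To illustrate that rewrite on a hypothetical module (not code from the repository above): ReLU's backward pass needs its saved output, so mutating that output in place bumps its version counter and triggers exactly this error.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        # inplace=False so autograd keeps the activation values it saved
        self.relu = nn.ReLU(inplace=False)

    def forward(self, x, skip):
        y = self.relu(x)
        # In-place version that breaks autograd here:
        #   y += skip        # mutates the saved ReLU output (version 0 -> 1)
        # Out-of-place replacement:
        y = y + skip         # allocates a new tensor; the saved output is untouched
        return y

block = Block()
x = torch.randn(2, 4, requires_grad=True)
out = block(x, torch.ones(2, 4))
out.sum().backward()         # succeeds: no saved tensor was modified in place
print(x.grad.shape)          # torch.Size([2, 4])
```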


@zeeshannisar Have you resolved it? I ran into this problem too, and setting inplace=False for all ReLU activations failed for me as well.


dawg. I'm throwing together a bare-bones implementation of PerceiverIO and your suggestion got my training loop to work. Thanks man

I also had a similar error, and my problem was having two optimizers updating different subsections of my model. Many optimizers keep state from previous passes that changes how the weights are modified. My first optimizer would step some weights, leaving a different version on those tensors. When my second optimizer came around and tried to update those weights together with weights that had not been stepped yet, it threw the error.

I found the solution was to add up the losses from both of the different ways I wanted to train the model, and then call backward() once for both. Then I put the entire model on one optimizer and called step() right after that backward() call. This fixed the problem.

For me this error was really misleading, but I was doing something pretty weird haha. Hopefully this helps someone
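
A minimal sketch of that fix, with a stand-in model and made-up losses rather than the actual training code: sum the objectives, then do a single backward() and a single step() on one optimizer.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)                                     # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # one optimizer for everything

x = torch.randn(16, 8)
target = torch.zeros(16, 1)

# Two training objectives that touch overlapping parameters
pred = model(x)
loss_a = nn.functional.mse_loss(pred, target)
loss_b = pred.abs().mean()

optimizer.zero_grad()
(loss_a + loss_b).backward()   # one backward pass covering both objectives
optimizer.step()               # one step, so no parameter is mutated between
                               # a forward pass and its corresponding backward
```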

Thank you for sharing. I have also encountered this problem recently while doing similar work. If it is convenient, could you share the code? Thank you very much.