RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 256, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead

Hi,

Frankly speaking, I am new to PyCharm and more familiar with TensorFlow. While reproducing the code available here, using PyCharm as the IDE, I ran into the following error. Could you please help me resolve it? I would really appreciate your help. Thanks in advance.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 256, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I have only edited the dataset file (to load the dataset in an unsupervised way from the directories) and created a new train.py file to run the code in PyCharm. All of the remaining code is exactly the same as in the mentioned repository.

The dataset.py file is edited to this:

import os
import glob
import torch
import random
import torch.utils.data as data
from PIL import Image
import torchvision.transforms as transforms


class Images_with_Names(data.Dataset):
    """Paired (supervised) or unpaired (unsupervised) image dataset."""

    def __init__(self, directory_A, directory_B, unsupervised=True, transform=None):
        self.directory_A = directory_A
        self.directory_B = directory_B
        self.unsupervised = unsupervised
        self.transform = transform

        self.imageList_A = sorted(glob.glob(f"{directory_A}/*.jpg*"))
        self.imageList_B = sorted(glob.glob(f"{directory_B}/*.jpg*"))

    def __getitem__(self, index):
        image_A = Image.open(self.imageList_A[index])
        if self.unsupervised:
            image_B = Image.open(self.imageList_B[random.randint(0, len(self.imageList_B) - 1)])
        else:
            image_B = Image.open(self.imageList_B[index])

        if self.transform is not None:
            image_A = self.transform(image_A)
            image_B = self.transform(image_B)

        return image_A, image_B

    def __len__(self):
        return max(len(self.imageList_A), len(self.imageList_B))

def preprocessing(x):
    x = (x / 127.5) - 1
    x = torch.reshape(x, (-1, x.shape[0], x.shape[1], x.shape[2]))
    return x

The train.py file is:

import os
import torch
import torchvision.transforms as transforms
from torchsummary import summary

from utils import train_UGAC
from dataset import Images_with_Names
from dataset import preprocessing
from Networks import CasUNet_3head, NLayerDiscriminator


# First instantiate the generators and discriminators
netG_A = CasUNet_3head(3, 3)
netD_A = NLayerDiscriminator(3, n_layers=4)
netG_B = CasUNet_3head(3, 3)
netD_B = NLayerDiscriminator(3, n_layers=4)

data_directory = "../code/UncertaintyAwareCycleConsistency/data/"
directory_A = os.path.join(data_directory, "A")
directory_B = os.path.join(data_directory, "B")

data_transformer = transforms.Compose([transforms.PILToTensor(),
                                       transforms.Lambda(lambda x: preprocessing(x))])

train_loader = Images_with_Names(directory_A=directory_A, directory_B=directory_B, unsupervised=True,
                                 transform=data_transformer)

# summary(netG_A.cuda(), input_size=(3, 256, 256))
train_UGAC(netG_A, netG_B, netD_A, netD_B, train_loader, dtype=torch.cuda.FloatTensor, device='cuda',
           num_epochs=10, init_lr=1e-5, ckpt_path='..saved_models/checkpoints/UGAC',
           list_of_hp=[1, 0.015, 0.01, 0.001, 1, 0.015, 0.01, 0.001, 0.05, 0.05, 0.01])

So far I have tried the following to resolve the issue:

  1. Setting inplace=False on all ReLU and LeakyReLU activations, following this suggestion, but the error persisted.
  2. Enabling torch.autograd.set_detect_anomaly(True) to get the traceback of the forward call that caused the error. It reports the following:
[W python_anomaly_mode.cpp:104] Warning: Error detected in ReluBackward0. Traceback of forward call that caused the error:

File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/train.py", line 29, in <module>
    netG_A, netG_B, netD_A, netD_B = train_UGAC(netG_A, netG_B, netD_A, netD_B, train_loader, dtype=torch.cuda.FloatTensor,
  File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/utils.py", line 69, in train_UGAC
    t0, t0_alpha, t0_beta = netG_B(xA)
  File "/home/xyz/.conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/Networks.py", line 205, in forward
    y = self.unet_list[i](y + x)
  File "/home/xyz/.conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xyz/code/UncertaintyAwareCycleConsistency/src/Networks.py", line 181, in forward
    y_mean, y_alpha, y_beta = self.out_mean(x), self.out_alpha(x), self.out_beta(x)

Looking forward to hearing from you soon. Thanks.

@ptrblck

It’s hard to tell where the error is coming from without seeing the model definition. Check the forward implementation of your model(s) and replace every in-place operation (e.g. tensor += a) with its out-of-place version (e.g. tensor = tensor + a).
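
To illustrate the failure mode, here is a minimal sketch that is not taken from the repository in question: the tensor shape and the function name `backward_succeeds` are made up for the example. `torch.relu` keeps its output for the backward pass, so modifying that output in place bumps its version counter and autograd refuses to use it.

```python
import torch


def backward_succeeds(inplace: bool) -> bool:
    """Hypothetical helper: returns True if backward() runs without error."""
    x = torch.randn(1, 1, 4, 4, requires_grad=True)
    y = torch.relu(x)  # ReLU saves its output y for the backward pass

    if inplace:
        y += 1.0       # in-place: modifies the saved tensor, bumps y._version
    else:
        y = y + 1.0    # out-of-place: creates a new tensor, saved one untouched

    try:
        y.sum().backward()
        return True
    except RuntimeError as err:
        # Same family of error as in the question:
        # "... has been modified by an inplace operation ..."
        print(err)
        return False


if __name__ == "__main__":
    print("out-of-place:", backward_succeeds(inplace=False))
    print("in-place:    ", backward_succeeds(inplace=True))
```

In this sketch the in-place branch fails during `backward()` and the out-of-place branch does not. Note that setting `inplace=False` on the activation modules only covers the activation itself. It does not help if the activation's output is modified afterwards by something like `+=`, so those spots are worth searching for as well.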