Getting RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation while optimizing a chain of models

I have tried changing nn.ReLU(True) and nn.LeakyReLU(True) from True to False, but it didn't work.
I am using PyTorch version 1.4.0.
The code is similar to this:

model_1 = my_model()
model_2 = my_model()
model_3 = my_model()

criterion = Loss()
optimizer = torch.optim.Adam(
    list(model_1.parameters()) + list(model_2.parameters()) + list(model_3.parameters()),
    lr=learning_rate)

for epoch in range(num_epochs):

    out_1 = model_1(a)
    out_2 = model_2(b)
    out_3 = model_3(out_1 + out_2)

    optimizer.zero_grad()  # clear gradients accumulated from the previous iteration
    loss = criterion(out_3, truth)
    loss.backward()
    optimizer.step()

The error might point to a detached computation graph in your model, so could you please post the model definition here?

The inplace versions of the non-linearities shouldn’t create this error message.
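For reference, here is a minimal sketch (my own illustration, not code from this thread) of the kind of in-place modification that does raise this error: a tensor that autograd saved for the backward pass is mutated through a slice assignment before backward runs.

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

y = x * w      # autograd saves x, because dy/dw needs it
x[0] = 0.0     # in-place slice write bumps x's version counter

y.sum().backward()  # raises: "one of the variables needed for gradient
                    #          computation has been modified by an inplace operation"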

import torch
import torch.nn as nn
import torch.nn.functional as F


class NetBlock(nn.Module):
    def __init__(self, in_channels=88, out_channels=64, kernel_size=3, padding=1):
        super(NetBlock, self).__init__()
        self.layer = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
            nn.Conv2d(out_channels, out_channels, kernel_size, padding=padding),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(negative_slope=0.2, inplace=True)
        )

    def forward(self, x):
        return self.layer(x)


class my_model(nn.Module):

    def __init__(self):
        super(my_model, self).__init__()
                
        self.alpha = nn.Parameter(torch.rand(1))
        self.beta = nn.Parameter(torch.rand(1))
        
        self.layer = nn.ModuleList()
        self.pred = nn.ModuleList()

        self.layer.append(NetBlock(2, 64, 1, 0))
        self.pred.append(Pred(64, 1))

        self.layer.append(NetBlock(81, 64, 1, 0))
        self.pred.append(Pred())

        self.layer.append(NetBlock(kernel_size=1, padding=0))
        self.pred.append(Pred())

        # layers 3-10 all use the default NetBlock/Pred configuration
        for _ in range(8):
            self.layer.append(NetBlock())
            self.pred.append(Pred())

    def forward(self, x):
        feature_map = []
        pred_map = []
        output = []

        # normalize is an external helper that is not shown in this thread
        feature_map.append(self.layer[0](normalize(x[0])))
        pred_map.append(self.pred[0](feature_map[0]))
        amp = self.alpha * x[0][:, 0, :, :] + (1 - self.alpha) * x[0][:, 1, :, :]
        output.append(torch.unsqueeze(amp, 1))

        for i in range(1, len(x)):
            img_shape = (x[i].shape[2], x[i].shape[3])
            feature_map.append(self.layer[i](torch.cat([normalize(x[i]),
                                            F.interpolate(feature_map[i-1], img_shape, mode='bilinear'),
                                            F.interpolate(pred_map[i-1], img_shape, mode='bilinear')], 1)))
            pred_map.append(self.pred[i](feature_map[i]))
            amp = self.beta*x[i][:, 0:4, :, :] + (1-self.beta)*x[i][:, 8:12, :, :]
            phase = pred_map[i][:,4:8,:,:]
            output.append(torch.cat([amp, phase],1))
        return output

I am able to train a single instance of this model correctly, though.

After setting torch.autograd.set_detect_anomaly(True) I get this:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 256]], which is output 0 of SliceBackward, is at version 72; expected version 71 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
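The two versions in the message refer to the tensor's internal version counter: autograd records it for every tensor saved for the backward pass and checks it again when the gradient is computed, and every in-place operation increments it. A quick sketch (my illustration, not thread code; ._version is an internal attribute but handy for debugging):

import torch

base = torch.randn(2, 2, requires_grad=True)
t = base * 2           # non-leaf tensor, safe to mutate for this demo
print(t._version)      # 0
v = t[:, 0:1]          # a slice view; v.grad_fn is <SliceBackward>
t.add_(1.0)            # any in-place op increments the shared counter
print(t._version)      # 1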

Sorry, I confused this thread with another one so please disregard my previous post.
Could you post an executable code snippet, as currently e.g. Pred() is undefined?

(NetBlock and my_model are unchanged from my post above; here are the missing Pred definition and the full training loop.)

class Pred(nn.Module):
    def __init__(self, in_channels=64, out_channels=8, kernel_size=1):
        super(Pred, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        # torch.tanh replaces F.tanh, which is deprecated
        out = torch.tanh(self.conv(x))
        return out


model_1 = my_model()
model_2 = my_model()
model_3 = my_model()

criterion = Loss()
optimizer = torch.optim.Adam(
    list(model_1.parameters()) + list(model_2.parameters()) + list(model_3.parameters()),
    lr=learning_rate)

for epoch in range(num_epochs):

    out_1 = model_1(a)
    out_2 = model_2(b)
    out_3 = model_3(out_1 + out_2)
    optimizer.zero_grad()

    loss = criterion(out_3, truth)
    with torch.autograd.detect_anomaly():
        loss.backward()
    optimizer.step()

I am getting this error in loss.backward():


    331                                         optimizer.zero_grad()
    332                                         with autograd.detect_anomaly():
--> 333                                                 total_loss.backward()
    334                                         optimizer.step()
    335 

/usr/local/lib/python3.6/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    193                 products. Defaults to ``False``.
    194         """
--> 195         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    196 
    197     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 256]], which is output 0 of SliceBackward, is at version 72; expected version 71 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Any help here?
Posted the code above with the error message
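Since the failing tensor is reported as the output of a SliceBackward node, one diagnostic worth trying (a sketch on my part, not a confirmed fix for this model) is to clone() every slice before reusing it; clone() copies the data out of the view but stays differentiable, so a later in-place write to the base tensor can no longer invalidate the saved value.

# inside my_model.forward, with x, i and pred_map as in the code above
amp = self.beta * x[i][:, 0:4, :, :].clone() + \
      (1 - self.beta) * x[i][:, 8:12, :, :].clone()
phase = pred_map[i][:, 4:8, :, :].clone()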

Your code is still not executable, which makes debugging hard.
After removing the undefined normalize calls and trying to set up random inputs based on the shape error messages, this code works partly:

a = torch.randn(1, 1, 2, 224, 224)
b = a.clone()
c = a.clone()
out_1 = model_1(a)
out_2 = model_2(b)
out_3 = model_3(out_1+out_2)

However, out_1 as well as out_2 are list objects, so summing them won't work.
If I try to sum out_1[0] with out_2[0], the shape doesn't match the expected number of input channels.

Could you post the necessary shapes to execute the code, please?

Required dimensions:

	for epoch in range(num_epochs):

		a0 = torch.randn(8, 2, 8, 8)
		b0 = torch.randn(8, 2, 8, 8)
		a = [a0] * 11
		b = [b0] * 11

		# input: a list of size 11 (a model parameter), each entry a batch of 8 with shape (2, 8, 8)
		out_1 = model_1(a)
		# output: a list of size 11 (a model parameter), each entry a batch of 8 with shape (1, 8, 8)
		out_2 = model_2(b)

		train = []
		for i in range(len(out_1)):
			train.append(torch.cat([out_1[i], out_2[i]], dim=1))

		out_3 = model_3(train)

		optimizer.zero_grad()  # otherwise gradients accumulate across epochs
		loss = criterion(out_3, truth)
		loss.backward()
		optimizer.step()

Hope this helps.

Hey, any help here? I have been stuck for a while now.

I guess the problem is how I combine the inputs using a for loop; that may be why the computation graph is breaking.
out_1 and out_2 are both Python lists of tensors, each of size n.
Each tensor has shape (8, 1, H, W), where H and W vary.
I want to combine them into a single list of size n with tensors of shape (8, 2, H, W).

Any possible way?
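For what it's worth, torch.cat is itself differentiable, so combining the two lists element-wise should not break the graph. A minimal sketch (shapes assumed from the description above):

import torch

n = 11
out_1 = [torch.randn(8, 1, 8, 8, requires_grad=True) for _ in range(n)]
out_2 = [torch.randn(8, 1, 8, 8, requires_grad=True) for _ in range(n)]

# concatenate along the channel dimension: (8, 1, H, W) + (8, 1, H, W) -> (8, 2, H, W)
combined = [torch.cat([t1, t2], dim=1) for t1, t2 in zip(out_1, out_2)]

print(combined[0].shape)    # torch.Size([8, 2, 8, 8])
print(combined[0].grad_fn)  # CatBackward, so gradients still flow into both lists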

Your code is still not executable.
With the last code snippet I get this error in model_1(a):

RuntimeError: Given groups=1, weight of size [64, 81, 1, 1], expected input[8, 67, 8, 8] to have 81 channels, but got 67 channels instead
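For reference, the 67 in that message is consistent with the forward pass posted above, assuming pred_map[0] has 1 channel because the first head is Pred(64, 1) (my bookkeeping, not confirmed by the author):

# channels concatenated and fed to self.layer[1] when i == 1:
#   normalize(x[1])                      ->  2 channels
# + F.interpolate(feature_map[0], ...)   -> 64 channels
# + F.interpolate(pred_map[0], ...)      ->  1 channel  (Pred(64, 1))
# total                                   = 67 channels,
# while that layer was constructed as NetBlock(81, 64, 1, 0), i.e. expecting 81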