Moving tensors around without violating the gradient computation restrictions

Hello,

I am getting the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

What I am currently doing is the following: I need to calculate the forward passes of my network in advance (that is, all of them before doing any backward pass). I want to store these output tensors in memory, keeping their gradient graphs intact, so that I can use them later during the training phase.

A bit more detail: as I perform the forward passes, I store the outputs of the network in a dictionary, where each key holds a bag / group of outputs from my network:

d[bag_i] = torch.cat((d[bag_i], outputs[i]))

After I am done with this, I do the following reshaping:

for i in d:
    d[i] = d[i].view(-1, 100)

For training, I sample elements from these bags, and I want to use the stored tensors to compute the loss for each batch without re-running the forward passes at that point.

It looks like I might be performing an inplace operation somewhere, so I am wondering whether this way of concatenating the tensors is legal and keeps the graph intact.

It is totally possible to store your forward passes and calculate the gradients later, as long as you keep the graph of the outputs (so no .detach()).
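
For example, a minimal sketch of that pattern (a toy model, not the network from the question):

    import torch
    from torch import nn

    model = nn.Linear(10, 100)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    # Forward passes done in advance; the outputs keep their autograd graphs
    # because nothing is detached.
    stored = {}
    for i, x in enumerate(torch.rand(5, 3, 10)):
        stored[i] = model(x)

    # Later, build losses from the stored outputs and backpropagate through them.
    loss = sum(out.sum() for out in stored.values())
    loss.backward()
    opt.step()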

I think the error with the inplace operation has another origin. Do you have any ReLUs with inplace=True? Maybe turn them off.

Thank you for the answer. But no, I use neither detach nor ReLU with inplace=True.

Hi,

I think the problem is that you are updating d[bag_i] before running the backward pass, so when autograd tries to do it, the values have already been modified.
Note that a = a + b is considered an inplace operation.

One possible solution is to clone d[bag_i] using the .clone() function and update the cloned version. Another approach is to define a new tensor for the concatenation.

Please see these posts. [1], [2]

Bests

What is d here? A python dictionary?
If it is, then this should be fine and is not considered an inplace operation.
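
As a small check with toy tensors (not the original code), rebinding a dictionary entry just points the key at a new tensor and modifies nothing in place:

    import torch

    a = torch.rand(5, requires_grad=True)
    d = {"bag_0": a * 2}

    # torch.cat creates a brand-new tensor; the assignment only rebinds the key,
    # so nothing that autograd saved gets modified in place.
    d["bag_0"] = torch.cat((d["bag_0"], a * 3))

    d["bag_0"].sum().backward()  # works fine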

A good way to track this down is to enable anomaly mode: call torch.autograd.set_detect_anomaly(True) at the beginning of your code to get a more helpful stack trace.
Can you report here what it points to, along with the code surrounding the faulty bit?
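
For reference, a minimal sketch of how to enable it, either globally or scoped with the context manager (toy tensors, just to show placement):

    import torch

    # Enable globally, at the very start of the script:
    torch.autograd.set_detect_anomaly(True)

    # Or scope it to the suspicious part only:
    x = torch.rand(3, requires_grad=True)
    with torch.autograd.detect_anomaly():
        (x * 2).sum().backward()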

Hi,
I have enabled the anomaly mode and this is what I get in the stack trace:

File "/Users/evgeniya/PycharmProjects/svp/Main.py", line 34, in <module>
    do_EM()
  File "/Users/evgeniya/PycharmProjects/svp/Main.py", line 31, in do_EM
    EM(loader, get_device()).em(epochs=550)
  File "/Users/evgeniya/PycharmProjects/svp/EM.py", line 163, in em
    b, d = self.e(model)
  File "/Users/evgeniya/PycharmProjects/svp/EM.py", line 39, in e
    idx, d, n, cr = self.eval_images(model)
  File "/Users/evgeniya/PycharmProjects/svp/EM.py", line 152, in eval_images
    d = torch.cat((d, model(data.view(-1, 1, 64, 64).float()).view(data.size()[0], 100)))
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/evgeniya/PycharmProjects/svp/Net.py", line 116, in forward
    x = self.layer12(x)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 106, in forward
    exponential_average_factor, self.eps)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1923, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled

Should I assume that the torch.cat operation is responsible for the error, or could it be caused by something before it?

    d = torch.cat((d, model(data.view(-1, 1, 64, 64).float()).view(data.size()[0], 100)))

I have checked the posts regarding this problem and found that in most cases torch.cat was not the problem.

The problem on this line is not the torch.cat but the model(data.view(-1, 1, 64, 64).float()): as you can see, the rest of the stack trace points inside the model's forward and ends up in the batchnorm.

So you want to double-check the input and output of the batchnorm and make sure they are not updated inplace (by a ReLU(inplace=True), for example).
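
As a toy illustration of that failure mode (not your network): an op whose backward needs its own output, followed by an inplace ReLU, reproduces exactly this error:

    import torch
    from torch import nn

    x = torch.rand(4, requires_grad=True)
    y = torch.sigmoid(x)                       # sigmoid's backward needs its output y
    out = nn.functional.relu(y, inplace=True)  # overwrites y in place

    # RuntimeError: one of the variables needed for gradient computation
    # has been modified by an inplace operation
    out.sum().backward()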

The layers of my network are of the form:

        self.layer11 = nn.Sequential(
            nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=1),
            nn.BatchNorm2d(1024, track_running_stats=False),
            nn.ReLU()
        )

with some max pooling layers.

My forward function looks like this:

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        ...
        x = self.layern(x)
        return x

That seems to work fine for me:

import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=1),
    nn.BatchNorm2d(1024, track_running_stats=False),
    nn.ReLU()
)

inp = torch.rand(1, 1024, 5, 5)

model(inp).sum().backward()

This is my learning part:

for bag in d_bags:
    loss = - self.loss(d_bags[bag], b_bags[bag])
    loss.backward(retain_graph=True)
    opt.step()

It successfully performs the first iteration and then fails on loss.backward. Without opt.step, like this:

for bag in d_bags:
    loss = - self.loss(d_bags[bag], b_bags[bag])
    loss.backward(retain_graph=True)

It does not fail.

Oh, interesting.
Then the issue is that opt.step() is modifying the weights of the batchnorm inplace (for the affine part).
In the second iteration, the original weights are still needed to compute the backward, hence the error you see.

You can move the step() outside of the for loop, so that all the gradients are accumulated before doing the step.
Or even better, accumulate the loss inside the for loop and do a single backward call outside, on the accumulated loss.
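
A toy reproduction of the failure (not your model), just to show why the original version breaks: the second backward through the same graph still needs the old weights, but step() has already overwritten them in place:

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    out = model(torch.rand(2, 4))           # one forward, graph reused below

    out.sum().backward(retain_graph=True)   # first backward: fine
    opt.step()                              # updates the weights in place

    # The second backward needs the original weights of the second Linear to
    # compute the gradient flowing into the first one, so it fails with
    # "... has been modified by an inplace operation".
    out.mean().backward()

Doing a single backward on the accumulated loss and only then a single step() avoids backpropagating through a graph whose weights have already been updated.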

I did the following and now I am totally lost:
I moved the backward and step outside the loop, as you said:

loss = 0
for bag in d_bags:
    loss = loss - self.loss(d_bags[bag], b_bags[bag])
loss.backward()
opt.step()

It was giving the same error.
Then I tried to remove the batchnorm layer entirely. With the anomaly flag off, it still gives the same inplace operation error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time. (apply at ../torch/csrc/autograd/generated/Functions.cpp:3969)

Then I turned the anomaly flag on with torch.autograd.set_detect_anomaly(True) and it gave a different error:

RuntimeError: Function 'MulBackward0' returned nan values in its 1th output.

With the anomaly flag off, it still gives the same inplace operation error:

The error you linked is not actually the same!
Try adding retain_graph=True to see if that helps.

Then I turned the anomaly flag on with torch.autograd.set_detect_anomaly(True) and it gave a different error:

This can happen for a few reasons, but most likely here it is because you multiply something infinite by 0. Do you get infinite values in the forward pass for the mul that anomaly mode points to?
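
For example, a toy case (unrelated to your model) where the forward multiplies an infinite value by 0, and anomaly mode flags MulBackward0 returning nan:

    import torch

    torch.autograd.set_detect_anomaly(True)

    x = torch.tensor([2.0], requires_grad=True)
    big = x * torch.tensor([float("inf")])   # forward: 2 * inf = inf
    loss = big * 0.0                         # forward: inf * 0 = nan

    # In the backward, the gradient flowing into `x * inf` is 0, and 0 * inf = nan,
    # so anomaly mode raises: Function 'MulBackward0' returned nan values ...
    loss.backward()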