Moving tensors around without violating the gradient computation restrictions

Hello,

I am getting the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

What I am currently doing is the following: I need to calculate the forward passes of my network in advance (that is, all of them before doing any backward pass). I want to store these output tensors in memory, keeping their gradient graphs intact, so that I can use them later during the training phase.

A bit more detail: as I perform the forward passes, I store the outputs of the network in a dictionary, where each key holds a bag / group of outputs from my network:

d[bag_i] = torch.cat((d[bag_i], outputs[i]))

After I am done with this, I do the following reshaping:

for i in d:
    d[i] = d[i].view(-1, 100)

For training, I sample elements from these bags, and I want to use the stored tensors to compute the loss for each batch without re-running the forward passes at that point.

It looks like I might be performing an inplace operation somewhere, so I am wondering whether this way of concatenating the tensors is legal and keeps the graph intact.

It is totally possible to store your forward passes and calculate the gradients later, as long as you keep the graph of the outputs (so no .detach()).
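
For example, a minimal sketch of that pattern (a toy model, not the network from the question):

    import torch
    from torch import nn

    model = nn.Linear(10, 100)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    # Forward passes done in advance; the outputs keep their autograd graphs
    # because nothing is detached.
    stored = {}
    for i, x in enumerate(torch.rand(5, 3, 10)):
        stored[i] = model(x)

    # Later, build losses from the stored outputs and backpropagate through them.
    loss = sum(out.sum() for out in stored.values())
    loss.backward()
    opt.step()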

I think the error with the inplace operation has another origin. Do you have any ReLUs with inplace=True? Maybe turn them off.

Thank you for the answer. But no, I use neither detach nor ReLU with inplace=True.

Hi,

I think the problem is that you are updating d[bag_i] before running the backward pass, so when autograd tries to do it, the values have already been modified.
Note that a = a + b is considered an inplace operation.

One possible solution is to clone d[bag_i] using the .clone() function and update the cloned version. Another approach is to define a new tensor for the concatenation.

Please see these posts. [1], [2]

Bests

What is d here? A python dictionary?
If it is, then this should be fine and is not considered an inplace operation.
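
As a small check with toy tensors (not the original code), rebinding a dictionary entry just points the key at a new tensor and modifies nothing in place:

    import torch

    a = torch.rand(5, requires_grad=True)
    d = {"bag_0": a * 2}

    # torch.cat creates a brand-new tensor; the assignment only rebinds the key,
    # so nothing that autograd saved gets modified in place.
    d["bag_0"] = torch.cat((d["bag_0"], a * 3))

    d["bag_0"].sum().backward()  # works fine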

A good way to track this down is to enable anomaly mode: call torch.autograd.set_detect_anomaly(True) at the beginning of your code to get a more helpful stack trace.
Can you report here what it points to, along with the code surrounding the faulty bit?
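
For reference, a minimal sketch of how to enable it, either globally or scoped with the context manager (toy tensors, just to show placement):

    import torch

    # Enable globally, at the very start of the script:
    torch.autograd.set_detect_anomaly(True)

    # Or scope it to the suspicious part only:
    x = torch.rand(3, requires_grad=True)
    with torch.autograd.detect_anomaly():
        (x * 2).sum().backward()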

Hi,
I have enabled the anomaly mode and this is what I get in the stack trace:

File "/Users/evgeniya/PycharmProjects/svp/Main.py", line 34, in <module>
    do_EM()
  File "/Users/evgeniya/PycharmProjects/svp/Main.py", line 31, in do_EM
    EM(loader, get_device()).em(epochs=550)
  File "/Users/evgeniya/PycharmProjects/svp/EM.py", line 163, in em
    b, d = self.e(model)
  File "/Users/evgeniya/PycharmProjects/svp/EM.py", line 39, in e
    idx, d, n, cr = self.eval_images(model)
  File "/Users/evgeniya/PycharmProjects/svp/EM.py", line 152, in eval_images
    d = torch.cat((d, model(data.view(-1, 1, 64, 64).float()).view(data.size()[0], 100)))
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/evgeniya/PycharmProjects/svp/Net.py", line 116, in forward
    x = self.layer12(x)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 106, in forward
    exponential_average_factor, self.eps)
  File "/Users/evgeniya/PycharmProjects/svp/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1923, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled

Should I assume that the torch.cat operation is responsible for the error, or could it be caused by something before it?

    d = torch.cat((d, model(data.view(-1, 1, 64, 64).float()).view(data.size()[0], 100)))

I have checked the posts regarding this problem and found that in most cases torch.cat was not the problem.

The problem on this line is not the torch.cat but the model(data.view(-1, 1, 64, 64).float()): as you can see, the rest of the stack trace points inside the model's forward and ends up in the batchnorm.

So you want to double-check the input and output of the batchnorm and make sure they are not updated inplace (by a ReLU(inplace=True), for example).
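
As a toy illustration of that failure mode (not your network): an op whose backward needs its own output, followed by an inplace ReLU, reproduces exactly this error:

    import torch
    from torch import nn

    x = torch.rand(4, requires_grad=True)
    y = torch.sigmoid(x)                       # sigmoid's backward needs its output y
    out = nn.functional.relu(y, inplace=True)  # overwrites y in place

    # RuntimeError: one of the variables needed for gradient computation
    # has been modified by an inplace operation
    out.sum().backward()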

The layers of my network are of the form:

        self.layer11 = nn.Sequential(
            nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=1),
            nn.BatchNorm2d(1024, track_running_stats=False),
            nn.ReLU()
        )

with some max pooling layers.

My forward function looks like this:

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        ...
        x = self.layern(x)
        return x

That seems to work fine for me:

import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=1),
    nn.BatchNorm2d(1024, track_running_stats=False),
    nn.ReLU()
)

inp = torch.rand(1, 1024, 5, 5)

model(inp).sum().backward()

This is my learning part:

for bag in d_bags:
    loss = - self.loss(d_bags[bag], b_bags[bag])
    loss.backward(retain_graph=True)
    opt.step()

It successfully performs the first iteration and then fails on loss.backward. Without opt.step, like this:

for bag in d_bags:
    loss = - self.loss(d_bags[bag], b_bags[bag])
    loss.backward(retain_graph=True)

It does not fail.

Oh, interesting.
Then the issue is that opt.step() is modifying the weights of the batchnorm inplace (for the affine part).
In the second iteration, the original weights are still needed to compute the backward, hence the error you see.

You can move the step() outside of the for loop, so that all the gradients are accumulated before doing the step.
Or even better, accumulate the loss inside the for loop and do a single backward call outside, on the accumulated loss.
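
A toy reproduction of the failure (not your model), just to show why the original version breaks: the second backward through the same graph still needs the old weights, but step() has already overwritten them in place:

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    out = model(torch.rand(2, 4))           # one forward, graph reused below

    out.sum().backward(retain_graph=True)   # first backward: fine
    opt.step()                              # updates the weights in place

    # The second backward needs the original weights of the second Linear to
    # compute the gradient flowing into the first one, so it fails with
    # "... has been modified by an inplace operation".
    out.mean().backward()

Doing a single backward on the accumulated loss and only then a single step() avoids backpropagating through a graph whose weights have already been updated.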

I did the following and now I am totally lost:
I moved the backward and step outside the loop, as you said:

loss = 0
for bag in d_bags:
    loss = loss - self.loss(d_bags[bag], b_bags[bag])
loss.backward()
opt.step()

It was giving the same error.
Then I tried to remove the batchnorm layer entirely. With the anomaly flag off, it still gives the same inplace operation error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time. (apply at ../torch/csrc/autograd/generated/Functions.cpp:3969)

Then I turned the anomaly flag on with torch.autograd.set_detect_anomaly(True) and it gave a different error:

RuntimeError: Function 'MulBackward0' returned nan values in its 1th output.

With the anomaly flag off, it still gives the same inplace operation error:

The error you linked is not actually the same!
Try adding retain_graph=True to see if that helps.

Then I turned the anomaly flag on with torch.autograd.set_detect_anomaly(True) and it gave a different error:

This can happen for a few reasons, but most likely here it is because you multiply something infinite by 0. Do you get infinite values in the forward pass for the mul that anomaly mode points to?
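
For example, a toy case (unrelated to your model) where the forward multiplies an infinite value by 0, and anomaly mode flags MulBackward0 returning nan:

    import torch

    torch.autograd.set_detect_anomaly(True)

    x = torch.tensor([2.0], requires_grad=True)
    big = x * torch.tensor([float("inf")])   # forward: 2 * inf = inf
    loss = big * 0.0                         # forward: inf * 0 = nan

    # In the backward, the gradient flowing into `x * inf` is 0, and 0 * inf = nan,
    # so anomaly mode raises: Function 'MulBackward0' returned nan values ...
    loss.backward()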