What's the difference between nn.ReLU() and nn.ReLU(inplace=True)?

I implemented a generative adversarial network using both nn.ReLU() and nn.ReLU(inplace=True). It seems that nn.ReLU(inplace=True) saved only a very small amount of memory.

What’s the purpose of using inplace=True?
Is the behavior different during backpropagation?

inplace=True means that it will modify the input directly, without allocating any additional output. It can sometimes slightly decrease the memory usage, but may not always be a valid operation (because the original input is destroyed). However, if you don’t see an error, it means that your use case is valid.
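A quick way to see the difference (a minimal sketch; the random tensor is just a placeholder):

import torch
import torch.nn as nn

x = torch.randn(4)
out = nn.ReLU()(x)               # out is a new tensor; x is left untouched
print(out is x)                  # False

x = torch.randn(4)
out = nn.ReLU(inplace=True)(x)   # x itself is overwritten and returned
print(out is x)                  # True: no extra output tensor was allocated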

In this document (http://pytorch.org/docs/master/notes/autograd.html#in-place-operations-on-variables), in-place operations are discouraged. So why do most official examples in torchvision (e.g. ResNet) use nn.ReLU(inplace=True)?

Also, would
x = self.conv1(x)
x = self.conv2(x)
be considered an in-place operation (since they use the same variable name) or not?

x = self.conv1(x)
x = self.conv2(x)

is not an in-place operation. Even though you reuse the same variable name, it is not the same tensor underneath: you just rebind the name x to a new tensor, and the old one is still in memory (because it is referenced by the PyTorch autograd graph).
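One way to convince yourself (a small check; the conv layers and shapes are made up):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 8, 3, padding=1)
conv2 = nn.Conv2d(8, 8, 3, padding=1)

x = torch.randn(1, 3, 16, 16)
original = x                                  # keep a handle on the original tensor
x = conv1(x)                                  # the name x is rebound to a new tensor
x = conv2(x)
print(x.data_ptr() == original.data_ptr())    # False: the original storage was never overwritten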

Thanks! That makes sense!
But if in-place operations are discouraged, why do most official examples use nn.ReLU(inplace=True)?

I’m a PyTorch newbie, so I wonder: does nn.ReLU(inplace=True) do any harm to backprop? And what about F.relu(inplace=True)?

That’s a good question! I think this was an initial limitation based on the PyTorch (whitepaper) manuscript on arXiv. But based on what I’ve seen (e.g., @soumith’s dcgan.py on GitHub), in-place ops like nn.ReLU(inplace=True) are supported in the autodiff engine now. Not sure, but I would guess the same is true for the functional one, since nn.ReLU just calls F.relu under the hood.
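A quick sanity check along those lines (a minimal sketch with made-up layer sizes, just comparing gradients with and without inplace):

import torch
import torch.nn as nn

x = torch.randn(8, 4)
grads = []
for inplace in (False, True):
    torch.manual_seed(0)                      # identical initial weights in both runs
    net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(inplace=inplace), nn.Linear(4, 1))
    net(x).sum().backward()
    grads.append(net[0].weight.grad.clone())
print(torch.allclose(grads[0], grads[1]))     # True: backprop is unaffected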

@cdancette In that case, is the relu(inplace=True) in vision/resnet.py actually not in-place, since it is used as x = relu(x) in forward()?

For ReLU, when the input is negative, both the output and the gradient are zero, so gradients stop propagating from there; the backward pass can be computed from the output alone, so overwriting the input in place doesn’t hurt anything while saving memory.
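For example (a small check; the clone is only there so the leaf tensor itself isn’t modified):

import torch
import torch.nn.functional as F

x = torch.randn(5, requires_grad=True)
y = F.relu(x.clone(), inplace=True)   # in-place ReLU on an intermediate tensor
y.sum().backward()
print(x.grad)                         # 1 where x > 0, 0 elsewhere, as expected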

Is this an in-place operation?

b = torch.tensor(5.0)
y = torch.sigmoid_(torch.tensor(4.0))
y = torch.sigmoid(b)

Thanks!

Even if you use x = relu(x), it is still in-place; reassigning x to the output of relu(x) changes nothing here.

You can check: after relu(x) and after x = relu(x), x has the same value in both cases.
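For instance (a minimal check, keeping a second reference so we can inspect the original tensor):

import torch
import torch.nn as nn

relu = nn.ReLU(inplace=True)
a = torch.randn(5)
keep = a             # second reference to the same storage

relu(a)              # even without reassigning, a is already clamped in place
print(keep)          # the negatives are gone; writing a = relu(a) would change nothing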

torch.sigmoid_ is an inplace operation

torch.sigmoid is not.

You can check on pytorch:

>>> a = torch.tensor(1.0)
>>> torch.sigmoid(a)
tensor(0.7311)
>>> print(a)
tensor(1.)
>>> a = torch.tensor(1.0)
>>> torch.sigmoid_(a)
tensor(0.7311)
>>> print(a)
tensor(0.7311)

In the first case, a still has its original value, while in the second case, a has been overwritten with the result.

In the case y = F.relu(x, inplace=True), it won’t hurt anything if the values of x are always positive in your computational graph. However, if some other node shares x as input and needs both its positive and negative values, then your network may malfunction.

For example, in the following situation,

y = F.relu(x, inplace=True) (1)
z = network(x) (2)

If (1) is declared and executed first, then the values of x are changed before (2) runs, so z may be computed from incorrect input values.
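A concrete toy version of that hazard (here network is replaced by a simple x * 2, just to show the mechanism):

import torch
import torch.nn.functional as F

x = torch.randn(4)
backup = x.clone()                 # remember the original values

y = F.relu(x, inplace=True)        # (1) overwrites the negative entries of x with 0
z = x * 2                          # (2) now sees the clamped values, not the originals

print(torch.equal(x, backup))      # False whenever x had negative entries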

Hi all,

Even though there are multiple answers, I will explain my problem here. I am facing the following error message:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4, 64, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

It occurs when trying to compute the gradients for the second backward pass. I suspect it has something to do with the fact that I am keeping the state variables of an LSTM feature-map creator as attributes on self inside an nn.Module. Here is the code snippet of my forward loop:

    def forward(self, z, xlr=None, logdet=0, logpz=0, eps=None, reverse=False,
                use_stored=False):

        self.h_new, self.c_next = self.conv_lstm(z, (self.h, self.c))

        # Encode
        if not reverse:
            for i in range(self.L):
                print("Level", i)
                for layer in self.level_modules[i]:

                    if isinstance(layer, modules.Squeeze):
                        z = layer(z, reverse=False)
                        self.h_new = layer(self.h_new, reverse=False)

                    elif isinstance(layer, FlowStep):
                        z, logdet = layer(z, lr_feat_map=self.h_new, # lr_downsampled_feats[i + 1], # TODO: change this part
                                          x_lr=xlr, logdet=logdet, reverse=False)

                    elif isinstance(layer, modules.GaussianPrior):
                        z, logdet, logpz = layer(z, logdet=logdet, logpz=logpz,
                                                 lr_feat_map=self.h_new, #lr_downsampled_feats[i + 1],
                                                 eps=eps, reverse=False)
                        self.h = self.last_squeezer(self.h_new, reverse=True)
                        self.c = self.c_next

Do you think it would be best to just pass the hidden and cell states through the function outputs instead? I have already set loss.mean().backward(retain_graph=True) and skimmed the code for other in-place operations.
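For reference, here is a minimal, generic snippet (not from my model) that triggers the same class of error: a tensor that autograd saved for the backward pass gets modified in place afterwards.

import torch

a = torch.randn(3, requires_grad=True)
b = torch.sigmoid(a)     # sigmoid saves its output b for the backward pass
b.mul_(2)                # the in-place op bumps b's version counter
b.sum().backward()       # RuntimeError: ... modified by an inplace operation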

Any help would be much appreciated! Please let me know if further code snippets are required.