The inplace operation of ReLU

In the case where we don’t need to store the input or do any other in-place operation on it, do torch.nn.ReLU(inplace=True) and torch.nn.ReLU() have the same effect?


Yeah, it would be cool if someone answered this 🙂 I was under the assumption that whenever we use an in-place operation we can’t backpropagate.

But I guess that if you don’t need to train, you might make it quicker with the in-place version.

Are you serious?
Do you have any proof for your assumption?

In the code of ResNet in torchvision, the in-place ReLU is used, so I think what you say cannot be right.
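
For reference, the pattern in torchvision’s ResNet blocks is roughly this (a paraphrased sketch from memory, not the exact source; BasicBlockSketch is my own name):

import torch.nn as nn

class BasicBlockSketch(nn.Module):
    # Rough paraphrase of torchvision's BasicBlock: note inplace=True on the ReLU.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity      # residual connection
        return self.relu(out)

This trains fine with autograd even though the ReLU modifies its input in place.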

Hi, this is a very simple case, though… Here, relu_() is the in-place version of relu():

In [5]: a = torch.randn(128, 3, 512, 512).cuda()

In [6]: torch.cuda.max_memory_allocated()
Out[6]: 402653184

In [7]: b = a.relu()

In [8]: torch.cuda.max_memory_allocated()
Out[8]: 805306368

In [9]: c = b.relu_()

In [10]: torch.cuda.max_memory_allocated()
Out[10]: 805306368
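
And to address the backpropagation concern above, a quick sanity check (just a minimal sketch, with arbitrary shapes) suggests the gradients with and without the in-place ReLU come out identical:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 8)

# Two identical linear layers followed by out-of-place vs. in-place ReLU.
lin1 = nn.Linear(8, 8)
lin2 = nn.Linear(8, 8)
lin2.load_state_dict(lin1.state_dict())

out1 = nn.ReLU(inplace=False)(lin1(x)).sum()
out2 = nn.ReLU(inplace=True)(lin2(x)).sum()
out1.backward()
out2.backward()

# Gradients match, so autograd handles the in-place ReLU here.
print(torch.allclose(lin1.weight.grad, lin2.weight.grad))  # True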

I think you misunderstood what I was saying.
I was just wondering whether the two versions can behave differently as part of a model, not comparing the two operations on their own.
Anyway, thanks for your reply.

Welp, just trying to add to the discussion here. I found this in the docs:

In-place operations can potentially overwrite values required to compute gradients.

I would still love to know whether using in-place offers a speed-up for inference, or in layers without weights. Or does it mainly lower the memory footprint?
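
To make that doc warning concrete, here is a small sketch (my own example, not from the docs) contrasting an in-place ReLU that autograd rejects, because the overwritten value was needed for the gradient, with one it happily accepts:

import torch

# Case 1: exp() saves its output for backward; relu_() overwrites it -> error.
a = torch.randn(3, requires_grad=True)
b = a.exp()      # d(exp)/da is exp(a), i.e. b itself, so b is saved for backward
b.relu_()        # overwrites the saved value in place
try:
    b.sum().backward()
except RuntimeError as e:
    print("backward failed:", e)

# Case 2: multiplying by a constant does not need its output for backward,
# so the in-place ReLU is fine here.
c = torch.randn(3, requires_grad=True)
d = (c * 2).relu_()
d.sum().backward()
print(c.grad)    # gradients computed without complaint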

Thanks, but then how should we think about the code in the torchvision models?
Is it buggy?

I don’t believe there is a bug. I believe that some operations can’t be used in-place, but ReLU seems like it can. If the in-place version doesn’t interfere with autograd for ReLU, perhaps the benefit is that it’s slightly faster / uses less memory, since we don’t have to allocate a new tensor for the output. This is just speculation and I’d love to hear from someone more knowledgeable.

I, for one, am going to start using the in-place ReLU though 😉

Edit: I’ll do a quick experiment when I get to work. -> Okay, so I got a ~100 MB improvement (out of 8 GB) when I used the in-place version in my YOLOv3 model during training. My GPU is capped at 8 GB, if anyone is interested.

import torch
import torch.nn as nn

x = torch.randn(16, 3, 512, 512).cuda()

conv = nn.Conv2d(3, 32, 3).cuda()
act = nn.ReLU()                      # -> 1115951104 bytes max allocated
# act = nn.ReLU(inplace=False)       # -> 1115951104 bytes max allocated
# act = nn.ReLU(inplace=True)        # ->  584716288 bytes max allocated

x = act(conv(x))
print(torch.cuda.max_memory_allocated())
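
If you want to compare the two settings in a single script rather than rerunning it, something like the following should work. torch.cuda.reset_peak_memory_stats() and torch.cuda.empty_cache() are real APIs; peak_mem is just a helper I made up for this sketch:

import torch
import torch.nn as nn

def peak_mem(inplace):
    # Measure peak allocated memory for one conv + ReLU forward pass.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(16, 3, 512, 512).cuda()
    conv = nn.Conv2d(3, 32, 3).cuda()
    act = nn.ReLU(inplace=inplace)
    y = act(conv(x))
    return torch.cuda.max_memory_allocated()

print("inplace=False:", peak_mem(False))
print("inplace=True: ", peak_mem(True))

Wrapping the forward pass in torch.no_grad() would let you run the same comparison for the inference case asked about above.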

I found a similar question.

So can we always use the out-of-place version to be safe?
