The inplace operation of ReLU

ForeverZH0204 · March 25, 2019, 6:52am

In the case where we don’t need to keep the input and we already apply some other in-place operation,
do torch.nn.ReLU(inplace=True) and torch.nn.ReLU() have the same effect?

Oli · March 25, 2019, 4:11pm

Yeah, it would be cool if someone answered this. I was under the assumption that whenever we use an in-place operation we can’t backpropagate.

But I guess that if you don’t need to train, you might make it quicker with the in-place version.

ForeverZH0204 · March 26, 2019, 1:36am

Are you serious?
Do you have any proof for your assumption?

ForeverZH0204 · March 26, 2019, 1:41am

The ResNet code in torchvision uses the in-place ReLU, so I think what you say cannot be right.

github.com

pytorch/vision/blob/3942b192e33dd79b6d9770149371bd58a483d47b/torchvision/models/resnet.py#L37

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)

moskomule · March 26, 2019, 6:27am

Hi, this is a very simple case, though… Here, relu_() is the in-place version of relu().

In [5]: a = torch.randn(128, 3, 512, 512).cuda()

In [6]: torch.cuda.max_memory_allocated()
Out[6]: 402653184

In [7]: b = a.relu()

In [8]: torch.cuda.max_memory_allocated()
Out[8]: 805306368

In [9]: c = b.relu_()

In [10]: torch.cuda.max_memory_allocated()
Out[10]: 805306368
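
For readers without a GPU, the same point can be checked on the CPU. This is only a minimal sketch: it compares `data_ptr()` values to see whether a result shares memory with its input, and the variable names are illustrative rather than taken from the session above.

```python
import torch

a = torch.randn(4, 4)

# Out-of-place: relu() allocates a new tensor for the result.
b = a.relu()
out_of_place_shares_memory = b.data_ptr() == a.data_ptr()

# In-place: relu_() writes the result into b's existing memory and returns b.
c = b.relu_()
in_place_shares_memory = c.data_ptr() == b.data_ptr()

print(out_of_place_shares_memory)  # False
print(in_place_shares_memory)      # True
```

This matches the peak-memory numbers above: the out-of-place call doubles the allocation, while the in-place call adds nothing.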

ForeverZH0204 · March 26, 2019, 7:10am

I think you misunderstood my question.
I am asking when the two variants behave differently as part of a model, not how the two operations differ on their own.
Anyway, thanks for your reply.

Oli · March 26, 2019, 7:17am

Welp, just trying to add to the discussion here. Found this in the docs

In-place operations can potentially overwrite values required to compute gradients.

I would still love to know whether using in-place offers a speedup for inference, or in layers without weights. Or could it lower the memory footprint?
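
To make the quoted sentence from the docs concrete, here is a small CPU sketch. The choice of `exp` is an assumption made for illustration (its backward pass reuses its own output); it is not taken from the docs sentence. Modifying that output in place before calling `backward()` should make autograd raise an error instead of silently computing wrong gradients.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.exp()   # autograd saves y, because d/dx exp(x) = exp(x)
y.add_(1)     # in-place edit overwrites the value saved for backward

try:
    y.sum().backward()
    raised = False
except RuntimeError as err:
    # autograd detects that a saved tensor changed after it was recorded
    raised = True
    print("autograd refused:", type(err).__name__)
```

So in-place operations do not disable backpropagation in general; autograd complains only when a value it still needs has been overwritten.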

ForeverZH0204 · March 26, 2019, 7:23am

Thanks.
But then how should we interpret the code in the torchvision model?
Does it have a bug?

Oli · March 26, 2019, 7:28am

I don’t believe there is a bug. I believe that some operations can’t be used in-place, but ReLU seems like it can. If the in-place version doesn’t interfere with autograd for ReLU, perhaps the benefit is that it’s slightly faster and uses less memory, since we don’t have to allocate new memory for the output. This is just speculation and I’d love to hear from someone more knowledgeable.

I for one am going to start using the in-place ReLU though.

Edit: I’ll do a quick experiment when I get to work. -> Okay, so I got an improvement of about 100 MB out of 8 GB when I used the in-place version in my YOLOv3 model during training. My GPU is capped at 8 GB, if anyone is interested.

import torch
import torch.nn as nn

x = torch.randn(16, 3, 512, 512).cuda()

conv = nn.Conv2d(3, 32, 3).cuda()
# Swap in one of the three variants below; the number is the printed peak allocation in bytes.
act = nn.ReLU()  # -> 1115951104 in memory
# act = nn.ReLU(inplace=False)  # -> 1115951104 in memory
# act = nn.ReLU(inplace=True)  # -> 584716288 in memory

x = act(conv(x))
print(torch.cuda.max_memory_allocated())
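
On the speculation that in-place ReLU does not interfere with autograd when it follows a convolution, a minimal CPU sketch can compare the gradients from both variants. The layer sizes are arbitrary and `conv_grad` is a hypothetical helper name, so treat this as an illustration under those assumptions rather than a proof.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, 3)
x = torch.randn(2, 3, 16, 16)

def conv_grad(act):
    """Return the conv weight gradient of sum(act(conv(x)))."""
    conv.zero_grad()
    act(conv(x)).sum().backward()
    return conv.weight.grad.clone()

grad_out_of_place = conv_grad(nn.ReLU())
grad_in_place = conv_grad(nn.ReLU(inplace=True))

# Backward runs without error in both cases; check the gradients agree.
same_gradients = torch.allclose(grad_out_of_place, grad_in_place)
print(same_gradients)
```

The backward pass of ReLU only needs to know where the output is positive, and that information survives the overwrite, which would explain why the torchvision ResNet can train with `inplace=True`.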

ForeverZH0204 · March 30, 2019, 6:39am

I found a similar question.

So can we always use the out-of-place version for safety?