Why would nn.ReLU cause my model to go over memory?

I have a model which looks like this:

import torch.nn as nn


class PFNet(nn.Module):
    def __init__(self):
        super(PFNet, self).__init__()
        self.conv1 = nn.Conv3d(1, 16, 3)
        self.conv2 = nn.Conv3d(16, 32, 3)
        self.conv3 = nn.Conv3d(32, 96, 2)
        self.conv4 = nn.Conv3d(96, 1, 1)
        self.pool1 = nn.MaxPool3d(kernel_size=2, stride=2)
        self.pool2 = nn.MaxPool3d(kernel_size=3, stride=3)
        self.pool3 = nn.MaxPool3d(kernel_size=2, stride=2)
        self.pool4 = nn.MaxPool3d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(400, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.relu(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.relu(x)
        x = self.pool3(x)

        x = self.conv4(x)
        x = self.relu(x)
        x = self.pool4(x)
        x = x.view(-1, 400)
        x = self.fc1(x)

        return x

Before I added the ReLUs it worked fine with a batch size of 8. With the ReLUs I get:

CUDA out of memory. Tried to allocate 1.04 GiB (GPU 0; 15.75 GiB total capacity; 13.68 GiB already allocated; 848.88 MiB free; 13.76 GiB reserved in total by PyTorch)

Since ReLU is essentially np.maximum(tensor, 0), I’d think it shouldn’t have any impact on memory.


By default, ReLU allocates new memory for its output. You can modify the input directly by setting the inplace=True flag. I am not sure whether this is the only cause, but ReLU will certainly consume memory.
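
To make the difference concrete, here is a minimal sketch comparing the two modes by looking at the data pointers of the input and output. The tensor shape is made up for illustration and is not taken from the model above.

```python
import torch
import torch.nn as nn

# Hypothetical activation shape, chosen only to keep the example small.
x = torch.randn(2, 16, 8, 8, 8)

# Default behaviour: the output lives in a freshly allocated buffer.
y = nn.ReLU()(x)
new_buffer = y.data_ptr() != x.data_ptr()

# inplace=True: the result overwrites x, so no second buffer is allocated.
z = nn.ReLU(inplace=True)(x)
same_buffer = z.data_ptr() == x.data_ptr()

print(new_buffer, same_buffer)
```

In the model above this would amount to replacing nn.ReLU() with nn.ReLU(inplace=True) in __init__. It removes the extra output buffer, but it says nothing about what autograd keeps around for the backward pass.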



Thanks. Good answer but unfortunately, as you warned, no cigar. Still running into the memory issue.

… although, I was able to up my batch size from 4 to 6 with this change. But still not the 8 I had originally.

If the only change was introducing ReLU, I cannot really figure it out at the moment.
A proper way to debug such issues is to use a profiler. It helps you find bottlenecks even when no error or warning occurs, so you can optimize your code much further.
Sorry for the lack of knowledge.
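
As a rough sketch of what profiling could look like here, the snippet below uses torch.profiler with memory tracking on a single conv/ReLU/pool stage. It runs on CPU with a small made-up input so it works anywhere; on the real model you would probably add ProfilerActivity.CUDA and use the actual batch.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# One stage shaped like the first block of PFNet; the input size is an assumption.
stage = nn.Sequential(nn.Conv3d(1, 16, 3), nn.ReLU(), nn.MaxPool3d(2))
x = torch.randn(2, 1, 16, 16, 16)

# profile_memory=True records allocations made by each operator.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    out = stage(x)
    out.sum().backward()

# Sort operators by the memory they allocated themselves.
table = prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10)
print(table)
```

The table shows which operators allocate the most, which should at least tell you whether the ReLU calls are really where the memory goes.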



Based on my understanding of how the computation graph works, ReLU will use extra memory during training even when the inplace flag is set.

The inplace flag only means that the output reuses the input's memory in the forward pass. However, ReLU still needs to save extra information for the later backward propagation: either the input tensor X or the positions where X > 0. I do not know exactly where this happens, but I believe saving something for backward is indeed required.
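
One way to check this rather than guess is to count what autograd actually saves for backward. The sketch below uses torch.autograd.graph.saved_tensors_hooks for that. The helper bytes_saved_for_backward is a name I made up, the input size is arbitrary, and I have not run it on the original PFNet, so treat it as a starting point only.

```python
import torch
import torch.nn as nn


def bytes_saved_for_backward(module, x):
    """Sum the sizes of the distinct buffers autograd saves during one forward pass."""
    seen = {}

    def pack(t):
        # Key by data pointer so a tensor saved by two ops is counted only once.
        seen[t.data_ptr()] = t.numel() * t.element_size()
        return t

    def unpack(t):
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        module(x)
    return sum(seen.values())


x = torch.randn(2, 1, 20, 20, 20)  # hypothetical input size
variants = {
    "no relu": nn.Sequential(nn.Conv3d(1, 16, 3), nn.MaxPool3d(2)),
    "relu": nn.Sequential(nn.Conv3d(1, 16, 3), nn.ReLU(), nn.MaxPool3d(2)),
    "relu inplace": nn.Sequential(nn.Conv3d(1, 16, 3), nn.ReLU(inplace=True), nn.MaxPool3d(2)),
}
results = {name: bytes_saved_for_backward(m, x) for name, m in variants.items()}
for name, size in results.items():
    print(name, size)
```

Note that this only measures what is retained for backward. It does not capture the temporary peak while both the conv output and the ReLU output exist at the same time, which is the part inplace=True can remove.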

In inference mode only, I think nn.ReLU(inplace=True) will not introduce extra memory.
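
For the inference case, a small sketch of what that could look like, assuming the forward pass is wrapped in torch.no_grad() so that no graph is built and nothing is saved for backward. The input size is again made up.

```python
import torch
import torch.nn as nn

block = nn.Sequential(nn.Conv3d(1, 16, 3), nn.ReLU(inplace=True), nn.MaxPool3d(2))
block.eval()

x = torch.randn(2, 1, 20, 20, 20)  # hypothetical input size

# Under no_grad, autograd records nothing, so no tensors are kept for backward.
with torch.no_grad():
    out = block(x)

no_graph = out.grad_fn is None and not out.requires_grad
print(no_graph, tuple(out.shape))
```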