Why would nn.ReLU cause my model to go over memory?

Alexander_Soare · September 9, 2020, 4:36pm

I have a model which looks like this:

import torch.nn as nn

class PFNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(1, 16, 3)
        self.conv2 = nn.Conv3d(16, 32, 3)
        self.conv3 = nn.Conv3d(32, 96, 2)
        self.conv4 = nn.Conv3d(96, 1, 1)
        self.pool1 = nn.MaxPool3d(kernel_size=2, stride=2)
        self.pool2 = nn.MaxPool3d(kernel_size=3, stride=3)
        self.pool3 = nn.MaxPool3d(kernel_size=2, stride=2)
        self.pool4 = nn.MaxPool3d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(400, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.relu(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.relu(x)
        x = self.pool3(x)

        x = self.conv4(x)
        x = self.relu(x)
        x = self.pool4(x)
        x = x.view(-1, 400)
        x = self.fc1(x)

        return x

Before I added in the ReLUs it was working fine on a batch size of 8. Then with the ReLUs I get

CUDA out of memory. Tried to allocate 1.04 GiB (GPU 0; 15.75 GiB total capacity; 13.68 GiB already allocated; 848.88 MiB free; 13.76 GiB reserved in total by PyTorch)

Since ReLU is essentially something like np.maximum(tensor, 0), I’d think that this shouldn’t have any impact on memory.

Nikronic · September 9, 2020, 4:42pm

Hi,

By default, ReLU allocates new memory for its output. You can modify the input directly instead by setting the inplace=True flag. I am not sure whether this is the only reason, but ReLU will certainly consume memory.
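To illustrate the difference between the default and inplace=True, here is a minimal CPU sketch. The tensor shape is made up, and comparing data_ptr() values is just one simple way to see whether the output shares the input's buffer.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3)  # made-up shape, small enough to run anywhere

# Default ReLU: the result is written to a freshly allocated buffer.
out_of_place = nn.ReLU()
y = out_of_place(x)
new_buffer = y.data_ptr() != x.data_ptr()

# inplace=True: the input buffer is overwritten and returned.
in_place = nn.ReLU(inplace=True)
z = in_place(x)
same_buffer = z.data_ptr() == x.data_ptr()

print(new_buffer, same_buffer)
```

With the default module, the input and the output both exist for at least a moment, which raises the peak memory for large activations.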

Bests

Alexander_Soare · September 9, 2020, 4:46pm

Thanks. Good answer but unfortunately, as you warned, no cigar. Still running into the memory issue.

… although, I was able to up my batch size from 4 to 6 with this change. But still not the 8 I had originally.

Nikronic · September 9, 2020, 5:18pm

If the only change was introducing ReLU, I cannot really figure it out at the moment.
One proper way to debug such issues is to use the profiler. It helps you find bottlenecks even when no error or warning is raised, so you can optimize your code much further.
Sorry for the lack of knowledge.
https://pytorch.org/tutorials/recipes/recipes/profiler.html
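As a rough sketch of what memory profiling could look like with torch.profiler: the `stage` module and the input size below are hypothetical stand-ins for one conv/ReLU/pool stage, not the poster's real data, and the run is CPU-only so it works without a GPU.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

# Hypothetical stand-in for one conv/ReLU/pool stage of the model.
stage = nn.Sequential(nn.Conv3d(1, 4, 3), nn.ReLU(), nn.MaxPool3d(2, 2))
x = torch.randn(1, 1, 12, 12, 12)  # made-up input size

# profile_memory=True records how much memory each operator allocates.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    stage(x).sum().backward()

# Sort operators by the memory they allocated themselves.
report = prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10)
print(report)
```

On a GPU you would presumably add ProfilerActivity.CUDA to the activities and sort by the corresponding CUDA memory column.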

singleroc · April 5, 2021, 8:02am

Hi,

Based on my own understanding of graph mode, ReLU will introduce extra memory during training even when the inplace flag is set.

The inplace flag only indicates that the output reuses the input's memory in the forward pass. However, ReLU still needs to save extra information for the later backward propagation: either the input tensor X or the positions where X>=0. I do not know where this happens, but I believe saving something for backward is indeed required.

In inference-only mode, I think nn.ReLU(inplace=True) will not introduce extra memory.
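A small sketch of the training versus inference difference described above. It assumes that the ReLU backward node exposes what it kept through a `_saved_result` attribute. That is an internal detail which may differ between PyTorch versions, so the code reads it defensively.

```python
import torch
import torch.nn as nn

relu = nn.ReLU(inplace=True)

# Training-style forward: autograd records a backward node for the ReLU.
x = torch.randn(4, requires_grad=True)
h = x * 2.0  # non-leaf tensor, so it may be modified in place
y = relu(h)
train_node = y.grad_fn
# Assumption: the node exposes the tensor it kept for backward here.
saved = getattr(train_node, "_saved_result", None)
print(type(train_node).__name__, saved is not None)

# The backward pass still works even though h was overwritten.
y.sum().backward()

# Inference-style forward: no graph is built, so nothing is kept.
with torch.no_grad():
    y_eval = relu(torch.randn(4) * 2.0)
eval_node = y_eval.grad_fn
print(eval_node)
```

Whatever the node keeps, it stays alive until the backward pass has run, which is why inplace=True alone does not make ReLU free during training.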

Rui_Wang · September 13, 2023, 9:10am

Hi,

This may be 2 years too late, but I believe you could make it work by switching the order of the max pooling and the ReLU and setting the ReLU to inplace.
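A quick check of why the swap is safe: max pooling and ReLU are both monotonic, so applying them in either order gives the same values, while the ReLU then runs on the much smaller pooled tensor. The activation shape below is made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 4, 8, 8, 8)  # made-up activation shape

pool = nn.MaxPool3d(kernel_size=2, stride=2)
relu = nn.ReLU()

relu_then_pool = pool(relu(x))  # ReLU runs on the full-size tensor
pooled = pool(x)
pool_then_relu = relu(pooled)   # ReLU runs on the pooled tensor

same = torch.equal(relu_then_pool, pool_then_relu)
shrink = x.numel() // pooled.numel()  # how much smaller the ReLU input is
print(same, shrink)
```

In the model from the first post, this would mean calling self.pool1 before self.relu in each stage, with self.relu created as nn.ReLU(inplace=True).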