I’m currently working with a large neural network that has over 35 million parameters. When I load the model onto my GPU (an A100 with 40GB), it takes up around 1GB of GPU memory, and with a batch size of 4 I can train it without any issues.
However, when I add four relatively small autoencoders to the network, I run into memory problems: even with a batch size of 1, I run out of GPU memory. The autoencoders themselves have comparatively few parameters, so I’m not sure why this is happening.
Example: With a batch size of 1, I run out of GPU memory with four of the following encoders (and matching decoders):
self.encoder = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 5)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 4))
)
However, training only uses about 25GB of GPU memory if I instead use the following encoders (and matching decoders):
self.encoder = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=(1, 1, 17)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 17)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 17)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 16))
)
So adding a few extra Conv3d layers is somehow costing at least 15GB of GPU memory.
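If I had to guess, the cost comes from the intermediate activations that autograd saves for the backward pass rather than from the parameters, since every Conv3d/GELU output keeps the full feature-map size. A rough back-of-the-envelope check (the feature-map shape here is purely hypothetical, since I haven't listed my real one):

# Hypothetical activation shape (B, C, D, H, W) -- placeholder numbers only.
elems = 1 * 64 * 64 * 64 * 512
bytes_per_activation = elems * 4             # float32
n_saved = 31                                 # roughly one saved tensor per Conv3d/GELU in the 16-layer encoder
print(bytes_per_activation * n_saved / 1e9)  # ~16.6 GB

If an estimate like that is in the right ballpark, the extra memory would come almost entirely from autograd keeping those activations alive, not from the weights. But I'm not sure this is the whole story.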
Does anyone have any ideas as to why adding these autoencoders is causing memory issues? And is there any way to mitigate this problem? I’m open to any suggestions or advice.
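For anyone who wants to poke at this, one of the encoders can be profiled in isolation with something like the following standalone sketch (the input shape is again just a placeholder, not my real data):

import torch
import torch.nn as nn

# Standalone copy of one 16-layer encoder; placeholder input shape.
encoder = nn.Sequential(
    *[m for _ in range(15) for m in (nn.Conv3d(64, 64, kernel_size=(1, 1, 5)), nn.GELU())],
    nn.Conv3d(64, 64, kernel_size=(1, 1, 4)),
).cuda()

x = torch.randn(1, 64, 8, 8, 256, device="cuda", requires_grad=True)  # placeholder shape
torch.cuda.reset_peak_memory_stats()
encoder(x).sum().backward()
print(torch.cuda.max_memory_allocated() / 1e9, "GB peak")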
Thank you in advance!
Edit:
Another example:
When I load the model onto the GPU it uses ~1GB of GPU memory, and a single data point is also ~1GB. By default, the model has 8.9 million parameters and uses 7.4GB of GPU memory during training, but if I change
self.encoder = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=(1, 1, 64))
)
to
self.encoder = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 7)),
    nn.GELU(),
    nn.Conv3d(64, 64, kernel_size=(1, 1, 4))
)
the model has 9 million parameters and uses 25GB of GPU memory.
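For comparison, the parameter counts of the two encoder variants really are almost identical, which is what makes the memory jump so confusing. A quick standalone check:

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

single = nn.Conv3d(64, 64, kernel_size=(1, 1, 64))
stacked = nn.Sequential(
    *[m for _ in range(10) for m in (nn.Conv3d(64, 64, kernel_size=(1, 1, 7)), nn.GELU())],
    nn.Conv3d(64, 64, kernel_size=(1, 1, 4)),
)
print(n_params(single), n_params(stacked))  # ~262k vs ~304k parameters

So only a few tens of thousands of extra parameters, yet the training memory goes from 7.4GB to 25GB.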