Batch size - GPU - out of memory


I have an image classification problem that I have implemented in PyTorch. The code works perfectly with a batch size of 10; with anything larger, I get a ‘CUDA out of memory’ error.

The images are 4-channel and I use a DenseNet architecture. However, when I train on only 1 of the 4 channels (with a DenseNet that takes 1-channel images), I expected to be able to go up to a batch size of 40. Unfortunately, I still get the out-of-memory error for any batch size above 10.

Can anyone please explain the logic behind this? I’m confused at this point, any inputs would be greatly appreciated.

Many thanks in advance!

No, that would be a wrong expectation, since you would only save memory in the input tensor and in the weights of the first conv layer.
E.g. for an input size of [10, 4, 224, 224] you would decrease the memory usage as:

import torch

x = torch.randn(10, 4, 224, 224)
# float32 uses 4 bytes per element
print('{:.3f}MB'.format(x.nelement() * 4 / 1024**2))
> 7.656MB
# keeping only 1 of the 4 channels
print('{:.3f}MB'.format(x[:, 0].nelement() * 4 / 1024**2))
> 1.914MB

and the first conv layer's weight would also shrink by 75%, but that saves even less memory.
The first conv layer creates the activation outputs, and their channel dimension is defined by the number of kernels (output filters) in that layer, not by the number of input channels. This means no memory savings are expected after the first conv layer.
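To make this concrete, here is a minimal sketch comparing a 4-channel and a 1-channel first conv layer, assuming a DenseNet-style stem (64 filters, 7x7 kernel, stride 2, padding 3); the exact filter count is an assumption, not taken from your model:

import torch
import torch.nn as nn

# Hypothetical first conv layers (DenseNet-style stem is assumed here)
conv4 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Weight memory in MB (float32 = 4 bytes per element)
print('4-channel weight: {:.4f}MB'.format(conv4.weight.nelement() * 4 / 1024**2))
> 0.0479MB
print('1-channel weight: {:.4f}MB'.format(conv1.weight.nelement() * 4 / 1024**2))
> 0.0120MB

# Activation memory after the first conv is identical in both cases,
# since the output shape depends on the 64 output filters, not the input channels
out4 = conv4(torch.randn(10, 4, 224, 224))
out1 = conv1(torch.randn(10, 1, 224, 224))
print(out4.shape, out1.shape)
> torch.Size([10, 64, 112, 112]) torch.Size([10, 64, 112, 112])
print('activation: {:.3f}MB'.format(out4.nelement() * 4 / 1024**2))
> 30.625MB

So dropping 3 input channels saves about 0.036MB of weights plus ~5.7MB of input per batch, while each conv's activations (tens of MB here, and there are many layers) stay exactly the same size, which is why the maximal batch size barely moves.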


Oh I see, thanks for clarifying. You are awesome!