I have a model (let us call it NN_Model) that is made of two submodules, say A and B. The input first goes through A, then through B, and finally through a couple of conv layers (2, to be precise):
```python
output = A(input)
output = B(output)
output = conv(output)
return output
```
When I create an instance of submodule A alone and run its forward pass, it takes around 7.4GB on the GPU. Similarly, submodule B alone takes around 5GB. The last convolutional layers do not even take 700MB when run separately.
However, when I run the forward pass of the full NN_Model, PyTorch tries to allocate 95GiB on the GPU and fails with the following error:
```
RuntimeError: CUDA out of memory. Tried to allocate 95.37 GiB (GPU 0; 7.80 GiB total capacity; 4.25 GiB already allocated; 2.59 GiB free; 4.31 GiB reserved in total by PyTorch)
```
I cannot understand this sudden surge in required memory. I expected a total of about 15GB, give or take a GB, but 95GiB is far beyond that.
Could anyone please suggest possible reasons? Submodule A is a ResNet50 from torchvision, and submodule B contains 4 softmaxes as its only learnable parameters and nothing else. Its forward function mainly consists of bmm calls, which I assumed could not add much to the memory.
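As a sanity check on that assumption, here is a quick back-of-the-envelope calculation (pure Python; the shapes are hypothetical, since I have not posted B's exact dimensions). Each `torch.bmm` of a `(B, N, M)` with a `(B, M, P)` batch materialises a fresh `(B, N, P)` output tensor, and for attention-style maps over a flattened feature grid that output grows quadratically with the spatial size:

```python
def float32_bytes(*shape):
    """Bytes occupied by a float32 tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n * 4  # 4 bytes per float32 element

# Hypothetical attention-style map over a flattened 224x224 feature grid,
# batch size 8: the (B, N, P) output of a single bmm alone is enormous.
out_bytes = float32_bytes(8, 224 * 224, 224 * 224)
print(f"{out_bytes / 2**30:.1f} GiB")  # -> 75.0 GiB for one output tensor
```

So if the spatial resolution entering B is larger in the full model than in my standalone test, a single bmm output could plausibly account for tens of GiB on its own.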
Any help in figuring out the cause of this aberration would be appreciated.
Edit: I noticed that the 95GiB allocation is attempted inside the forward of B (which works perfectly when run separately). What could be going wrong?
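To narrow down which call inside B's forward triggers the allocation, this is the kind of probe I could sprinkle between its stages; a minimal sketch assuming a CUDA device is available (`torch.cuda.memory_allocated` and `reset_peak_memory_stats` are real PyTorch APIs, but `A`, `B`, `conv`, and `x` here are just placeholders for my modules and input):

```python
import torch

def log_mem(tag):
    # Print current and peak allocated GPU memory in GiB (assumes CUDA).
    cur = torch.cuda.memory_allocated() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"{tag}: {cur:.2f} GiB allocated, {peak:.2f} GiB peak")

# Usage between the stages of the full model's forward:
# torch.cuda.reset_peak_memory_stats()
# out = A(x);      log_mem("after A")
# out = B(out);    log_mem("after B")    # expecting the spike to show here
# out = conv(out); log_mem("after conv")
```

Logging between the individual bmm calls inside B's forward in the same way should reveal exactly which intermediate tensor blows up.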