Out of memory issue with multiple GPUs

I am new to ML, deep learning, and PyTorch. I am not sure why, but changing my batch size and image size has no effect whatsoever on the allocated memory:

Tried to allocate 25.15 GiB (GPU 1; 47.54 GiB total capacity; 25.15 GiB already allocated; 21.61 GiB free; 25.16 GiB reserve

I am using two A6000 GPUs. I made a gist of the code, but if preferred I can post it here: Simple Distrbuted CNN · GitHub. The goal is to create a simple CNN that can detect the illuminated light on traffic lights. I feel like the training data is not being split across the GPUs, but instead all of the data is being trained on both GPUs simultaneously. Can anyone provide assistance? I am also open to criticism if you have any tips on the layout or functionality of the code.

Also, I feel like there could be a general rule of thumb (or at least a rough range) for how much memory an image size of (x, x) and a CNN with a certain number of conv layers (or number of parameters) will require. What I've noticed before is that I trained a Magic: The Gathering card classifier with a CNN (distributed with the same code, and it works), and I can never seem to build more than 3 layers before running out of memory, following the 32/64 conv layer input/output format with an image size of 72x72, which I know is extremely small.
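(For reference, here is a rough sketch of how one could measure the peak memory of a given model and input size empirically rather than guessing; the tiny CNN and sizes below are placeholders, not the code from my gist:)

```python
import torch
import torch.nn as nn

# Placeholder CNN, just to illustrate measuring peak memory for one forward/backward pass.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 18 * 18, 4),  # 72x72 input -> 18x18 after two 2x2 pools
).cuda()

x = torch.randn(32, 3, 72, 72, device="cuda")  # batch of 32 RGB images, 72x72
torch.cuda.reset_peak_memory_stats()

out = model(x)
out.sum().backward()  # dummy backward so activation/gradient memory is included

print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.1f} MiB")
```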

I've used the code from the following links:

I’m seeing a few things that seem unusual. Are you trying to train separate models or a single model? For the single-model case, I would expect that you would want to use DDP (DistributedDataParallel) in conjunction with the DistributedSampler.
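Roughly, the pieces fit together like this (a minimal sketch; the model, dataset, and hyperparameters are placeholders, not your actual code):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # Expects launch via torchrun, e.g.: torchrun --nproc_per_node=2 train.py
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model/dataset -- substitute the real CNN and traffic-light dataset.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 72 * 72, 4)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 3, 72, 72), torch.randint(0, 4, (1024,)))
    # DistributedSampler gives each rank a disjoint shard of the data each epoch,
    # so the two GPUs are not both training on the full dataset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for images, labels in loader:
            images, labels = images.cuda(local_rank), labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()  # DDP averages gradients across ranks here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```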

For a reference implementation using these components, I would recommend checking out the ImageNet example:

Additionally, the model used looks highly unusual; in particular, it has an unusually large linear layer for a classification model.

You may want to take a look at the implementations of some standard torchvision models (Models and pre-trained weights — Torchvision 0.15 documentation) to get a feel for typical CNN classification architectures.
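As a rough illustration of why the flattened feature size in front of the first Linear layer matters so much (hypothetical sizes, not the layers from your gist):

```python
import torch.nn as nn

# Flattening a 64-channel 36x36 feature map straight into a 1024-unit layer:
big_fc = nn.Linear(64 * 36 * 36, 1024)
print(sum(p.numel() for p in big_fc.parameters()))    # ~85 million parameters

# Pooling down to 4x4 first (e.g. with nn.AdaptiveAvgPool2d(4)) keeps it small:
small_fc = nn.Linear(64 * 4 * 4, 1024)
print(sum(p.numel() for p in small_fc.parameters()))  # ~1 million parameters
```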

I'm trying to train a single model. I was trying to build a model resembling machine-learning-book/ch14_part2.ipynb at main · rasbt/machine-learning-book · GitHub, which uses 4 convolution layers to train a classifier. I see that the linear layer in that example is not as large as the one in the model I created. I switched to using batchnorm instead of dropout, but I'll look into why my linear layer is so large and go from there. Thanks for taking a look.

Thanks @eqy, my FC layer was way too large, as mentioned. I reduced it to an arbitrarily low size and was able to get the model running again.

The stride parameter I had set in the MaxPool was creating an absurd number of parameters. I will look more into how to use this layer correctly.
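For anyone who finds this later, here is roughly the effect the pooling stride was having (hypothetical channel/spatial sizes, not the exact layers from my gist):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 72, 72)  # dummy 32-channel feature map from a conv layer

# stride=1 barely shrinks the feature map, so the flattened size stays huge.
out_s1 = nn.MaxPool2d(kernel_size=2, stride=1)(x)
print(out_s1.shape, out_s1[0].numel())  # torch.Size([1, 32, 71, 71]) -> 161,312 features

# stride=2 (the usual choice) halves each spatial dimension.
out_s2 = nn.MaxPool2d(kernel_size=2, stride=2)(x)
print(out_s2.shape, out_s2[0].numel())  # torch.Size([1, 32, 36, 36]) -> 41,472 features
```

Since that flattened output feeds the first fully connected layer, leaving the stride at 1 kept the feature map nearly full size and made the FC weight matrix enormous.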