CUDA out of memory error for Conv Autoencoder

Hi - I’m working on a convolutional autoencoder. The model has two conv layers, one linear bottleneck layer, and two deconv layers. I’m also using max pooling in the encoder and max unpooling in the decoder. I’m training on MNIST with a mini-batch size of 512. Training failed after 10 epochs with an out-of-memory error:
“RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 15.78 GiB total capacity; 11.70 GiB already allocated; 2.50 MiB free; 412.06 MiB cached)”.

Then I tried a smaller mini-batch size, but I keep getting the same error. I’m using PyTorch’s DataLoader. Does PyTorch load the full training set onto the GPU and then train on mini-batches? Is it possible to load and process only one mini-batch at a time? I know that it would be slower. I also noticed that I get this issue if I use the max unpooling operation. It requires the max-pooled indices from the forward pass, which takes additional memory.

Hi there. From my perspective, reducing the batch size is usually the first thing to try when you run into an out-of-memory error. How about setting the batch size to 64 or smaller (32)? As far as I know, PyTorch loads the data one batch at a time.
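To illustrate the point about batch-wise loading, here is a minimal sketch. It assumes a standard DataLoader over an in-memory dataset; the random tensors are a hypothetical stand-in for MNIST, not the original poster's data. The loader yields CPU batches, and only the batch that is explicitly moved with .to(device) ends up on the GPU.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 1024 random 1x28x28 images kept in CPU memory.
images = torch.randn(1024, 1, 28, 28)
labels = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# Falls back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch_devices = set()
for x, y in loader:
    batch_devices.add(x.device.type)   # batches leave the loader on the CPU
    x, y = x.to(device), y.to(device)  # only this one batch is copied to the GPU
    # ... forward / backward pass would go here ...

print("devices the loader produced batches on:", batch_devices)
```

If the whole dataset were moved to the GPU up front (for example by calling .to(device) on the full tensor before building the dataset), memory use would no longer depend on the batch size.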

@Pengzhangzhi - I did try mini-batch sizes of 128, 64 and 32, but I got the same error.

What’s your IDE? Do you use Jupyter or PyCharm?

I use Sublime Text and Geany. Why?

Hello, could you try the following:

  1. Move just the model to the GPU and note the increase in memory.
  2. Move just one batch of data to the GPU and note the increase in memory.
  3. Now move both the model and your data to the GPU. (Note that the CUDA driver occupies an extra 2 GB or so, at least on my system; I do not know whether it is the same everywhere.)
    Do not perform a forward or backward pass.

It might be that your data is too large, or that your model is too large. If neither is the issue, we will have to dig further.
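The steps above can be sketched roughly as follows. This is only an illustration: the small nn.Sequential model and the random batch are hypothetical stand-ins, and the helper allocated_mib is a name made up for this example. It takes the readings cumulatively (model first, then model plus one batch) rather than in separate runs. Note that torch.cuda.memory_allocated only counts memory held by tensors, so the driver overhead mentioned above shows up in nvidia-smi but not in these numbers.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def allocated_mib():
    """Memory held by tensors on the GPU, in MiB (0 when no GPU is present)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated(device) / 2**20

# Hypothetical stand-ins for the real model and one MNIST mini-batch.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
batch = torch.randn(512, 1, 28, 28)

readings = {"baseline": allocated_mib()}

model = model.to(device)            # step 1: model only
readings["model"] = allocated_mib()

batch = batch.to(device)            # steps 2 and 3: add one batch of data
readings["model + batch"] = allocated_mib()

# No forward or backward pass is run, so activations and gradients are excluded.
for step, mib in readings.items():
    print(f"{step:>15}: {mib:8.2f} MiB")
```

If these readings stay small while training still runs out of memory, the growth is probably happening during the forward or backward pass rather than in the model weights or the input data.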

Hi @a_d - I only run into this memory issue if I use max unpooling in the decoder. Unpooling requires the index set from the max pooling operation in the encoder.
I was able to run the same code without max unpooling.
I just checked: if I don’t use max unpooling, the memory usage on the Tesla V100 GPU rises to about 1.5 GB, but using max unpooling eats up the entire 16 GB of memory.
| 1 Tesla V100-PCIE… Off | 00000000:D8:00.0 Off | 0 |
| N/A 46C P0 99W / 250W | 1443MiB / 16160MiB | 84% Default |
| | | N/A |
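For reference, the pooling/unpooling pairing described above looks roughly like the sketch below. It is not the original model; the tensor sizes are arbitrary assumptions. It only shows how nn.MaxPool2d with return_indices=True hands its indices to nn.MaxUnpool2d, and how much memory one such index tensor takes.

```python
import torch
import torch.nn as nn

# Encoder-side pooling must return the argmax indices for the decoder to use.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

# Arbitrary example activation: batch of 8, 16 channels, 28x28 feature maps.
x = torch.randn(8, 16, 28, 28)

pooled, indices = pool(x)  # indices has the same shape as pooled, dtype int64
restored = unpool(pooled, indices, output_size=x.shape)

# Size of the extra index tensor that has to be kept until the decoder runs.
index_mib = indices.numel() * indices.element_size() / 2**20
print(f"pooled: {tuple(pooled.shape)}, restored: {tuple(restored.shape)}")
print(f"indices take {index_mib:.3f} MiB")
```

The index tensor is the same shape as the pooled output, so on its own it should cost memory on the order of one extra activation per pooling layer.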

Can anyone please help me here? Using max unpooling makes my code run out of memory.
Thank you,

Could you post a minimal code snippet, which shows the unexpected memory usage in nn.MaxUnpoolXd?

Could you check your Dataset and DataLoader? The problem might be that you are stacking all the samples in __getitem__.
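As a point of comparison, here is a minimal sketch of a Dataset whose __getitem__ returns a single sample and leaves the stacking to the DataLoader. The class name ImageDataset and the random data are hypothetical, used only for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ImageDataset(Dataset):
    """Hypothetical dataset that hands out one (image, label) pair per index."""

    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return exactly one sample; do not stack or return the whole set here.
        return self.images[idx], self.labels[idx]

dataset = ImageDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32)

# The DataLoader's default collate step stacks 32 single samples into one batch.
x, y = next(iter(loader))
print("one sample:", tuple(dataset[0][0].shape), "one batch:", tuple(x.shape))
```

If __getitem__ instead returned every sample at once, each "item" would already be the full dataset, and a batch would be many copies of it.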