You are running out of memory, so you would need to reduce the batch size or shrink the overall model architecture. Note that your GPU only has 2GB of memory, which limits the workloads you can run on this device.
You could also try torch.utils.checkpoint to trade compute for memory.
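For example, here is a minimal sketch of checkpointing one expensive block (the block below is just a placeholder, not your actual model):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Placeholder block -- stands in for an expensive part of your model.
block = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
).cuda()

x = torch.randn(2, 3, 224, 224, device="cuda", requires_grad=True)

# Instead of out = block(x): the activations inside `block` are not stored,
# they are recomputed during backward (more compute, less memory).
out = checkpoint(block, x)
out.sum().backward()
```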
Reducing to the smallest batch_size = 2 still didn't work. It's giving this error: RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 2.00 GiB total capacity; 1.01 GiB already allocated; 105.76 MiB free; 1.05 GiB reserved in total by PyTorch)
I tried restarting and other things, but it didn't work.
When running without CUDA, the notebook freezes both locally and in Colab.
Oh, it might be a problem in my implementation; a pretrained network using CUDA works fine.
It could be that your GPU is just too small for the job you’re trying to do. Perhaps use Colab to train (it’s free) and then your GPU for fine-tuning/inference?
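If you go that route, the usual pattern is to save the state_dict on Colab and load it locally; a rough sketch (the file name and the tiny model are just placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model -- stand-in for whatever architecture you train on Colab.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))

# On Colab, after training:
torch.save(model.state_dict(), "model_checkpoint.pth")

# Locally, recreate the same architecture and load the trained weights,
# remapping the storage onto the local GPU (or "cpu" if it doesn't fit).
local_model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
local_model.load_state_dict(torch.load("model_checkpoint.pth", map_location="cuda:0"))
local_model.to("cuda:0").eval()
```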
I think I have a similar issue. The model is a BiLSTM+CRF. GPU memory usage spikes randomly and then: RuntimeError: CUDA out of memory. A larger batch size worked fine. A smaller batch size worked fine once, and a couple of other times it ended in the runtime error.
All experiments have the same parameters except the following:
Light blue - batch size 128
All others - batch size 32
Have a look at this memory profiler/monitor if you’re running in a jupyter notebook - https://github.com/stas00/ipyexperiments - it might help you to identify where you lose that memory.
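If you'd rather not add a dependency, PyTorch's own counters can also show where the usage jumps; a rough sketch (drop the helper wherever you suspect the spike):

```python
import torch

def log_gpu_memory(tag=""):
    # memory_allocated: memory held by live tensors; memory_reserved: what the
    # caching allocator has grabbed from the driver (this matches the
    # "reserved in total by PyTorch" number in the OOM message).
    # On older PyTorch versions memory_reserved was called memory_cached.
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"[{tag}] allocated: {alloc:.1f} MiB | reserved: {reserved:.1f} MiB")

# e.g. inside the training loop:
# log_gpu_memory("before forward")
# output = model(batch)
# log_gpu_memory("after forward")
```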
Since you are running out of memory, you would need to lower the batch size or you could have a look at torch.utils.checkpoint to trade compute for memory.
Also, if not already done, wrap the validation loop in a with torch.no_grad() block, and avoid storing tensors that are not detached from the computation graph.
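A minimal sketch of both points, with dummy stand-ins for the model, criterion, and loader:

```python
import torch
import torch.nn as nn

# Dummy stand-ins so the snippet runs on its own -- swap in your own pieces.
model = nn.Linear(10, 2).cuda()
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]

# Validation: under no_grad no autograd graph is built, so intermediate
# activations are freed as soon as the forward pass is done.
model.eval()
with torch.no_grad():
    val_losses = []
    for data, target in loader:
        output = model(data.cuda())
        val_losses.append(criterion(output, target.cuda()).item())

# Training: store loss.item() (or loss.detach()), not the loss tensor itself,
# otherwise every stored loss keeps its whole computation graph alive.
model.train()
train_losses = []
for data, target in loader:
    loss = criterion(model(data.cuda()), target.cuda())
    loss.backward()
    train_losses.append(loss.item())  # not: train_losses.append(loss)
```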
@ptrblck @mikey_t: Did you solve your problem?
I have the same issue. RuntimeError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 15.78 GiB total capacity; 14.60 GiB already allocated; 15.44 MiB free; 14.70 GiB reserved in total by PyTorch)
Before starting the training, nvidia-smi says 0MB is used and no processes are running. I am running it on one Tesla V100-SXM2 GPU.
My batch size is 1, which is approximately 150 images. I feed it to a pretrained ResNet18 PyTorch model whose output embedding is fed to a transformer encoder and then finally to a CTC loss function. The model has 37,818,496 parameters in total.
The failure trace shows the out-of-memory error in the ResNet forward pass: File "/home/####/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward self.padding, self.dilation, self.groups)
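To make the setup concrete, the pipeline is roughly the following; all dimensions and the vocabulary size here are illustrative, not my real configuration:

```python
import torch
import torch.nn as nn
import torchvision

# Simplified sketch of the pipeline -- every size here is a guess.
resnet = torchvision.models.resnet18(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop fc head -> 512-d embedding

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
classifier = nn.Linear(512, 100)  # 100 = illustrative vocabulary size
ctc_loss = nn.CTCLoss(blank=0)

# One "batch_size=1" sample is really ~150 image crops stacked along dim 0,
# so the effective batch hitting the ResNet convolutions is 150, not 1.
images = torch.randn(150, 3, 64, 64)
feats = backbone(images).flatten(1)           # (150, 512)
feats = encoder(feats.unsqueeze(1))           # (seq_len=150, batch=1, 512)
log_probs = classifier(feats).log_softmax(2)  # (150, 1, num_classes)

targets = torch.randint(1, 100, (1, 40))
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.tensor([150]),
                target_lengths=torch.tensor([40]))
```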
@add023 I think I solved this by setting batch_size to 2 (even though it’s larger, it worked for me for some reason). I also ran the model in parallel on my GPUs.
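For reference, one common way to run a model across multiple GPUs is the stock nn.DataParallel wrapper; a minimal sketch with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # placeholder for the actual model

if torch.cuda.device_count() > 1:
    # DataParallel splits each batch across the visible GPUs, so each card
    # only holds its share of the activations.
    model = nn.DataParallel(model)
model = model.cuda()

out = model(torch.randn(8, 128).cuda())
```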
I got a similar problem. My data only includes around 1k images at 143*183 resolution. I set the batch size to 32 and I’m using ResNet34, and I still got the error. I set ‘Xmx’ to 2048m. That’s so weird.
RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 8.00 GiB total capacity; 6.09 GiB already allocated; 39.75 MiB free; 6.28 GiB reserved in total by PyTorch)
The available 8GB might not be enough to run the model in this setup as the error message indicates that <40MB are free on the device.
Did you make sure that the GPU is completely empty via e.g. nvidia-smi before starting the training?