I fine-tuned the PEGASUS model for abstractive summarization in a virtual environment, but the model had some problems, so I created a new virtual environment to run it again. Now the following error keeps popping up:

111495 · March 31, 2021, 10:44pm

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.28 GiB already allocated; 4.55 MiB free; 1.28 GiB reserved in total by PyTorch)
The script that I ran: Pytorch script for fine-tuning Pegasus Large model · GitHub

Note: The batch size in the script is 1. I also ran torch.cuda.memory_summary(device='cuda', abbreviated=False) to see more details, and this was the output:

ptrblck · April 1, 2021, 6:39am

Could you check via nvidia-smi whether the GPU is completely empty before running the script?
2GB of device memory is not a lot, but I assume you were able to run the script with a batch size of one before?

111495 · April 1, 2021, 6:25pm

Yes, I was able to run the script once.
I checked the memory before running the script, but with these commands:

ptrblck · April 1, 2021, 10:55pm

torch.cuda.memory_summary() reports what PyTorch itself is using, not whether other processes are also using device memory, so for that you would have to use e.g. nvidia-smi.
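
To illustrate the difference, here is a rough sketch that reads the numbers out of the error message quoted in the first post and estimates how much device memory is held outside PyTorch's caching allocator (CUDA context, other processes). The helper name parse_oom_message is made up for this example, and the parser assumes the message format shown in this thread, which differs between PyTorch versions.

```python
import re

# Conversion factors to MiB for the units that appear in the message.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"

# The error message exactly as posted at the top of this thread.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 2.00 GiB total capacity; 1.28 GiB already allocated; "
    "4.55 MiB free; 1.28 GiB reserved in total by PyTorch)"
)


def parse_oom_message(msg):
    """Hypothetical helper: return the sizes quoted in the message, in MiB."""

    def find(pattern):
        match = re.search(pattern, msg)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        value, unit = match.groups()
        return float(value) * _UNITS[unit]

    return {
        "requested": find(r"Tried to allocate " + _SIZE),
        "total": find(_SIZE + r" total capacity"),
        "allocated": find(_SIZE + r" already allocated"),
        "free": find(_SIZE + r" free"),
        "reserved": find(_SIZE + r" reserved in total"),
    }


sizes = parse_oom_message(OOM_MESSAGE)

# Memory that is neither free nor reserved by this PyTorch process.
outside_pytorch = sizes["total"] - sizes["reserved"] - sizes["free"]
print(f"held outside PyTorch's allocator: {outside_pytorch:.2f} MiB")
```

For the message above this comes to roughly 733 MiB that torch.cuda.memory_summary() would not account for, which is why a device-wide tool such as nvidia-smi is needed to see the full picture.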

111495 · April 2, 2021, 3:49pm

This is the result of running nvidia-smi.

omarfoq · April 2, 2021, 4:07pm

Hello @111495,

It shows that you are using 1.7GB.

I think you executed your script before launching nvidia-smi. Can you confirm that? If that's not the case, kill the process that is consuming GPU memory and then run your code.
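
If it helps, here is a small sketch for listing which processes hold GPU memory, using nvidia-smi's CSV query mode. The function names are made up for this example. On some setups (e.g. Windows with the WDDM driver) per-process usage may be reported as [N/A], which the parser maps to None.

```python
import shutil
import subprocess

# nvidia-smi query for compute processes; nounits makes used_memory a bare MiB number.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Hypothetical helper: turn the CSV rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # pid is the first field and used_memory the last; the name sits in between.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        rows.append(
            {
                "pid": int(pid.strip()),
                "name": name.strip(),
                "used_mib": int(used) if used.isdigit() else None,
            }
        )
    return rows


def list_gpu_processes():
    """Run nvidia-smi if it is on PATH; return [] when it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


# Usage: call list_gpu_processes() before starting the training script.
# Any entry it returns is a process you may want to stop first.
```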

111495 · April 27, 2021, 12:33am

I found out that the problem was that my GPU has only about 2GB of memory, while the recommended memory size is 16 GB.
So I subscribed to Colab Pro, as it provides 16 GB of GPU memory and enough disk space to run the script.
Here is the link to the issue on GitHub, which also contains some tips from one of the contributors that might help others.
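
For anyone wondering why 2GB cannot work, here is a back-of-the-envelope estimate. It assumes roughly 568M parameters for PEGASUS-large (the figure reported for that model), fp32 weights, and an Adam-style optimizer that keeps two extra states per parameter. The script may use a different optimizer, and activations and the CUDA context are not counted, so treat it as a lower bound rather than an exact figure.

```python
GIB = 1024 ** 3

# Assumption: approximate parameter count reported for PEGASUS-large.
N_PARAMS = 568_000_000


def training_memory_gib(n_params, bytes_per_param=4, optimizer_states=2):
    """Estimate memory for weights + gradients + optimizer states, in GiB.

    Ignores activations, temporary buffers and the CUDA context.
    """
    copies = 1 + 1 + optimizer_states  # weights, gradients, optimizer states
    return n_params * bytes_per_param * copies / GIB


weights_only = N_PARAMS * 4 / GIB  # fp32 weights alone
full_training = training_memory_gib(N_PARAMS)

print(f"fp32 weights only:           {weights_only:.2f} GiB")
print(f"weights + grads + optimizer: {full_training:.2f} GiB")
```

Under these assumptions the weights alone already exceed 2 GiB, and training state comes to more than 8 GiB before any activations are stored, which is consistent with the 16 GB recommendation.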