CUDA out-of-memory error when making predictions with a trained model

I'm facing the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-39-94bd54aad11f> in <module>()
      4 # get predictions for test data
      5 with torch.no_grad():
----> 6   preds = model(test_seq.to(device), test_mask.to(device))
      7   preds = preds.detach().cpu().numpy()

8 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1914         # remove once script supports set_grad_enabled
   1915         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1917 
   1918 

RuntimeError: CUDA out of memory. Tried to allocate 11.22 GiB (GPU 0; 15.78 GiB total capacity; 12.10 GiB already allocated; 2.18 GiB free; 12.30 GiB reserved in total by PyTorch)

The prediction code is as follows:

# specify GPU
device = torch.device("cuda")

# get predictions for test data
with torch.no_grad():
  preds = model(test_seq.to(device), test_mask.to(device))
  preds = preds.detach().cpu().numpy()

What should I do?

You could reduce the memory usage by lowering the batch size: instead of sending the entire test set through the model in a single forward pass, run inference in smaller chunks. I also assume you've made sure the GPU is completely free before running the script and that no other process is using it.
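As a minimal sketch (assuming `model`, `test_seq`, `test_mask`, and `device` are defined as in your code, and that the `batch_size` value of 32 is just an illustration you can tune):

import numpy as np
import torch

batch_size = 32  # reduce further if you still run out of memory

all_preds = []
model.eval()
with torch.no_grad():
    # loop over the test set in chunks instead of one giant batch
    for start in range(0, len(test_seq), batch_size):
        end = start + batch_size
        batch_seq = test_seq[start:end].to(device)
        batch_mask = test_mask[start:end].to(device)

        batch_preds = model(batch_seq, batch_mask)
        # move results back to the CPU right away so they don't accumulate on the GPU
        all_preds.append(batch_preds.detach().cpu().numpy())

preds = np.concatenate(all_preds, axis=0)

This keeps only one small batch on the GPU at a time, which should bring the allocation well under the 11.22 GiB your current single-call prediction is requesting.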