Hello, I am trying to use a trained model to make predictions (batch size of 10) on a test dataset, but my GPU quickly runs out of memory. I think some tensors that are no longer needed are being held on the GPU, but I am not sure how to free them. I was able to find forum posts about freeing the entire GPU cache, but nothing about freeing the memory used by specific variables. My apologies if this has been asked and resolved before. How should I free the tensors after each iteration in the code below? I am storing the outputs in a list, so once the predictions for an iteration have been produced, they no longer need to stay on the GPU:
'''
STEP 5: USE THE MODEL TO MAKE PREDICTIONS
'''
# Note: my batches are 10 images now, so I stack 10 before appending to the output list
model_outputs = []
# Slide a window of 10 images over the test set (the last window starts at len - 10)
for x in range(len(test_dataset) - 9):
    # Stack 10 consecutive images into one batch and add a channel dimension
    batch = torch.stack([test_dataset[x + i][0].float().cuda()
                         for i in range(10)]).unsqueeze(1)
    outputs = model(batch)                     # pass the batch through the model
    _, predicted = torch.max(outputs.data, 1)  # index of the highest score per image
    # Store each prediction (moved to the CPU) with its corresponding label
    for i in range(10):
        model_outputs.append((predicted[i].cpu(), test_dataset[x + i][2]))
...
RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 11.00 GiB total capacity; 4.22 GiB already allocated; 14.30 MiB free; 83.38 MiB cached)
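For reference, the forum posts I found only cover clearing the whole cache (torch.cuda.empty_cache()), not specific tensors. Here is a minimal sketch of what I was considering, assuming that del-ing the GPU tensors at the end of each iteration and then emptying the cache is even the right mechanism, and that inference should run under torch.no_grad(); it reuses model, test_dataset, and model_outputs from the code above:

import torch

model.eval()            # assumption: inference mode, no dropout/batch-norm updates
with torch.no_grad():   # assumption: no autograd graph is needed for predictions
    for x in range(len(test_dataset) - 9):
        batch = torch.stack([test_dataset[x + i][0].float().cuda()
                             for i in range(10)]).unsqueeze(1)
        outputs = model(batch)
        _, predicted = torch.max(outputs, 1)
        for i in range(10):
            model_outputs.append((predicted[i].cpu(), test_dataset[x + i][2]))
        # Once the results are on the CPU, drop the references to the GPU tensors...
        del batch, outputs, predicted
        # ...and release the cached blocks (this is what the forum posts suggested)
        torch.cuda.empty_cache()

Is del plus torch.cuda.empty_cache() after each iteration actually the right way to free per-variable memory, or is there a better pattern?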