RuntimeError: CUDA out of memory when saving model predictions

Hi,

I am trying to save model predictions and later use them to calculate accuracy. The dataset has 20000 samples. I was trying to use

prediction_list.append(prediction)

and then torch.save to save them, but this gives the following error:

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 31.72 GiB total capacity; 30.73 GiB already allocated; 6.12 MiB free; 19.83 MiB cached)

Is there any way to save them without running into this error?

If you are not running the code in a with torch.no_grad() block, you will store the whole computation graph in the list for each prediction.
Use prediction_list.append(prediction.detach()) to store the tensor only (and use the no_grad() guard to save more memory, if you don’t need to calculate the gradients later).
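
A minimal sketch of this pattern (the names model and loader here are placeholders for your own objects):

prediction_list = []
model.eval()
with torch.no_grad():  # no computation graph is built or stored
    for data, target in loader:
        prediction = model(data)
        # moving to the CPU also frees the list from holding GPU memory
        prediction_list.append(prediction.detach().cpu())

torch.save(torch.cat(prediction_list), 'predictions.torch')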


Thank you very much, @ptrblck. It worked.

I think I am missing something about saving and loading files with torch. I saved the predictions and labels, and later called torch.load to retrieve them:

prediction_load = torch.load(ME_DIR + 'prediction' + '.torch')
labels_load = torch.load(ME_DIR + 'labels' + '.torch')

for i in range(len(prediction_load)):
    _, predicted = torch.max(prediction_load[i].data, 0)
    correct += predicted.eq(labels_load[i].data).cpu().sum()  # error in this line
    total += labels_load[i].size(0)

accuracy = 100. * correct.float() / total

But it gives this error on the line that calculates correct:

RuntimeError: The size of tensor a (10) must match the size of tensor b (2) at non-singleton dimension 1.

torch.max(tensor, 0) will apply the max operation along the batch dimension (dim=0).
If your output has the shape [batch_size, nb_classes], you should use dim=1 instead.
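
For example, with a hypothetical output of shape [batch_size, nb_classes]:

output = torch.randn(4, 10)              # [batch_size=4, nb_classes=10]
labels = torch.randint(0, 10, (4,))      # [batch_size]

_, predicted = torch.max(output, dim=1)  # index of the max class per sample
correct = predicted.eq(labels).sum()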


Ok. Thanks, @ptrblck.

for epoch in range(150):
    train, valid = random_split(train_nn, [850000, 50000])

    trainloader = DataLoader(train, batch_size=BATCH_SIZE)
    validloader = DataLoader(valid, batch_size=BATCH_SIZE)

    for i in range(len(train_nn) // BATCH_SIZE):
        train_data = next(iter(trainloader))
        validation_data = next(iter(validloader))

        train_x = train_data[:, 0:-1]
        train_y = train_data[:, -1]

        validation_x = validation_data[:, 0:-1]
        validation_y = validation_data[:, -1]

        y_hat = model(train_x).reshape((BATCH_SIZE,))

        cost = LOSS(y_hat, train_y)
        cost.backward()

        OPTIMIZER.step()
        OPTIMIZER.zero_grad()

        with torch.no_grad():
            roc_auc_values_validation.append(roc_auc_score(validation_y.cpu(), model(validation_x).detach().cpu().numpy()))
            roc_auc_values_train.append(roc_auc_score(train_y.cpu(), model(train_x).detach().cpu().numpy()))

    with torch.no_grad():
        y_sub.append(model(test_nn).detach().cpu().numpy())

    torch.cuda.empty_cache()
    print(f'Epoch: {epoch+1}:', f"cost: {cost}")
    print(f"auc under the ROC curve for the validation set is: {roc_auc_values_validation[-1]}")
    print(f"auc under the ROC curve for the training set is: {roc_auc_values_train[-1]}")
            

I have tried what you said, but nothing is working.

I assume you are observing an increasing memory usage and tried to fix this by detaching tensors?
If so, could you post a minimal, executable code snippet reproducing the issue, please?
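
In the meantime, one quick way to confirm whether the memory usage actually grows is to log the allocated GPU memory each iteration (a minimal sketch; the training step is a placeholder):

import torch

for i in range(10):
    # ... your training step here ...
    print(f"iter {i}: {torch.cuda.memory_allocated() / 1024**2:.2f} MiB allocated")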