If you are not running the code inside a with torch.no_grad() block, you will store the whole computation graph in the list for each prediction.
Use prediction_list.append(prediction.detach()) to store only the tensor (and wrap the forward pass in the no_grad() guard to save even more memory, if you don't need to calculate gradients later).
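A minimal sketch of the difference, using a toy linear model (the model and shapes here are just placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)
data = torch.randn(100, 10)

prediction_list = []
with torch.no_grad():  # no graph is built inside this block
    for x in data.split(10):  # 10 mini-batches of shape [10, 10]
        pred = model(x)
        # detach() is a no-op under no_grad(), but makes the intent explicit
        prediction_list.append(pred.detach())
```

Each stored tensor now carries no autograd history, so the list only holds the raw values.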

I think I am missing something in saving and loading files with torch. I saved prediction and labels, and later called torch.load to retrieve these values:

prediction_load = torch.load(ME_DIR + 'prediction' + '.torch')
labels_load = torch.load(ME_DIR + 'labels' + '.torch')
for i in range(len(prediction_load)):
    _, predicted = torch.max(prediction_load[i].data, 0)
    correct += predicted.eq(labels_load[i].data).cpu().sum()  # error in this line
    total += labels_load[i].size(0)
accuracy = 100. * correct.float() / total

But it gives this error on the line computing correct:

RuntimeError: The size of tensor a (10) must match the size of tensor b (2) at non-singleton dimension 1.
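The 10-vs-2 mismatch suggests torch.max is reducing over the wrong dimension: if each stored prediction has shape [batch_size, num_classes], then dim 0 collapses the batch dimension instead of the class dimension. A small sketch of the shape difference, assuming 10 classes and a batch of 2 (these numbers are guesses from the error message):

```python
import torch

preds = torch.randn(2, 10)     # assumed shape: [batch_size, num_classes]
labels = torch.tensor([3, 7])  # one class index per sample

# Reducing over dim 0 collapses the batch and yields 10 values,
# which cannot be compared element-wise against 2 labels:
_, wrong = torch.max(preds, 0)      # shape [10]

# Reducing over dim 1 (the class dimension) yields one index per sample:
_, predicted = torch.max(preds, 1)  # shape [2]
correct = predicted.eq(labels).sum().item()
```

With dim 1 the predicted indices line up with the labels and the eq comparison works.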

for epoch in range(150):
    train, valid = random_split(train_nn, [850000, 50000])
    trainloader = DataLoader(train, batch_size=BATCH_SIZE)
    validloader = DataLoader(valid, batch_size=BATCH_SIZE)
    for i in range(len(train_nn)//BATCH_SIZE):
        train_data = next(iter(trainloader))
        validation_data = next(iter(validloader))
        train_x = train_data[:, 0:-1]
        train_y = train_data[:, -1]
        validation_x = validation_data[:, 0:-1]
        validation_y = validation_data[:, -1]
        y_hat = model(train_x).reshape((BATCH_SIZE,))
        cost = LOSS(y_hat, train_y)
        cost.backward()
        OPTIMIZER.step()
        OPTIMIZER.zero_grad()
        with torch.no_grad():
            roc_auc_values_validation.append(roc_auc_score(validation_y.cpu(), model(validation_x).detach().cpu().numpy()))
            roc_auc_values_train.append(roc_auc_score(train_y.cpu(), model(train_x).detach().cpu().numpy()))
    with torch.no_grad():
        y_sub.append(model(test_nn).detach().cpu().numpy())
    torch.cuda.empty_cache()
    print(f'Epoch: {epoch+1}:', f"cost: {cost}")
    print(f"auc under the ROC curve for the validation set is: {roc_auc_values_validation[-1]}")
    print(f"auc under the ROC curve for the training set is: {roc_auc_values_train[-1]}")
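As a side note on the loop above: next(iter(trainloader)) builds a fresh iterator on every call, so it will repeatedly return the first batch rather than walking through the dataset. A toy example of the difference (dataset and sizes are made up):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(8).float())
loader = DataLoader(dataset, batch_size=2)

# A fresh iterator is created on each call, so both return the same first batch:
first = next(iter(loader))[0]
again = next(iter(loader))[0]

# Iterating the loader itself visits every batch exactly once:
batches = [b[0] for b in loader]
```

Creating the iterator once before the inner loop (or iterating with a plain for-loop) avoids this.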

I assume you are observing increasing memory usage and tried to fix this by detaching tensors?
If so, could you post a minimal, executable code snippet reproducing the issue, please?
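For reference, such a snippet could be as small as a toy model and a loop contrasting appending the raw output (which keeps each iteration's computation graph alive) with appending the detached tensor (everything below is an illustrative assumption, not the original code):

```python
import torch

model = torch.nn.Linear(10, 1)
x = torch.randn(4, 10)

stored = []
for _ in range(3):
    out = model(x)
    stored.append(out)  # each entry still references its computation graph

stored_detached = []
for _ in range(3):
    out = model(x)
    stored_detached.append(out.detach())  # tensor values only, no graph
```

The first list keeps grad_fn references alive (growing memory usage across iterations), while the second does not.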