CUDA keeps running out of memory no matter how much I decrease the batch size

I have been running the following code with a batch size of 32, then 16, then 8, and it keeps returning the same runtime error. I have a GTX 1650, which has 8 GB of memory. Is this not enough?

iterations = 30
trainLoss = []
testAcc = []

start = time.time()
for epoch in range(iterations):
    epochStart = time.time()
    runningLoss = 0    
    net.train(True) # For training
    for data in trainLoader:
        inputs,labels = data
        # Wrap them in Variable
        if use_gpu:
            inputs, labels = Variable(inputs.float().cuda()), \
                Variable(labels.long().cuda())
        else:
            inputs, labels = Variable(inputs), Variable(labels.long()) 
        inputs = inputs/1600
        # Initialize gradients to zero
        optimizer.zero_grad()
        # Feed-forward input data through the network
        outputs = net(inputs)
        # Compute loss/error
        loss = criterion(outputs, labels)        
        # Backpropagate loss and compute gradients
        loss.backward()
        # Update the network parameters
        optimizer.step()
        # Accumulate loss per batch
        runningLoss += loss.detach()    
    avgTrainLoss = runningLoss/1200
    trainLoss.append(avgTrainLoss)
    # Evaluating performance on test set for each epoch
    net.train(False) # For testing
    inputs = TestImages/1600
    if use_gpu:
        inputs = Variable(inputs.cuda())
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)
        predicted = predicted.cpu()
    else:
        inputs = Variable(inputs)
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)
    correct = 0
    total = 0
    total += TestLabels.size(0)
    correct += (predicted == TestLabels).sum()
    avgTestAcc = correct/400
    testAcc.append(avgTestAcc)
        
    # Plotting Loss vs Epochs
    fig1 = plt.figure(1)        
    plt.plot(range(epoch+1),trainLoss,'r--',label='train')        
    if epoch==0:
        plt.legend(loc='upper left')
        plt.xlabel('Epochs')
        plt.ylabel('Loss')    
    # Plotting testing accuracy vs Epochs
    fig2 = plt.figure(2)        
    plt.plot(range(epoch+1),testAcc,'g-',label='test')        
    if epoch==0:
        plt.legend(loc='upper left')
        plt.xlabel('Epochs')
        plt.ylabel('Testing accuracy')    
    epochEnd = time.time()-epochStart
    print('At Iteration: {:.0f} /{:.0f}  ;  Training Loss: {:.6f} ; Testing Acc: {:.3f} ; Time consumed: {:.0f}m {:.0f}s '\
          .format(epoch + 1,iterations,avgTrainLoss,avgTestAcc*100,epochEnd//60,epochEnd%60))
end = time.time()-start
print('Training completed in {:.0f}m {:.0f}s'.format(end//60,end%60))

The error that gets returned is:

RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 4.00 GiB total capacity; 2.27 GiB already allocated; 38.45 MiB free; 2.33 GiB reserved in total by PyTorch)

Thanks a lot in advance for the help. I'm new here, so please let me know if there is anything I should do to improve the question.

It depends on the number of parameters in your model: the larger the model, the more parameters need to be stored and the more memory needs to be available. 8 GB may not be enough for some models.
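As a rough sanity check (a minimal sketch, assuming net is the model object from your posted code), you can count its parameters and estimate how much memory the weights alone occupy; activations, gradients and optimizer state come on top of that:

# Rough estimate of parameter memory (weights only, float32 = 4 bytes each).
# Assumes `net` is the model from the question.
num_params = sum(p.numel() for p in net.parameters())
print('{:,} parameters ~ {:.1f} MiB as float32'.format(num_params, num_params * 4 / 2**20))
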

  1. Can you tell us what model you are using, and also what batch size and image size?
    I have run your code on MobileNet with 128x128 images and a batch size of 200, and it needed about 5.6 GB, so 8 GB should be fine.

  2. According to your error message you have 4 GB of VRAM, not 8 GB, which is also what NVIDIA's website says the GTX 1650 has.

  3. Could you tell us which PyTorch version you are using? I noticed you are using Variable(), which has been deprecated since PyTorch 0.4.0; plain tensors can be passed to the model directly now.

  4. Is the error coming up right at the beginning, or sometime later during training? If it comes up later, it is usually a problem of accumulating tensors which still have their grad attached.
    But since you are using runningLoss += loss.detach(), I do not think this is the problem.

  5. A few tips:
    If your loss is a single value, you can use loss.item() instead of loss.detach(). It gives you a plain Python number instead of a detached tensor, which is a bit more lightweight (and it obviously detaches from the graph as well).
    You can wrap your test code in with torch.no_grad():. This makes sure no gradients are tracked while testing (they are not needed there, since you are not backpropagating). You can do it like so, with a fuller sketch based on your code after the snippet:

with torch.no_grad():
    # code for getting test accuracy here
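
For example, applied to the evaluation part of your loop it could look roughly like this (a sketch reusing the net, TestImages, TestLabels and use_gpu names from your post; Variable() is dropped since plain tensors work directly):

net.eval()                            # put layers like dropout/batchnorm into eval mode
with torch.no_grad():                 # no gradient tracking needed during testing
    inputs = TestImages / 1600
    if use_gpu:
        inputs = inputs.cuda()
    outputs = net(inputs)
    _, predicted = torch.max(outputs, 1)
    predicted = predicted.cpu()
    correct = (predicted == TestLabels).sum().item()
    avgTestAcc = correct / TestLabels.size(0)

If the whole test set still does not fit on the GPU in one pass, you can feed it in batches the same way your training loader does.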