I increased the batch size, but the GPU memory usage decreased




As you can see, when the batch size is 40 the GPU memory usage is about 9.0GB; when I increase the batch size to 50, the GPU memory usage decreases to 7.7GB. When I continue to increase the batch size to 60, it rises to 9.2GB. Why was the GPU memory usage so high? By common sense, it should be lower than 7.7GB.

The displayed memory usage should be the CUDA context + the actual memory used to store tensors + the cached memory + other applications.
Try to check the memory using torch.cuda.memory_allocated() and torch.cuda.memory_cached().
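A minimal sketch of the difference between the two numbers (the guard around the import and the `to_gb` helper are my additions, and `memory_cached()` was renamed `memory_reserved()` in later PyTorch releases): a tensor can be deleted so the *allocated* count drops, while the caching allocator keeps the block, so the *cached* count (what nvidia-smi sees) stays up until `empty_cache()` is called.

```python
def to_gb(num_bytes):
    # bytes -> gigabytes, matching the prints in the training loop
    return num_bytes / 1024 ** 3

try:
    import torch
    if torch.cuda.is_available():
        x = torch.randn(1000, 1000, device="cuda")  # ~4MB of live tensor data
        print("allocated: %.4fGB" % to_gb(torch.cuda.memory_allocated()))
        print("cached:    %.4fGB" % to_gb(torch.cuda.memory_cached()))  # memory_reserved() in newer PyTorch
        del x
        # the tensor is gone, but the allocator keeps the block cached for reuse
        print("after del, cached: %.4fGB" % to_gb(torch.cuda.memory_cached()))
        torch.cuda.empty_cache()  # return cached blocks to the driver
except ImportError:
    pass
```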

I added the following lines to my code:

    if (iteration + 1) % 10 == 0:
        stop = time.time()
        print("epoch: [%d/%d]" % (epoch, EPOCHS),
              "iteration: [%d/%d]" % (iteration + 1, len(train_dataset) // BATCH_SIZE),
              "loss:%.4f" % loss.item(),
              "time:%.4f" % (stop - start))

        print("torch.cuda.memory_allocated: %fGB" % (torch.cuda.memory_allocated() / 1024 / 1024 / 1024))
        print("torch.cuda.memory_cached: %fGB" % (torch.cuda.memory_cached() / 1024 / 1024 / 1024))
        start = time.time()

And the output in the terminal is as follows:

(pt1.2) D:\code\rnn>python train_new.py -m test -b 40
epoch: [0/3000] iteration: [10/92] loss:0.1948 time:28.8928
torch.cuda.memory_allocated: 0.141236GB
torch.cuda.memory_cached: 8.539062GB
epoch: [0/3000] iteration: [20/92] loss:0.0986 time:6.5122
torch.cuda.memory_allocated: 0.141236GB
torch.cuda.memory_cached: 8.539062GB

(pt1.2) D:\code\rnn>python train_new.py -m test -b 50
epoch: [0/3000] iteration: [10/73] loss:0.1436 time:29.8940
torch.cuda.memory_allocated: 0.144663GB
torch.cuda.memory_cached: 7.197266GB
epoch: [0/3000] iteration: [20/73] loss:0.0644 time:7.6573
torch.cuda.memory_allocated: 0.144663GB
torch.cuda.memory_cached: 7.197266GB

(pt1.2) D:\code\rnn>python train_new.py -m test -b 60
epoch: [0/3000] iteration: [10/61] loss:0.1918 time:31.1637
torch.cuda.memory_allocated: 0.151530GB
torch.cuda.memory_cached: 8.666016GB
epoch: [0/3000] iteration: [20/61] loss:0.0936 time:8.8493
torch.cuda.memory_allocated: 0.151408GB
torch.cuda.memory_cached: 8.666016GB

Since the allocated memory increases with a higher batch size, it looks fine. 🙂

Yes, but I wonder why the cached memory with b40 is larger than with b50. It will be difficult for me to set a proper batch size.

The cache size might vary, e.g. due to cudnn benchmarking, and shouldn't yield an out of memory error, if I'm not mistaken. Are you running out of memory when using a smaller batch size?
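For picking a batch size, the fluctuating cache is the wrong number to watch anyway: the peak *allocated* memory is what actually limits the batch size, and it grows monotonically with it. A sketch of how one might measure it (the helper name `peak_allocated_gb` and the `step_fn` callback are my own; `reset_max_memory_allocated()` was later renamed `reset_peak_memory_stats()`):

```python
def peak_allocated_gb(step_fn):
    # run one training step and report the peak live-tensor memory in GB;
    # unlike the allocator cache, this grows monotonically with batch size
    import torch
    torch.cuda.reset_max_memory_allocated()  # reset_peak_memory_stats() in newer PyTorch
    step_fn()                                # one forward/backward pass
    return torch.cuda.max_memory_allocated() / 1024 ** 3
```

Calling this once per candidate batch size (e.g. `peak_allocated_gb(lambda: train_one_step(batch_size=50))`, where `train_one_step` is your own training step) should give a stable per-batch-size figure, independent of whatever cudnn benchmarking leaves in the cache.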