I am writing this message because you have always helped me with very good answers.
I am doing Kaggle competitions, but I keep running into the problem that I cannot use a bigger batch size, and with a small batch size I get really bad results.
I have two 2080 Tis with 11 GB of memory each; training on 300x300 images with batch size 8 gives me very bad results, and with 16 it always tells me that CUDA ran out of memory…
@albanD, I tried that; the biggest batch size that fits is 8… The code you posted does it using torch.utils.checkpoint to trade compute for memory, but I don't know how to implement it…
batch_size_you_want = 64
max_batch_size = 8  # create the dataloader with this batch size
accumulation_steps = batch_size_you_want // max_batch_size

processed_samples = 0
opt.zero_grad()
for batch in dataloader:
    loss = get_loss(batch)
    # If get_loss returns a mean over the batch, dividing by the number of
    # accumulation steps makes the summed gradients match a true batch of 64.
    (loss / accumulation_steps).backward()  # gradients accumulate in .grad
    processed_samples += max_batch_size
    if processed_samples >= batch_size_you_want:
        opt.step()       # update the weights with the accumulated gradients
        opt.zero_grad()  # reset the gradients for the next virtual batch
        processed_samples = 0
This code will train as if you were using a batch size of 64, without ever using more memory than a batch size of 8.
Checkpointing is a bit different. You will need to modify your model's forward pass to wrap groups of operations in torch.utils.checkpoint. This frees the memory used by the intermediate buffers between the ops in each checkpointed group; those activations are recomputed during the backward pass instead of being stored.
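Here is a minimal sketch of what that can look like. The model and block names here are made up for illustration; you would wrap whatever groups of layers dominate your activation memory. It also assumes a recent PyTorch where the use_reentrant argument to checkpoint is available:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Two hypothetical convolutional blocks and a small classifier head.
        self.block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(64, 10)

    def forward(self, x):
        # Each checkpointed block does not store its intermediate
        # activations; they are recomputed when backward reaches it.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)

model = CheckpointedNet()
out = model(torch.randn(8, 3, 300, 300))
out.sum().backward()  # block activations are recomputed here

If your model is already an nn.Sequential, torch.utils.checkpoint.checkpoint_sequential can split it into checkpointed segments for you instead of wrapping the blocks by hand.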