I have a deep network for person re-identification. My batch size is 256, and when I run my code, it gives an error due to lack of GPU memory. However, I don't want to reduce my batch size. Is there any way I can break my batch into several parts and update the weights only after processing all of them?
Yes, you could delay the optimizer.step() call and accumulate gradients over several smaller batches, using the different approaches described in this post.
Note that these approaches might yield the desired gradients, but e.g. the running stats in batch norm layers might suffer from the smaller batch sizes, so you might need to adjust their momentum or freeze the running stats.
Alternatively, you could use torch.utils.checkpoint to trade compute for memory and keep the large batch size.
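For an nn.Sequential model, checkpointing can be sketched like this (the model here is a toy placeholder): `checkpoint_sequential` splits the model into segments and recomputes the intermediate activations inside each segment during the backward pass instead of storing them.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy sequential model; replace with your own architecture
model = nn.Sequential(
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 2),
)

x = torch.randn(16, 10, requires_grad=True)

# Split the model into 2 segments; activations inside each segment are
# recomputed during backward instead of being kept in memory
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```

For non-sequential models you can wrap arbitrary sub-modules with `torch.utils.checkpoint.checkpoint` instead. Note that checkpointing roughly doubles the forward compute for the checkpointed segments, which is the price for the memory savings.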