Right now I am using a batch size of 128 for both training and validation, but the GPU RAM (2080 Ti, 11 GB) is full.
By the way, my task is to combine an image model and a language model for classification, and I am not sure whether my model is too large.
There are 443,757 questions for training and 214,354 for validation, so I think a batch size of 128 is a bit small. Training takes almost 2.5 hours per epoch. It really drives me crazy…
You can reduce the memory usage during validation by wrapping the validation loop in a `with torch.no_grad()` block, which makes sure the intermediate activations are not stored (they would only be needed to calculate the gradients). If you aren't using it already, you might then be able to increase the batch size during validation further and speed up this loop.
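A minimal sketch of what this looks like, using a dummy linear model and random tensors as stand-ins for your image+language classifier and your actual validation set:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in model; replace with your actual image+language classifier.
model = nn.Linear(10, 2)
model.eval()  # also switches dropout/batchnorm layers to eval behavior

# Dummy validation data just to make the sketch runnable;
# note the validation batch size can often be larger than the training one,
# since no activations are kept for the backward pass.
val_ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
val_loader = DataLoader(val_ds, batch_size=256)

correct = 0
with torch.no_grad():  # intermediate activations are not stored -> lower memory use
    for inputs, targets in val_loader:
        outputs = model(inputs)
        correct += (outputs.argmax(dim=1) == targets).sum().item()

accuracy = correct / len(val_ds)
```

`model.eval()` and `torch.no_grad()` are independent: the former changes layer behavior (dropout, batchnorm), the latter disables gradient tracking and is what actually saves the memory.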