Issues with running model on GPU

I am training a ViT-B/16 on a dataset with 4K samples using 24 NVIDIA Quadro RTX 6000 GPUs.
The training was going well until I added a line to run inference on the same model twice, with different inputs, in the same iteration. Since then, the error "RuntimeError: CUDA out of memory. Tried to allocate 148.00 MiB" appears every time. I tried releasing the cached GPU memory, but that did not help. Can someone help me with this issue?

If you are not disabling gradient calculation, each forward call will store the intermediate activations needed to compute gradients during the backward pass, so memory usage grows with every extra forward pass. Either call backward() between these forward passes or disable gradient calculation by wrapping the inference calls in a torch.no_grad() context.
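A minimal sketch of the second option, using a small nn.Linear as a stand-in for your ViT-B/16 (model and input shapes here are illustrative, not from your setup):

```python
import torch
import torch.nn as nn

# Toy model standing in for the real network.
model = nn.Linear(16, 4)
x1 = torch.randn(8, 16)
x2 = torch.randn(8, 16)

# Without no_grad(), each forward call keeps its intermediate
# activations alive for a potential backward(), so two inference
# passes per iteration roughly double the activation memory.
# Inside no_grad(), no autograd graph is built and nothing is retained.
with torch.no_grad():
    out1 = model(x1)
    out2 = model(x2)

# The outputs are detached from autograd, confirming no graph was stored.
print(out1.requires_grad, out2.requires_grad)  # False False
```

If you need gradients from the training forward pass but not from the extra inference passes, keep the training call outside the context and wrap only the inference calls.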
