CUDA out of memory despite using only 60% of memory

I’m getting a CUDA out of memory error, shown below:

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.65 GiB total capacity; 21.65 GiB already allocated; 242.88 MiB free; 22.55 GiB reserved in total by PyTorch)

I’m able to train the model after reducing the batch_size. When I check the output of nvidia-smi, I see that 40% of the memory is still free. Here is the output:

What could be the reason for this?
PyTorch: 1.4
CUDA: 10.2
input_size: (512, 512, 4)
using half precision

More information: the plot of GPU utilization is shown below.

The number at each peak is the batch_size. It seems the initial memory requirement is much higher than the memory needed afterward. Can someone explain?
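One way to check whether the first iteration really has a higher peak is to record PyTorch’s own peak counter per training step, since nvidia-smi only shows a snapshot and may miss a short-lived peak. The sketch below is a minimal, hypothetical helper (peak_memory_per_iteration, the toy model, the SGD optimizer and the mean() "loss" are all stand-ins, not the poster’s Catalyst setup). It assumes torch.cuda.reset_peak_memory_stats and torch.cuda.max_memory_allocated are available in the installed PyTorch version, and it returns an empty list on a machine without a GPU.

```python
import torch
import torch.nn as nn


def peak_memory_per_iteration(model, batch_size, input_shape=(4, 64, 64), num_iters=3):
    """Return the peak CUDA memory (MiB) allocated by PyTorch during each training step.

    Hypothetical helper. Returns an empty list if no GPU is available, because the
    torch.cuda counters only track tensors allocated through PyTorch's CUDA allocator.
    input_shape is channels-first (C, H, W) and kept small here for illustration.
    """
    if not torch.cuda.is_available():
        return []
    device = torch.device("cuda")
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    peaks = []
    for _ in range(num_iters):
        torch.cuda.reset_peak_memory_stats(device)  # start a fresh peak measurement
        x = torch.randn(batch_size, *input_shape, device=device)
        loss = model(x).mean()  # stand-in loss, only used to drive backward()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize(device)  # make sure all kernels of this step finished
        peaks.append(torch.cuda.max_memory_allocated(device) / 2**20)
    return peaks


if __name__ == "__main__":
    # Toy stand-in for the real segmentation model (hypothetical).
    net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))
    for i, peak in enumerate(peak_memory_per_iteration(net, batch_size=8)):
        print(f"iter {i}: peak {peak:.1f} MiB")
```

If the first value is clearly larger than the following ones, the extra memory is needed only once (for example, the .grad tensors are allocated on the first backward pass), which matches the shape of the plot above.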

The higher peak memory usage might indicate different behavior during the first iterations.
E.g. some initial copies are needed if you are passing expanded tensors that have to be made contiguous.
Also, if you are running your script with cudnn.benchmark=True, the first iteration will benchmark different cuDNN algorithms and select the fastest one, which might also create a memory peak (however, this should not yield an OOM error).
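To illustrate the expanded-tensor case, here is a minimal CPU sketch (the shapes only mirror the question; where such a copy might happen in a real pipeline is not known here). expand creates a view with stride 0 along the expanded dimension, so it costs no extra memory, but calling contiguous() on it, or passing it to an operation that requires contiguous input, materializes every repeated element.

```python
import torch


def nbytes(t):
    """Bytes a tensor would occupy if all of its elements were stored densely."""
    return t.numel() * t.element_size()


x = torch.randn(1, 4, 512, 512)   # one 4-channel 512x512 image (float32)
y = x.expand(8, -1, -1, -1)       # view as a batch of 8, no data is copied
z = y.contiguous()                # materializes 8 real copies in new memory

print(y.stride())                            # (0, 262144, 512, 1): batch dim has stride 0
print(y.data_ptr() == x.data_ptr())          # True: y shares x's storage
print(y.is_contiguous(), z.is_contiguous())  # False True
print(nbytes(x) / 2**20, nbytes(z) / 2**20)  # 4.0 32.0 (MiB)
```

The expanded view y costs nothing beyond x itself, while z needs 8 times the memory of x; on a GPU, such a copy made once at the start of training would appear as an extra peak.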

I’ve checked the code: cudnn.benchmark is True. Still, it seems very unlikely that PyTorch would need more than 10 GB extra for 8 more images.
I don’t understand what you mean by “some initial copies are needed if you are passing expanded tensors, which are needed in a contiguous format”. Can you please explain? It might help me locate the problem.

Here is an example of what I had in mind.

I haven’t used contiguous anywhere in the code. I’m using Catalyst DL and segmentation_models_pytorch. Catalyst DL calls contiguous only in the Lovász loss, which I’m not using.

I just checked: cudnn.benchmark makes no difference.

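Try using torch.no_grad() and model.eval() if you aren’t already.
Most importantly, torch.no_grad() deactivates the autograd engine, which reduces memory usage and speeds up computation in the validation or test loop. model.eval() on its own does not disable autograd; it switches layers such as dropout and batch norm to their inference behavior.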
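A minimal validation loop combining both suggestions might look like the sketch below. The validate function, the loader (any iterable of (inputs, targets) batches) and loss_fn are hypothetical placeholders rather than Catalyst’s API; it assumes loss_fn returns a mean over the batch.

```python
import torch
import torch.nn as nn


def validate(model, loader, loss_fn):
    """Average loss over `loader` without recording autograd graphs (hypothetical helper)."""
    was_training = model.training
    model.eval()                 # dropout/batch norm switch to inference behavior
    total, count = 0.0, 0
    with torch.no_grad():        # no graph is recorded, so activations are freed right away
        for inputs, targets in loader:
            outputs = model(inputs)
            # assumes loss_fn averages over the batch, so re-weight by batch size
            total += loss_fn(outputs, targets).item() * inputs.size(0)
            count += inputs.size(0)
    model.train(was_training)    # restore the previous mode before training resumes
    return total / max(count, 1)
```

Restoring the previous mode at the end matters because forgetting to call model.train() after validation would silently keep dropout disabled and batch-norm statistics frozen for the remaining epochs.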