Exiting training function does not free memory for validation, which then runs out of memory

I have code that looks like this:

    for epoch in range(epochs):
        (aloss_t, aloss_E_t, aloss_F_t, MAE_t, Fps_t, Fts_t,
         t_dataload_t, t_prepare_t, t_model_t, t_backprop_t) = use_model(
            model, dataloader_train, train=True, max_samples=1e6,
            optimizer=optimizer, device=device, batch_size=batch_size)
        (aloss_v, aloss_E_v, aloss_F_v, MAE_v, Fps_v, Fts_v,
         t_dataload_v, t_prepare_v, t_model_v, t_backprop_v) = use_model(
            model, dataloader_val, train=False, max_samples=100,
            optimizer=optimizer, device=device, batch_size=batch_size)

where use_model is a function that can be used either for training or for validation over a dataset. The model runs fine during training and uses about 70% of my GPU memory with batch_size=40. However, when it exits the training loop and switches to validation, the memory does not get freed, and hence it runs out of memory during validation.

I thought the memory would be automatically freed when I left the training function, but apparently not. How do I ensure that this happens?

A Python object defined in a local scope is freed once execution leaves that scope (as long as nothing else points to it). Check whether any of your GPU tensors are being saved to objects in a global scope during or after training. Note that even when you delete Torch GPU tensors, the memory is not released to the OS but kept in a pool, for faster reallocation of future tensors (About torch.cuda.empty_cache() - #2 by albanD). So even if your nvidia-smi memory usage looks full, the memory is still available to PyTorch.
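
As a quick sanity check, you can compare what the caching allocator reports before and after the training call; something along these lines (the comment marking where your training epoch runs is just a placeholder for your own use_model call):

    import torch

    def report_cuda_memory(tag):
        # memory_allocated: bytes currently held by live tensors
        # memory_reserved: the whole pool the caching allocator keeps around
        allocated = torch.cuda.memory_allocated() / 1024**2
        reserved = torch.cuda.memory_reserved() / 1024**2
        print(f"{tag}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")

    report_cuda_memory("before training epoch")
    # ... run your training epoch here, e.g. use_model(..., train=True) ...
    report_cuda_memory("after training epoch")

    # Returns cached blocks to the driver; this mainly changes what nvidia-smi
    # shows, it does not give PyTorch more usable memory
    torch.cuda.empty_cache()
    report_cuda_memory("after empty_cache")

If the allocated number is still high after the training call returns, something in a surviving scope is holding references to those tensors.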

Based on your post I went back and took another look at it, and I narrowed down my issue. I was wrong about the memory not being freed; it is instead the eval mode of my code that requires more memory than my training mode.

The problem is that in eval mode I'm still running the code with autograd enabled, which is needed since I explicitly need gradient information even in eval mode. However, I guess this gradient information is accumulating, so I need a way to zero it after each iteration.
Using zero_grad on the optimizer doesn't solve the problem, so I guess I need to zero it on each variable individually.
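
For reference, the eval pass roughly follows this pattern (positions, force_target and the MSE loss are just placeholders for what my use_model actually does internally); my understanding is that torch.autograd.grad returns the input gradients directly instead of accumulating anything into .grad, and dropping the reference to the loss each iteration lets the graph be freed:

    import torch
    import torch.nn.functional as F

    model.eval()  # eval mode, but autograd stays enabled because we need gradients
    running_loss = 0.0
    for positions, force_target in dataloader_val:
        positions = positions.to(device).requires_grad_(True)
        force_target = force_target.to(device)

        energy = model(positions)
        # autograd.grad returns the gradient w.r.t. the inputs directly,
        # so nothing is written into the parameters' .grad fields
        forces = -torch.autograd.grad(energy.sum(), positions)[0]

        loss = F.mse_loss(forces, force_target)
        # .item() drops the reference to the computation graph,
        # so it can be freed before the next batch
        running_loss += loss.item()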