Exiting training function does not free memory, so validation runs out of memory

A Python object defined in a local scope is freed once execution leaves that scope (as long as nothing else points to it). Check whether your GPU tensors are being saved to objects in a global scope during or after training. Note that even when you delete Torch GPU tensors, the memory is not released to the OS but kept in a caching pool for faster reallocation of future tensors (About torch.cuda.empty_cache() - #2 by albanD). So even if nvidia-smi reports the memory as full, most of it is still available to PyTorch.
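A minimal sketch of the scoping point, using a plain placeholder class instead of a real GPU tensor (so it runs without torch; `FakeTensor` and `history` are made up for illustration). A tensor that only lives in a local variable is collected when the function returns, but one appended to a global list survives and would keep holding GPU memory:

```python
import gc
import weakref

class FakeTensor:
    """Stand-in for a GPU tensor (hypothetical, for illustration)."""
    pass

history = []  # global list, e.g. for logging per-step losses

def train_step(keep_reference):
    t = FakeTensor()       # created in local scope
    if keep_reference:
        history.append(t)  # global reference keeps it alive
    return weakref.ref(t)  # weak ref lets us observe collection

# Local-only object: freed once the function returns.
ref_freed = train_step(keep_reference=False)
gc.collect()
print(ref_freed() is None)  # True: no references remain

# Object saved to a global list: survives the function.
ref_kept = train_step(keep_reference=True)
gc.collect()
print(ref_kept() is None)   # False: `history` still points to it
```

With real Torch tensors the usual fix for this pattern is to store a Python number instead of the tensor, e.g. `history.append(loss.item())` or `loss.detach().cpu()`, so the GPU allocation (and any attached autograd graph) can be freed before validation starts.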