A Python object defined in a local scope is freed when the scope ends (as long as nothing else holds a reference to it). Check whether your GPU tensors are being saved to objects in the global scope during or after training. Note that even after you delete Torch GPU tensors, the memory is not released to the OS but kept in a caching pool, so future tensor allocations are faster (About torch.cuda.empty_cache() - #2 by albanD). So even if nvidia-smi shows the memory as fully used, it may still be available to PyTorch.
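A minimal sketch of the difference, assuming a CUDA device is available (tensor size and prints are just illustrative): after `del`, the tensor's memory returns to the caching allocator (allocated drops, reserved stays, nvidia-smi unchanged), and only `torch.cuda.empty_cache()` hands the cached blocks back to the driver.

```python
import torch

# Allocate a large tensor on the GPU (~1 GiB of float32).
x = torch.empty(1024, 1024, 256, device="cuda")

print(torch.cuda.memory_allocated())  # bytes used by live tensors
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator

# Delete the tensor: allocated memory drops, but the reserved pool
# (what nvidia-smi reports) stays, so the next allocation is fast.
del x
print(torch.cuda.memory_allocated())  # back near 0
print(torch.cuda.memory_reserved())   # still roughly the same

# Return cached blocks to the driver; nvidia-smi usage drops only now.
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())
```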