GPU memory not fully released after training loop

During training, nvidia-smi shows memory usage climbing to the card's maximum, and then the loop crashes with:

THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory