This usually indicates that you are storing tensors in a list (or a similar container), which prevents PyTorch from freeing their memory. In the worst case the list holds non-detached tensors that are still tracked by autograd, so each entry keeps its entire computation graph alive.
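As an illustration, here is a minimal sketch of the pattern (the model and training loop are made up for the example):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for _ in range(100):
    out = model(torch.randn(8, 10))
    loss = out.pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # BAD: appending the attached tensor keeps the whole
    # computation graph alive and the memory usage grows
    # losses.append(loss)

    # GOOD: detach the tensor (or store a plain Python number
    # via loss.item()) so autograd can free the graph
    losses.append(loss.detach())
```

`loss.detach()` returns a tensor that is cut from the graph, while `loss.item()` converts a scalar tensor to a plain Python number; either avoids the accumulation.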
If you can post your code, we could have a look at it.