In my case it was a fairly typical setup: images trained with a ResNet, and then the predictions would not run despite all that available memory. So I guess the only way forward (other than trying to use less memory during training) is to save the model, reset everything else that holds data on CUDA, and then run the predictions.
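Something along these lines is what I have in mind (just a sketch; model and optimizer stand in for whatever the training script built):

```python
import gc
import torch

torch.save(model.state_dict(), "checkpoint.pt")  # keep the trained weights

del optimizer             # optimizer state often keeps large CUDA buffers alive
gc.collect()              # make sure Python actually drops the references
torch.cuda.empty_cache()  # hand cached blocks back so the predictions can allocate

# ...then run the predictions as before.
```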
That’s weird. Since it’s only for predictions, are they run in a with torch.no_grad(): block so that no temporary buffers are held?
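Something like this is what I mean (model and batch are just placeholders):

```python
model.eval()              # switch off dropout / batchnorm updates
with torch.no_grad():     # no autograd graph, so no temporary buffers are kept
    preds = model(batch.cuda())
```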
At the Python level, yes, by inspecting the objects the garbage collector is tracking.
See How to debug causes of GPU memory leaks? - #3 by smth for a code snippet that does this.
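It is roughly this kind of loop (my paraphrase, not the exact snippet from the linked post): walk the garbage-collected objects and print every tensor that is still alive.

```python
import gc
import torch

for obj in gc.get_objects():
    try:
        # Catch plain tensors as well as objects (e.g. Parameters) wrapping one in .data
        if torch.is_tensor(obj) or (hasattr(obj, "data") and torch.is_tensor(obj.data)):
            print(type(obj), obj.size(), obj.device)
    except Exception:
        pass  # some objects raise on attribute access; just skip them
```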