I have run into a related issue while using the experimental Windows version. In my training phase, CUDA allocates about 4 GB for mini-batches while I optimize my parameters. Then, when I am done and want to predict on a separate dataset using the same mini-batch size, a fresh 4 GB is allocated.
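For context, my workflow looks roughly like the toy sketch below (a small stand-in model and random data, not my actual code):

import torch
import torch.nn as nn

model = nn.Linear(1000, 1000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# "Training": the caching allocator grabs memory for batches, activations and gradients here.
for _ in range(10):
    x = torch.rand(512, 1000).cuda()
    loss = model(x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# "Prediction" on a separate dataset with the same mini-batch size; in my real run,
# nvidia-smi shows another ~4 GB allocated here instead of the training memory being reused.
for _ in range(10):
    x = torch.rand(512, 1000).cuda()
    preds = model(x)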
To be more precise: when I am done training and nothing but the model should remain on the GPU, I can set a breakpoint and issue these commands (all memory readings come from nvidia-smi):
T = torch.rand(1000, 1000000).cuda()  # Now memory reads 8 GB, i.e. a further 4 GB was allocated, so the training 4 GB was NOT considered 'free' by the caching allocator, even though it was being reused during training
del T  # Still 8 GB (as expected)
T = torch.rand(1000, 1000000).cuda()  # Still 8 GB, as expected: the caching allocator reuses the same space as the first T above
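Something like the following should show the same thing from inside the process, separating bytes held by live tensors from bytes the caching allocator has reserved (I believe memory_reserved() is called memory_cached() on older versions):

import torch

def report(tag):
    # Bytes held by live tensors vs. bytes the caching allocator has reserved
    # from CUDA; the reserved number is roughly what nvidia-smi reports.
    alloc = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3  # memory_cached() on older versions
    print("%s: allocated %.2f GB, reserved %.2f GB" % (tag, alloc, reserved))

report("before")
T = torch.rand(1000, 1000000).cuda()
report("after allocating T")   # both numbers jump by ~4 GB
del T
report("after del T")          # allocated drops, reserved stays the same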
So it looks like the 4 GB from training is still taking up space on the GPU, even though it should have been freed. Yet it does get reused later when I retrain the same model. In other words, the memory can be reused for the same purpose but not for arbitrary tensors, which makes no sense to me.
Is there a way to manually force the caching allocator to free some GPU memory? Or, since the caching allocator apparently doesn't consider that space free: can I pull my model back with model.cpu() and then ask torch to free everything it still holds on the GPU?
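To be concrete, this is the kind of thing I was hoping would work; torch.cuda.empty_cache() is my guess at the right call for releasing the cached blocks, and the model below is just a toy stand-in for mine:

import torch
import torch.nn as nn

model = nn.Linear(1000, 1000).cuda()                      # toy stand-in for the trained model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer state also lives on the GPU

# ... training would happen here ...

model = model.cpu()          # pull the parameters back to host memory
del optimizer                # drop references to any CUDA tensors the optimizer holds
torch.cuda.empty_cache()     # ask the caching allocator to return its cached blocks to the driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # hopefully both near zero now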