About torch.cuda.empty_cache()

Hi,

No, it most likely won't.
The trick we use to get a larger contiguous block is to free the cached memory back to the GPU driver and then allocate it again. But the allocator already does this automatically when you're about to run out of memory.
And doing it repeatedly will slow things down for little gain.
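To make the mechanism concrete, here is a minimal sketch (mine, not from the original exchange) showing the difference between memory that is in use and memory that is merely cached. It assumes a CUDA device and a PyTorch version that has `torch.cuda.memory_reserved()`:

```python
import torch

# Allocate ~4 MB on the GPU, then drop the only reference to it.
x = torch.randn(1024, 1024, device="cuda")
del x

# The tensor is gone, but the caching allocator keeps the block
# so the next allocation can reuse it without a driver round-trip.
print(torch.cuda.memory_allocated())  # 0: no tensor is live
print(torch.cuda.memory_reserved())   # > 0: the block is still cached

# empty_cache() hands the cached (unused) blocks back to the driver.
# This is the same release-and-reallocate step the allocator performs
# on its own when an allocation is about to fail.
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())   # back to (near) 0
```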

That said, this is quite a bad case of fragmentation. Do you have a small code sample that reproduces it? We might be able to improve the allocator to handle it better.
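For illustration, a reproducer along these lines is the kind of self-contained sample that would help. This is a hypothetical sketch with made-up sizes, not code from this thread, and the exact behaviour depends on the allocator version: small long-lived tensors pin parts of segments while steadily growing scratch allocations prevent freed blocks from being reused.

```python
import torch

keep = []
for i in range(1, 64):
    # Growing scratch tensor: each request is larger than the block
    # just freed, so the cached block cannot simply be reused.
    scratch = torch.empty(i * 1024 * 1024, device="cuda")
    # A small tensor kept alive pins part of each segment, so the
    # segment can never be released back to the driver as a whole.
    keep.append(torch.empty(1024 * 1024, device="cuda"))
    del scratch

# Reserved memory now far exceeds allocated memory; that gap is
# fragmentation held by the cache.
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
print(torch.cuda.memory_summary())
```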