Unable to clean CUDA cache with torch.cuda.empty_cache()

Dear all,

I ran into a situation where I need to duplicate a large tensor many times. To keep the memory cost down, I only create a small number of duplicates at a time inside a loop. To make sure there is enough free memory, torch.cuda.empty_cache() is called right before duplicating the tensor. The code looks something like this:

a = torch.rand(1, 256, 256, 256).cuda()
for _ in range(...):  # loop bounds omitted
    torch.cuda.empty_cache()  # called right before the big allocation
    b = torch.cat([a] * 100, 0)  # CUDA out of memory at this line
    # Do some operation with b
    # The resultant tensor from the operation is reasonably small
    # Clean up
    del b
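As an aside on the duplication step itself: when b is only read from, the copies can sometimes be avoided entirely with expand(), which returns a view over the original storage instead of allocating new memory. A minimal sketch (on CPU for clarity; the shapes here are illustrative, not from my actual code):

```python
import torch

a = torch.rand(1, 16, 16, 16)

# torch.cat materializes 100 real copies: roughly 100x the memory of `a`
b_copy = torch.cat([a] * 100, 0)

# expand returns a stride-0 view along dim 0: no extra memory is allocated
b_view = a.expand(100, -1, -1, -1)

assert b_copy.shape == b_view.shape
assert b_view.data_ptr() == a.data_ptr()  # same underlying storage as `a`
assert torch.equal(b_copy, b_view)        # element-wise identical
```

The caveat is that an expanded view must not be written to in place, since all 100 "rows" alias the same memory.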

However, I still ran out of memory because, for some reason, some cached memory is not freed. The error is shown below.

RuntimeError: CUDA out of memory. Tried to allocate 7.63 GiB
(GPU 0; 11.92 GiB total capacity; 679.74 MiB already allocated; 7.51 GiB free; 3.29 GiB cached)

Since my CUDA driver version is 9.0, which isn't supported by the latest PyTorch builds, I can't really use the more sophisticated tools like torch.cuda.memory_stats(). So I'm wondering: what kind of cache can actually be freed with torch.cuda.empty_cache()?
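For what it's worth, even on older builds the two basic counters torch.cuda.memory_allocated() and torch.cuda.memory_cached() are available (the latter was renamed torch.cuda.memory_reserved() in later releases). A small helper I've been using to watch the two numbers, with a hasattr guard so it runs on either API (the helper name and output format are my own, not from PyTorch):

```python
import torch

def report(tag=""):
    # memory_allocated(): bytes currently backing live tensors
    # memory_cached() / memory_reserved(): bytes held by the caching
    # allocator overall, including unused blocks not yet returned to the driver
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    reserved_fn = getattr(torch.cuda, "memory_reserved",
                          getattr(torch.cuda, "memory_cached", None))
    cached = reserved_fn() / 1024 ** 2
    print(f"{tag}: allocated {alloc:.1f} MiB, cached {cached:.1f} MiB")
```

The gap between the two numbers is the "cached" figure that shows up in the OOM message above.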

I managed to find the culprit. It wasn't actually related to any of the code I posted above, but to something upstream. Still, it does appear that torch.cuda.empty_cache() cannot release all cached memory. I've searched through most of the available documentation, and the best I found is:

Calling empty_cache() releases all unused cached memory from PyTorch so that those can be used by other GPU applications

I'm still not quite sure what counts as "unused" cached memory here. It would be great if someone could clarify.
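My current understanding, which I'd like confirmed: empty_cache() can only return cached blocks that no live tensor is using; memory backing tensors that are still referenced from Python is never released. A small sketch of what I mean (guarded so it only runs when a GPU is present):

```python
import torch

if torch.cuda.is_available():
    a = torch.rand(1024, 1024, device="cuda")  # a live tensor
    torch.cuda.empty_cache()
    # `a` is still referenced, so its memory is NOT released:
    # empty_cache() only hands *unused* cached blocks back to the driver.
    assert torch.cuda.memory_allocated() >= a.numel() * a.element_size()
    del a                      # drop the last reference first...
    torch.cuda.empty_cache()   # ...and only now can the block be released
```

If that reading is right, then in my original loop the memory held by upstream tensors was simply not eligible to be freed, no matter how often empty_cache() was called.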