Dear all,
I ran into a situation where I need to duplicate a large tensor many times. Because all of the copies won't fit in GPU memory at once, I only create a small number of duplicates in each iteration of a loop, and to make sure there is enough free memory, torch.cuda.empty_cache() is called right before duplicating the tensor. The code is something like this:
import torch

a = torch.rand(1, 256, 256, 256).cuda()
for _ in range(num_iters):  # num_iters: placeholder for however many rounds are needed
    torch.cuda.empty_cache()
    b = torch.cat([a] * 100, 0)  # CUDA out of memory at this line
    # Do some operation with b
    # (the resultant tensor from the operation is reasonably small)
    ...
    # Clean up
    del b
However, I still ran out of memory because some of the cached memory is apparently not being released: in the error below, 3.29 GiB is still held in the cache even though only 679.74 MiB is actually allocated, so the 7.63 GiB request no longer fits into the 7.51 GiB reported free.
RuntimeError: CUDA out of memory. Tried to allocate 7.63 GiB
(GPU 0; 11.92 GiB total capacity; 679.74 MiB already allocated; 7.51 GiB free; 3.29 GiB cached)
Since my CUDA driver is version 9.0, which isn't supported by the latest PyTorch builds, I can't really use more sophisticated tools like torch.cuda.memory_stats(). So I'm wondering: what kind of cached memory can actually be released by torch.cuda.empty_cache()?
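In case it helps, below is a minimal sketch of how I plan to watch the caching allocator around the failing line, using the older per-device counters that my build should still have (torch.cuda.memory_allocated() and torch.cuda.memory_cached(); the latter was renamed memory_reserved() in newer releases). The smaller number of copies is just so the snippet runs on its own; everything else mirrors the loop above.

import torch

def report(tag):
    # memory_allocated(): bytes currently occupied by live tensors
    # memory_cached(): bytes held by the caching allocator, including freed-but-cached blocks
    alloc = torch.cuda.memory_allocated() / 1024 ** 3
    cached = torch.cuda.memory_cached() / 1024 ** 3
    print("{}: allocated={:.2f} GiB, cached={:.2f} GiB".format(tag, alloc, cached))

a = torch.rand(1, 256, 256, 256).cuda()   # ~64 MiB of float32
report("after creating a")

b = torch.cat([a] * 10, 0)                # ~640 MiB, small enough to actually run
report("after cat")

del b
report("after del b")            # allocated should drop; cached typically stays high

torch.cuda.empty_cache()
report("after empty_cache()")    # cached should drop back towards allocated

If I understand the allocator correctly, the gap between the cached and allocated numbers is what empty_cache() is supposed to hand back to the driver, which is really what my question is about.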