How does "reserved in total by PyTorch" work?

@LuckyIceland @Umais @abhinavdhere see this issue which gives more details

It seems that “reserved in total” is the memory already allocated to tensors plus the memory cached by PyTorch. When PyTorch requests a new block of memory, it first checks whether there is enough memory left in the pool not currently held by PyTorch (i.e. total GPU memory minus “reserved in total”). If there isn’t enough, the allocator tries to clear its cache and return memory to the GPU, which reduces “reserved in total”. However, it can only release cached blocks of which no part is currently allocated; if any portion of a block is still assigned to a tensor, that block cannot be returned to the GPU. As a result, you can have scenarios where the tensor-allocated memory plus the free GPU memory is much less than the total GPU memory, because the cache is holding memory it cannot release.
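A minimal sketch of that behaviour, assuming a CUDA device is available, using `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` to read the two counters (the exact numbers you see will depend on your device and PyTorch version):

```python
import torch

def report(tag):
    # Memory currently held by live tensors
    allocated = torch.cuda.memory_allocated() / 1024**2
    # "Reserved in total by PyTorch" = allocated + cached by the allocator
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")

report("start")

# Allocate a large tensor: both allocated and reserved grow.
x = torch.empty(1024, 1024, 256, device="cuda")  # ~1 GiB of float32
report("after allocation")

# Delete the tensor: allocated drops, but reserved stays roughly the same,
# because the freed block is kept in PyTorch's cache for reuse.
del x
report("after del")

# empty_cache() returns fully unallocated cached blocks back to the GPU,
# so reserved drops. Blocks that are even partially in use cannot be released.
torch.cuda.empty_cache()
report("after empty_cache")
```

This is the same release step the allocator performs automatically when a new request doesn’t fit: it only helps to the extent that cached blocks are completely free of live tensors.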