How is GPU memory allocated and released? Procedure to release GPU memory?

Hi,

The idea is that we have a caching allocator.
So when you need to put a Tensor on the GPU, we ask the caching allocator. If it already has a large enough free block cached, it returns that block immediately. Otherwise, it asks the GPU driver for new memory.
When the Tensor is destroyed, its memory is returned to the allocator and kept around for reuse rather than being freed.
The memory is only released back to the GPU driver if you are about to OOM or if the user calls torch.cuda.empty_cache().
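You can observe this behavior with the memory inspection functions in `torch.cuda`. A minimal sketch (assuming a CUDA-capable machine; the exact byte counts are illustrative, not guaranteed):

```python
import torch

if torch.cuda.is_available():
    # Allocating a Tensor: the caching allocator serves it, requesting
    # memory from the driver only if its cache has no suitable block.
    x = torch.empty(1024, 1024, device="cuda")

    print(torch.cuda.memory_allocated())  # bytes held by live Tensors
    print(torch.cuda.memory_reserved())   # bytes held by the allocator overall

    # Destroying the Tensor returns its block to the allocator's cache:
    # memory_allocated drops, but memory_reserved stays the same.
    del x
    print(torch.cuda.memory_allocated())
    print(torch.cuda.memory_reserved())

    # Explicitly hand cached blocks back to the GPU driver.
    torch.cuda.empty_cache()
    print(torch.cuda.memory_reserved())
```

Note that `empty_cache()` cannot free memory still held by live Tensors; it only releases cached blocks that are currently unused.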

Does that answer your question?