I’d like to understand in depth exactly what happens when you create a new Tensor. I know memory in PyTorch/libtorch is cached, so an already-allocated memory block is queried before new memory is possibly allocated. But how many memory blocks does the cache hold? What is their size? Is there any event that triggers deallocation, or does the cache grow indefinitely until empty_cache() is called explicitly?
I am answering from the perspective of CUDA tensors:
It depends on how much memory is available to PyTorch and on your usage pattern. You can limit the available memory with torch.cuda.set_per_process_memory_fraction, for example. PyTorch does not pre-allocate memory; allocations happen when you create a tensor. The number of blocks PyTorch holds at any point depends on what your allocations have looked like up to that point in the application.
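The caching behavior itself can be sketched with a toy allocator. This is purely illustrative Python, not PyTorch's actual CUDACachingAllocator (which is C++ and far more sophisticated: it splits blocks, keeps per-stream pools, etc.), but it shows the key idea: freeing a tensor returns its block to a cache rather than to the system, and nothing is released until an explicit empty_cache().

```python
# Toy sketch of a caching allocator -- illustrative only, NOT PyTorch's
# real implementation. All names here are hypothetical.

class ToyCachingAllocator:
    def __init__(self):
        self.free_blocks = []       # cached blocks available for reuse
        self.live_blocks = set()    # blocks currently handed out
        self.total_from_system = 0  # bytes "requested from the driver"
        self._next_id = 0

    def malloc(self, size):
        # Reuse the smallest cached block that fits, if any.
        fits = [b for b in self.free_blocks if b[1] >= size]
        if fits:
            block = min(fits, key=lambda b: b[1])
            self.free_blocks.remove(block)
        else:
            # Cache miss: "allocate" new memory from the system.
            block = (self._next_id, size)
            self._next_id += 1
            self.total_from_system += size
        self.live_blocks.add(block)
        return block

    def free(self, block):
        # Freeing does NOT return memory to the system; the block
        # goes back into the cache for later reuse.
        self.live_blocks.remove(block)
        self.free_blocks.append(block)

    def empty_cache(self):
        # Only an explicit empty_cache() releases cached blocks.
        released = sum(size for _, size in self.free_blocks)
        self.free_blocks.clear()
        self.total_from_system -= released
        return released

alloc = ToyCachingAllocator()
a = alloc.malloc(1000)
alloc.free(a)
b = alloc.malloc(800)           # reuses the cached 1000-byte block
print(alloc.total_from_system)  # -> 1000: no second system allocation
```

This mirrors what you see in practice: torch.cuda.memory_reserved stays high after tensors are deleted, while torch.cuda.memory_allocated drops.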
There are two types of blocks held by the CUDACachingAllocator: small blocks and large blocks. Small blocks are 2MB in size and serve allocation requests of up to 1MB; large blocks are 20MB each, but that can be configured using large_segment_size_mb.
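The size-class routing can be sketched as follows. This is a simplified model based on the constants in PyTorch's allocator source (kSmallSize = 1 MB, kSmallBuffer = 2 MB, kLargeBuffer = 20 MB, kMinLargeAlloc = 10 MB, kRoundLarge = 2 MB); the real implementation has additional cases (block splitting, per-stream pools), so treat this as an approximation:

```python
# Simplified model of how a fresh segment size is chosen by the
# CUDACachingAllocator; a sketch, not the full implementation.

MB = 1024 * 1024

def segment_size(request_bytes: int) -> int:
    if request_bytes <= 1 * MB:
        return 2 * MB   # small pool: 2 MB segments
    if request_bytes < 10 * MB:
        return 20 * MB  # large pool: 20 MB segments
    # very large requests: rounded up to a multiple of 2 MB
    return ((request_bytes + 2 * MB - 1) // (2 * MB)) * (2 * MB)

print(segment_size(512 * 1024) // MB)  # -> 2  (small request)
print(segment_size(5 * MB) // MB)      # -> 20 (mid-size request)
print(segment_size(25 * MB) // MB)     # -> 26 (rounded up to 2 MB multiple)
```

So a burst of tiny tensors fills 2 MB segments, mid-size tensors land in 20 MB segments, and very large tensors get their own rounded allocations. You can inspect the real pools at runtime with torch.cuda.memory_stats() or torch.cuda.memory_summary().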