Caching Memory Allocator


I read in the documentation that PyTorch uses a caching memory allocator, which allows fast memory deallocation without device synchronization.

Is this a general concept or a PyTorch-specific memory management technique?
Is there any detailed documentation or other information about the caching memory allocator?
I want to understand how the caching memory allocator works.


It’s a common pattern used on top of CUDA in different frameworks to avoid expensive direct allocations: freed memory is kept in a cache and reused for later requests instead of being returned to the device. You could take a look at the implementation for detailed information.
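
To make the idea concrete, here is a minimal toy sketch of the pattern in plain Python. It is not PyTorch's actual implementation: the class name `ToyCachingAllocator`, its methods, and the use of `bytearray` as a stand-in for an expensive device allocation are all assumptions made for illustration. A real allocator also handles things this sketch ignores, such as block splitting, size rounding, and CUDA streams.

```python
from collections import defaultdict


class ToyCachingAllocator:
    """Toy model of a caching allocator (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps block size -> list of freed blocks that can be reused.
        self._free_blocks = defaultdict(list)
        # Counts how often the "expensive" raw allocation was needed.
        self.raw_alloc_calls = 0

    def _raw_alloc(self, size):
        # Stand-in for an expensive direct allocation such as cudaMalloc.
        self.raw_alloc_calls += 1
        return bytearray(size)

    def malloc(self, size):
        cached = self._free_blocks[size]
        if cached:
            # Cache hit: reuse a previously freed block, no raw allocation.
            return cached.pop()
        return self._raw_alloc(size)

    def free(self, block):
        # Do not release the memory; keep the block around for reuse.
        self._free_blocks[len(block)].append(block)

    def cached_bytes(self):
        # Total size of all blocks currently held in the cache.
        return sum(size * len(blocks) for size, blocks in self._free_blocks.items())

    def empty_cache(self):
        # A real allocator would hand the cached blocks back to the device here.
        self._free_blocks.clear()


allocator = ToyCachingAllocator()
a = allocator.malloc(1024)
allocator.free(a)            # the block goes into the cache
b = allocator.malloc(1024)   # same size requested again: served from the cache
print(b is a)                     # True
print(allocator.raw_alloc_calls)  # 1
```

In PyTorch itself you can observe the same effect from Python: `torch.cuda.memory_allocated()` reports memory occupied by live tensors, `torch.cuda.memory_reserved()` reports what the caching allocator is holding, and `torch.cuda.empty_cache()` releases unused cached blocks. The allocator source lives in `c10/cuda/CUDACachingAllocator.cpp` in the PyTorch repository.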

