Caching Memory Allocator

Hello,

I found in the documentation that PyTorch uses a caching memory allocator for fast memory deallocation without device synchronization.
https://pytorch.org/docs/stable/notes/cuda.html

Is this a general concept, or a memory management technique specific to PyTorch?
Are there any detailed documents or other information about the caching memory allocator?
I want to understand how the caching memory allocator works.

Thanks

It’s a common pattern used on top of CUDA in different frameworks to avoid expensive direct allocations. You could take a look at the code of the implementation for more detailed information.
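
Very roughly, the idea is that freed blocks are kept in per-size free lists and handed out again on the next request, so the expensive backend calls (e.g. cudaMalloc/cudaFree, which may synchronize the device) are only hit on a cache miss. Here is a minimal Python sketch of that general pattern, not PyTorch's actual implementation; `raw_alloc`/`raw_free` are hypothetical stand-ins for the real driver calls:

```python
from collections import defaultdict

class CachingAllocator:
    """Toy caching allocator: reuse freed blocks instead of returning them to the backend."""

    def __init__(self, raw_alloc, raw_free, round_to=512):
        self.raw_alloc = raw_alloc            # hypothetical slow backend alloc (e.g. cudaMalloc)
        self.raw_free = raw_free              # hypothetical slow backend free (e.g. cudaFree)
        self.round_to = round_to              # round sizes up to get more reuse
        self.free_blocks = defaultdict(list)  # rounded size -> list of cached blocks

    def _rounded(self, size):
        return ((size + self.round_to - 1) // self.round_to) * self.round_to

    def malloc(self, size):
        size = self._rounded(size)
        if self.free_blocks[size]:            # cache hit: reuse without touching the backend
            return self.free_blocks[size].pop()
        return self.raw_alloc(size)           # cache miss: fall back to the slow path

    def free(self, block, size):
        # Don't give the memory back to the backend; keep it for future requests.
        self.free_blocks[self._rounded(size)].append(block)

    def empty_cache(self):
        # Release all cached blocks back to the backend.
        for blocks in self.free_blocks.values():
            for block in blocks:
                self.raw_free(block)
        self.free_blocks.clear()
```

In PyTorch you can see the effect of the caching by comparing `torch.cuda.memory_allocated()` (memory currently used by tensors) with `torch.cuda.memory_reserved()` (memory held by the caching allocator), and `torch.cuda.empty_cache()` releases the cached blocks back to the driver.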

Thanks!!
That was fast :slight_smile: