`torch.cuda.memory._record_memory_history()` does not work with env var `PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync`

I was trying the `torch.cuda.memory._record_memory_history()` API to get a snapshot of memory allocations and profile my code, and on inspecting the snapshot I saw a lot of memory fragmentation.
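
For reference, here's roughly what I'm running (a minimal sketch; `max_entries` and the snapshot filename are just placeholders):

```python
import torch

# start recording allocation history
torch.cuda.memory._record_memory_history(max_entries=100_000)

# ... run the workload being profiled ...

# dump a snapshot that can be inspected at https://pytorch.org/memory_viz
torch.cuda.memory._dump_snapshot("snapshot.pickle")

# stop recording
torch.cuda.memory._record_memory_history(enabled=None)
```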

Referring to the CUDA semantics — PyTorch 2.9 documentation, I saw that you can set `PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync` to use CUDA's async memory allocation API, which should ideally avoid fragmentation. But with that set, I wasn't able to record a memory snapshot; it fails with an error saying that this is not supported yet.
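
This is roughly how I hit the error (the allocator config has to be set before CUDA is initialized, so I set it at the top of the script):

```python
import os

# must be set before torch initializes CUDA, otherwise it is ignored
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

import torch

torch.ones(1, device="cuda")  # forces CUDA / allocator initialization
torch.cuda.memory._record_memory_history()  # <-- raises the "not supported" error
```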

Why is that the case?

Or are there any workarounds for this?

Also, how can I avoid fragmentation? Any general tips to be aware of would be appreciated.

PS: I'm very new to profiling PyTorch, so any help would be greatly appreciated.

Thanks!