Usage of PYTORCH_NO_CUDA_MEMORY_CACHING flag in prod and stream awareness

abhishek179 · December 13, 2022, 4:45am

Hi PyTorch team,
Reviewing CUDACachingAllocator.cpp, I see that it provides a recordStream() function to help insert the correct synchronization when allocations are used on multiple streams. This ensures that a block is not reused until every recorded stream has completed its work on it.
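To make the behavior I'm describing concrete, here is a toy, pure-Python model of stream-aware block reuse. It is only a sketch under my own assumptions, not PyTorch's code. ToyStream, ToyEvent, Block and ToyCachingAllocator are hypothetical names, and "streams" and "events" are simulated with counters rather than real CUDA streams and events. The idea it models: when a block that was used on extra streams is freed, an event is noted on each of those streams, and the block only goes back to the pool once all of those events have completed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyStream:
    """Hypothetical stand-in for a CUDA stream: counts enqueued vs. completed work."""
    name: str
    enqueued: int = 0
    completed: int = 0

    def launch(self):
        # Enqueue one unit of pretend asynchronous work.
        self.enqueued += 1

    def sync(self):
        # Pretend all enqueued work on this stream has finished.
        self.completed = self.enqueued


@dataclass(eq=False)
class ToyEvent:
    """Marks a position in a stream's queue, like an event recorded on that stream."""
    stream: ToyStream
    position: int

    def query(self):
        # True once all work enqueued before the event has completed.
        return self.stream.completed >= self.position


@dataclass(eq=False)
class Block:
    size: int
    stream: ToyStream                                # stream the block was allocated on
    stream_uses: set = field(default_factory=set)    # extra streams added via record_stream


class ToyCachingAllocator:
    def __init__(self):
        self.free_blocks = []   # blocks that may be handed out again
        self.pending = []       # (block, events) pairs still waiting on other streams

    def process_events(self):
        # Move blocks whose events have all completed back into the free pool.
        still_pending = []
        for block, events in self.pending:
            if all(e.query() for e in events):
                self.free_blocks.append(block)
            else:
                still_pending.append((block, events))
        self.pending = still_pending

    def malloc(self, size, stream):
        self.process_events()
        # Reuse only a cached block from the same stream (stream-ordered reuse).
        for b in self.free_blocks:
            if b.size >= size and b.stream is stream:
                self.free_blocks.remove(b)
                return b
        return Block(size, stream)  # stands in for a fresh device allocation

    def record_stream(self, block, stream):
        # Remember that another stream also uses this block.
        if stream is not block.stream:
            block.stream_uses.add(stream)

    def free(self, block):
        if block.stream_uses:
            # Note an event on each extra stream; reuse waits until all complete.
            events = [ToyEvent(s, s.enqueued) for s in block.stream_uses]
            block.stream_uses = set()
            self.pending.append((block, events))
        else:
            self.free_blocks.append(block)


compute, side = ToyStream("compute"), ToyStream("side")
alloc = ToyCachingAllocator()
buf = alloc.malloc(1024, compute)
side.launch()                    # side stream still uses buf asynchronously
alloc.record_stream(buf, side)
alloc.free(buf)
print(alloc.malloc(1024, compute) is buf)  # False: side-stream work not done yet
side.sync()
print(alloc.malloc(1024, compute) is buf)  # True: event completed, block reusable
```

Without the record_stream call, free() would put the block straight back into the pool and the next allocation on the compute stream could reuse it while the side stream was still working on it. That hazard is what recordStream() guards against, and it is why I'm asking how this works once caching is disabled.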

We have a scenario where we want to run a production workload with PYTORCH_NO_CUDA_MEMORY_CACHING set to true. I want to understand what happens to the record_stream functionality when PYTORCH_NO_CUDA_MEMORY_CACHING is enabled. Does PyTorch handle stream-aware allocation differently in that case?