Hi, I’m trying to temporarily disable the pre-allocation of CUDA GPU memory so I can investigate the GPU memory usage of some experiments with my model. To be clear:
- I want to make sure that GPU memory is allocated as late as possible, so I can read the correct and exact memory usage of each layer of my model. (I understand that cudaMalloc is an expensive operation, but that’s fine in my case!)
- I want the GPU memory to be released as soon as possible. From my current understanding, the current code base already does this reasonably well; I just mention it here for completeness.
From my current understanding, I will need to investigate cudaMalloc, and the pre-allocation is controlled by the caching memory allocator mentioned in my title. Am I on the right track in trying to understand the source of CUDAMallocAsyncAllocator? It would help me a lot if someone has done this before and could share their experience. Or you could point out what’s wrong with this approach and recommend another one. Thanks!
You can disable the caching allocator via export PYTORCH_NO_CUDA_MEMORY_CACHING=1, which will then use plain cudaMalloc/cudaFree calls without reusing any memory.
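A minimal sketch of how this would be used; the script name here is just a placeholder, not anything from the thread:

```shell
# Disable PyTorch's CUDA caching allocator for this shell session.
# Every tensor allocation then maps to a raw cudaMalloc and every free
# to a cudaFree (slow, but gives exact per-op memory readings).
export PYTORCH_NO_CUDA_MEMORY_CACHING=1
# python my_experiment.py   # hypothetical script; launch your run here
```

Note that the variable must be set before the process that imports PyTorch starts, since the allocator is chosen at initialization.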
Hi ptrblck, thanks for your advice! I will try it for sure!
Before I found your answer, I was thinking about using register_forward_hook on the nested nn.Modules together with the memory stats to measure the CUDA GPU memory of each layer of my model, as if PyTorch's CUDA caching mechanism were not enabled. (In detail: I was thinking of subtracting the value returned by the first API from the total GPU memory usage, which includes the caches.) Do you think this is a good idea? Thanks for your kindness! (And I probably used the wrong tag; let me fix it now.)
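For reference, a minimal sketch of the forward-hook idea (the helper name attach_memory_hooks and the toy model are my own, not from the thread); it records torch.cuda.memory_allocated() after each leaf layer runs, and simply reports 0 on a CPU-only machine:

```python
import torch
import torch.nn as nn

def attach_memory_hooks(model):
    """Register a forward hook on every leaf module that records the
    CUDA memory allocated right after that layer's forward pass."""
    stats = {}

    def make_hook(name):
        def hook(module, inputs, output):
            if torch.cuda.is_available():
                torch.cuda.synchronize()  # make sure the kernels finished
                stats[name] = torch.cuda.memory_allocated()
            else:
                stats[name] = 0  # no CUDA device: nothing to measure
        return hook

    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    return stats, handles

# Toy model just to exercise the hooks.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
stats, handles = attach_memory_hooks(model)
model(torch.randn(2, 8))
for h in handles:
    h.remove()  # clean up the hooks when done
```

After the forward pass, stats maps each leaf layer's name to the bytes allocated at that point in the forward pass.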
Instead of checking the reserved memory (which includes the cache) and subtracting the memory reported by nvidia-smi, I would just use torch.cuda.memory_allocated, which returns only the used (allocated) memory.
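To illustrate the difference between the two stats, a small sketch (the helper name cuda_mem_snapshot is mine, and the code falls back to zeros on a machine without a CUDA device):

```python
import torch

def cuda_mem_snapshot():
    """Return (allocated, reserved) in bytes; (0, 0) when no CUDA device."""
    if not torch.cuda.is_available():
        return (0, 0)
    # memory_allocated counts only live tensors; memory_reserved also
    # includes blocks cached by the allocator for later reuse.
    return (torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

allocated, reserved = cuda_mem_snapshot()
# The cache always holds at least what is currently allocated.
assert allocated <= reserved
```

With the caching allocator active, reserved typically stays high after tensors are freed, while allocated drops immediately, which is why allocated is the number to read per layer.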
I agree, memory_allocated is the better idea. I think raining_day513 is on the right track.