[…] the caching allocator can only reserve as much memory as the GPU has, and all of this memory would be available to model training (minus some, hopefully small, fraction lost to fragmentation and to GPU kernels).
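You can see the allocated-versus-reserved distinction directly with the built-in memory counters; a minimal sketch, assuming a CUDA device is available:

```python
import torch

# The allocator hands memory back in cached blocks, so "reserved" memory
# (held by the caching allocator) can exceed "allocated" memory (backing
# live tensors).
x = torch.randn(1024, 1024, device="cuda")  # ~4 MiB of float32

print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.1f} MiB")

del x
torch.cuda.empty_cache()  # return unused cached blocks to the driver
print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 2**20:.1f} MiB")
```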
@eqy So am I correct to say that PyTorch has implemented some algorithms in the caching allocator to predict how much GPU memory will be reserved before training starts? You only mentioned that it will use "at most" all of the memory of the current single GPU. (Am I also correct to say that this is the reason for doing the warm-up round in the first few batches?)
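(To make the warm-up idea concrete, here is a rough sketch of what I mean, using a dummy model and random batches as stand-ins for a real training setup and reading the peak-memory counters afterwards:)

```python
import torch
import torch.nn as nn

# Dummy placeholders for a real model, loss, and data loader.
model = nn.Linear(1024, 10).cuda()
loss_fn = nn.CrossEntropyLoss()

torch.cuda.reset_peak_memory_stats()
for _ in range(3):  # a few warm-up batches
    x = torch.randn(64, 1024, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")
    loss_fn(model(x), y).backward()

# The observed peaks give an empirical upper bound for this workload.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 2**20:.1f} MiB")
```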
If you know the maximum amount of memory PyTorch should use and expect other processes to also need memory […] you can limit the amount of memory the caching allocator can reserve with, e.g., torch.cuda.set_per_process_memory_fraction […]
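A minimal sketch of capping the allocator this way (the 0.5 fraction is just an illustrative value):

```python
import torch

# Cap the caching allocator at half of device 0's memory. Choose a
# fraction that leaves enough headroom for the other processes sharing
# the GPU.
torch.cuda.set_per_process_memory_fraction(0.5, device=0)

# Allocations that would push the process past the cap now raise a CUDA
# out-of-memory error instead of taking memory from other processes.
x = torch.empty(256, 1024, 1024, device="cuda")  # ~1 GiB; fine if under the cap
```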
Yes, I’m working on an algorithm to make that prediction. So am I right that this API is useful mainly when the caching allocator’s default behavior doesn’t give an optimal fraction (an upper bound on memory that is very close to what will actually be used)? I’d also like to know where these files are located in the source; could you help? (I selected your reply above as the answer because of the title, so this thread is basically closed, but I’m more interested in continuing our discussion here.)
Again, it’s amazing that you replied to all of this with valuable details and references. Thank you very much!