How can we simply calculate the GPU memory a model (nn.Module) uses? Just on a single GPU.
To calculate the memory requirement for all parameters and buffers, you could simply sum the number of elements in each of them and multiply by the element size:
mem_params = sum([param.nelement()*param.element_size() for param in model.parameters()])
mem_bufs = sum([buf.nelement()*buf.element_size() for buf in model.buffers()])
mem = mem_params + mem_bufs # in bytes
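For example, a quick sanity check on a toy model (the layers here are arbitrary placeholders, not a specific model from this thread):

import torch.nn as nn

# toy model, just to exercise the snippet above
model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
mem_params = sum(param.nelement() * param.element_size() for param in model.parameters())
mem_bufs = sum(buf.nelement() * buf.element_size() for buf in model.buffers())
print((mem_params + mem_bufs) / 1024**2)  # ~2.0 MB of float32 parameters, no buffers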
However, this will not include the peak memory usage for the forward and backward pass (if that’s what you are looking for).
What’s the peak memory usage?
During training, intermediate tensors needed to backpropagate and calculate the gradients are kept alive. These intermediate tensors will be freed once the gradients have been calculated (assuming you haven't used retain_graph=True), so you'll see more memory usage during training than the model parameters and buffers alone would suggest.
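As a rough illustration (a minimal sketch with an arbitrary toy model and batch size, assuming a CUDA device is available), you can watch the allocated memory grow after the forward pass and shrink again once backward() has freed the intermediate activations:

import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA device is available
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).to(device)
x = torch.randn(4096, 1024, device=device)

print(torch.cuda.memory_allocated() / 1024**2)  # MB used by the parameters and the input
out = model(x)
loss = out.sum()
print(torch.cuda.memory_allocated() / 1024**2)  # higher: intermediate activations are kept for backward
loss.backward()
print(torch.cuda.memory_allocated() / 1024**2)  # lower again: intermediates freed, only the .grad tensors were added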
Is peak memory usage equivalent to forward/backward pass size here?
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 10, 24, 24]             260
            Conv2d-2             [-1, 20, 8, 8]           5,020
         Dropout2d-3             [-1, 20, 8, 8]               0
            Linear-4                   [-1, 50]          16,050
            Linear-5                   [-1, 10]             510
================================================================
Total params: 21,840
Trainable params: 21,840
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.08
Estimated Total Size (MB): 0.15
----------------------------------------------------------------
It might be, but I’m not sure which utility you are using and how it estimates the memory usage.
@ptrblck, how can we measure the peak memory usage? This sounds like the most important question for avoiding CUDA out of memory errors. Should I ask a separate question for this?
torch.cuda.max_memory_allocated()
should give you the max value. I'm not sure if your currently used logging library gives a matching number, but it would be interesting to see.
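Something like this minimal sketch (arbitrary model and input; reset_peak_memory_stats() exists in newer releases, older ones offer reset_max_memory_allocated()):

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 1024).to(device)
x = torch.randn(256, 1024, device=device)

torch.cuda.reset_peak_memory_stats(device)  # start a fresh peak measurement
out = model(x)
out.sum().backward()
torch.cuda.synchronize()
print(torch.cuda.max_memory_allocated(device) / 1024**2)  # peak allocated memory in MB for this iteration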
Thanks, it was:
import torch
torch.cuda.max_memory_allocated()
This can hopefully help me figure out the max batch size I can use for a model. But I wonder if something similar is already present in PyTorch.
However, I am not sure if this will also count memory held by the garbage collector that could be freed after gc.collect(). Maybe this is what's called the cache.
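For what it's worth, here is a minimal sketch (arbitrary tensor size) of the difference between allocated memory and the caching allocator; as far as I understand, freed tensors go back to PyTorch's cache instead of being returned to the driver, which is what memory_reserved() (memory_cached() in older releases) reports:

import torch

device = "cuda"
x = torch.randn(1024, 1024, device=device)
# bytes used by live tensors vs. bytes held by the caching allocator
print(torch.cuda.memory_allocated(device), torch.cuda.memory_reserved(device))

del x  # the tensor is freed: allocated drops, but the block stays in the cache
print(torch.cuda.memory_allocated(device), torch.cuda.memory_reserved(device))

torch.cuda.empty_cache()  # return unused cached blocks to the driver
print(torch.cuda.memory_reserved(device))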
"These intermediate tensors will be freed once the gradients have been calculated (assuming you haven't used retain_graph=True)"
Could you provide a link to the exact lines in the source? I need to investigate this part. Thanks a lot for helping people here!