How can we simply calculate the GPU memory a model (nn.Module) uses? Just on a single GPU.
To calculate the memory requirement for all parameters and buffers, you could simply sum the number of elements in each of them and multiply by the element size:
mem_params = sum([param.nelement()*param.element_size() for param in model.parameters()])
mem_bufs = sum([buf.nelement()*buf.element_size() for buf in model.buffers()])
mem = mem_params + mem_bufs # in bytes
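For example, a quick sanity check on a toy model (the layers here are arbitrary placeholders, not a specific model from this thread):

import torch.nn as nn

# toy model, just to exercise the snippet above
model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
mem_params = sum(param.nelement() * param.element_size() for param in model.parameters())
mem_bufs = sum(buf.nelement() * buf.element_size() for buf in model.buffers())
print((mem_params + mem_bufs) / 1024**2)  # ~2.0 MB of float32 parameters, no buffers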
However, this will not include the peak memory usage for the forward and backward pass (if that’s what you are looking for).
What’s the peak memory usage?
During training, intermediate tensors needed to backpropagate and calculate the gradients are kept alive. These intermediate tensors will be freed once the gradients have been calculated (assuming you haven't used retain_graph=True), so you'll see more memory usage during training than the model parameters and buffers alone would suggest.
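As a rough illustration (a minimal sketch with an arbitrary toy model and batch size, assuming a CUDA device is available), you can watch the allocated memory grow after the forward pass and shrink again once backward() has freed the intermediate activations:

import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA device is available
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).to(device)
x = torch.randn(4096, 1024, device=device)

print(torch.cuda.memory_allocated() / 1024**2)  # MB used by the parameters and the input
out = model(x)
loss = out.sum()
print(torch.cuda.memory_allocated() / 1024**2)  # higher: intermediate activations are kept for backward
loss.backward()
print(torch.cuda.memory_allocated() / 1024**2)  # lower again: intermediates freed, only the .grad tensors were added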
Is peak memory usage equivalent to forward/backward pass size here?
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 10, 24, 24]             260
            Conv2d-2             [-1, 20, 8, 8]           5,020
         Dropout2d-3             [-1, 20, 8, 8]               0
            Linear-4                   [-1, 50]          16,050
            Linear-5                   [-1, 10]             510
================================================================
Total params: 21,840
Trainable params: 21,840
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.08
Estimated Total Size (MB): 0.15
----------------------------------------------------------------
It might be, but I’m not sure which utility you are using and how it estimates the memory usage.
@ptrblck, how can we measure the peak memory usage? This sounds like the most important question for avoiding CUDA out of memory errors. Should I ask a separate question for this?
torch.cuda.max_memory_allocated()
should give you the max value. I'm not sure if your currently used logging library gives a matching number, but it would be interesting to see.
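Something like this minimal sketch (arbitrary model and input; reset_peak_memory_stats() exists in newer releases, older ones offer reset_max_memory_allocated()):

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 1024).to(device)
x = torch.randn(256, 1024, device=device)

torch.cuda.reset_peak_memory_stats(device)  # start a fresh peak measurement
out = model(x)
out.sum().backward()
torch.cuda.synchronize()
print(torch.cuda.max_memory_allocated(device) / 1024**2)  # peak allocated memory in MB for this iteration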
Thanks, it was:
import torch
torch.cuda.max_memory_allocated()
This can hopefully help me figure out the max batch size I can use for a model. But I wonder if something similar is already present in PyTorch.
However, I am not sure if this will also count memory held by the garbage collector that could be freed after gc.collect(). Maybe this is what's called the cache.
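For what it's worth, here is a minimal sketch (arbitrary tensor size) of the difference between allocated memory and the caching allocator; as far as I understand, freed tensors go back to PyTorch's cache instead of being returned to the driver, which is what memory_reserved() (memory_cached() in older releases) reports:

import torch

device = "cuda"
x = torch.randn(1024, 1024, device=device)
# bytes used by live tensors vs. bytes held by the caching allocator
print(torch.cuda.memory_allocated(device), torch.cuda.memory_reserved(device))

del x  # the tensor is freed: allocated drops, but the block stays in the cache
print(torch.cuda.memory_allocated(device), torch.cuda.memory_reserved(device))

torch.cuda.empty_cache()  # return unused cached blocks to the driver
print(torch.cuda.memory_reserved(device))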
"These intermediate tensors will be freed once the gradients have been calculated (assuming you haven't used retain_graph=True)"
Could you provide a link to the exact lines in the source? I need to investigate this part. Thanks a lot for helping people here!