How to calculate the GPU memory that a model uses?

I want to reduce the size of my PyTorch models, since they consume a lot of GPU memory and I am not going to train them again.

First, I thought I could convert them to TensorRT engines, and then I became curious how to calculate the amount of GPU memory they use.

The size of a PyTorch model can be calculated either with

torch.cuda.memory_allocated

or by summing the bytes of model.parameters() and model.buffers().

I checked whether the two approaches give the same value, and they did.
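For reference, a minimal sketch of that comparison (the nn.Sequential below is just a stand-in for the actual model, with sizes chosen so the allocator's 512-byte rounding doesn't get in the way):

import torch
import torch.nn as nn

# stand-in model; every parameter tensor here is a multiple of 512 bytes,
# so the caching allocator doesn't round anything up
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 128)).cuda()

# 1) memory tracked by PyTorch's caching allocator
# (assuming nothing else was allocated on the device before)
allocated = torch.cuda.memory_allocated()

# 2) raw bytes of all parameters and buffers
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())

print(allocated, param_bytes + buffer_bytes)  # the two numbers match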

But the size of a TensorRT engine or of other scripted modules on the GPU cannot be calculated with the above torch functions, so I thought I could check the GPU memory usage with the GPUtil library.

However, the memory usage reported by the GPUtil library (which reads from nvidia-smi) was very different.
For example, one model is 13 MiB, but almost 2 GiB was allocated on the GPU. The other model is 171 MiB, but again around 2 GiB was allocated on the GPU. I did not put any other objects, such as inputs, on the GPU.

Even after deleting the model,

import GPUtil

del model
gpu = GPUtil.getGPUs()[0]     # first GPU, as reported by nvidia-smi
memoryUsed = gpu.memoryUsed

the reported memory was still not 0, while torch.cuda.memory_allocated(0) shows 0.

How do you calculate the GPU memory that a PyTorch model uses?
Or how do you compare the GPU memory that a PyTorch model uses with what its script-mode version uses?

And if I understood things correctly and used the right functions, why is the actually allocated memory so different from the raw torch tensor bytes?
I knew it could differ because of fixed page sizes, but I didn't expect the difference to be that large (about 2 GB).


PyTorch will create the CUDA context in the first CUDA operation, which will load the driver and the kernels (native PyTorch kernels as well as those from the used libraries, etc.) and will take some memory overhead depending on the device.
PyTorch doesn't report this memory, which is why torch.cuda.memory_allocated() could return a 0 allocation.
You would thus need to use nvidia-smi (or any other "global" reporting tool) to check the overall GPU memory usage.
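To illustrate the difference (a minimal sketch; the exact numbers depend on the GPU, driver, and PyTorch build, and it assumes a PyTorch version that provides torch.cuda.mem_get_info):

import torch

x = torch.zeros(1, device="cuda")          # first CUDA op: creates the context

free, total = torch.cuda.mem_get_info()    # device-wide view, similar to nvidia-smi
print(torch.cuda.memory_allocated())       # just the tensor's block (512 bytes)
print(total - free)                        # context + allocator cache (+ other processes)

del x
torch.cuda.empty_cache()                   # return cached blocks to the driver
free, total = torch.cuda.mem_get_info()
print(torch.cuda.memory_allocated())       # 0
print(total - free)                        # still far from 0: the context stays loaded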


Is there a way to measure the peak GPU memory consumption between two points in time? For example, I am running a forward call on some network. Memory starts at value x, goes up to y during the forward call, and then returns to x at the end. I would like to discover the y value.

torch.cuda.max_memory_allocated(device=None) is all you need!
torch.cuda.max_memory_allocated
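For the forward-pass example above, a minimal sketch could look like this (model and inp are placeholders for your actual network and input):

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()          # placeholder network
inp = torch.randn(64, 1024, device="cuda")    # placeholder input

torch.cuda.reset_peak_memory_stats()          # start measuring from here
x_bytes = torch.cuda.memory_allocated()       # the "x" before the forward call

out = model(inp)                              # region of interest

y_bytes = torch.cuda.max_memory_allocated()   # the peak "y" reached since the reset
print(x_bytes, y_bytes)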

Thank you.
Strangely, however, this does not yield 0:

torch.cuda.reset_peak_memory_stats(device=None)
print(f"gpu used {torch.cuda.max_memory_allocated(device=None)} memory")

It works for me:

print(f"gpu used {torch.cuda.max_memory_allocated(device=None)} memory")
# gpu used 0 memory

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
print(f"gpu used {torch.cuda.max_memory_allocated(device=None)} memory")
# gpu used 8192 memory

# should return same memory usage as nothing was deleted
torch.cuda.reset_peak_memory_stats(device=None)
print(f"gpu used {torch.cuda.max_memory_allocated(device=None)} memory")
# gpu used 8192 memory

# delete tensors and reduce peak memory usage
del y
torch.cuda.reset_peak_memory_stats(device=None)
print(f"gpu used {torch.cuda.max_memory_allocated(device=None)} memory")
# gpu used 4096 memory

del x
torch.cuda.reset_peak_memory_stats(device=None)
print(f"gpu used {torch.cuda.max_memory_allocated(device=None)} memory")
# gpu used 0 memory

I’ve put it at a random place in my program, and it didn’t show 0. It should, right? There is memory occupied on the GPU, but if the call comes right after reset_peak_memory_stats, I would expect it to show 0, yet it doesn’t.

No, it shouldn’t, since you are only resetting the peak, which makes the currently used memory the new peak.

Well, that’s what is happening to me. Strange.

In case you misunderstood me: calling reset_peak_memory_stats should not always result in a reported usage of 0, since memory could still be allocated, making it the new peak, as seen in my code snippet.

So how do you convert that to bytes or MB, etc.?

The returned values are given in bytes, as described in the docs. You can simply convert them to MB by dividing them by 1024**2.
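For example, the conversion itself is just:

import torch

peak_bytes = torch.cuda.max_memory_allocated()
print(f"{peak_bytes} bytes = {peak_bytes / 1024**2:.2f} MB = {peak_bytes / 1024**3:.3f} GB")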


You need to know the size of each parameter (e.g., 4 bytes for float32) and estimate activation and optimizer states based on your model’s architecture. Tools like TensorBoard or the built-in profiling tools in frameworks like PyTorch or TensorFlow can also help.
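A rough back-of-the-envelope sketch of such an estimate for the parameter-related parts (Adam is assumed as the optimizer; activation memory depends on the architecture and batch size and is not estimated here):

import torch
import torch.nn as nn

model = nn.Linear(4096, 4096)                   # stand-in model
num_params = sum(p.numel() for p in model.parameters())
bytes_per_param = 4                             # float32

weights = num_params * bytes_per_param          # the parameters themselves
grads = num_params * bytes_per_param            # one gradient per parameter during training
adam_states = 2 * num_params * bytes_per_param  # Adam keeps exp_avg and exp_avg_sq

total = weights + grads + adam_states
print(f"estimated training memory (excl. activations): {total / 1024**2:.1f} MB")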