Profiling with DataParallel to compute memory usage

Hi,

I am trying to run a forward pass with a model wrapped in torch.nn.DataParallel and measure GPU memory usage with torch.profiler.profile (code snippet below). It seems to work fine when I set the number of GPUs to 1 (the profiler output matches that of the non-DataParallel, single-GPU version of the model). However, the output doesn't look credible when I increase the number of GPUs beyond 1. Is it even possible to get the memory usage during a multi-GPU forward pass?

Code snippet:

import torch
from torch import profiler
from torch.profiler import ProfilerActivity

with profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                      profile_memory=True,
                      record_shapes=True) as prof:
    out = model(input)
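For what it's worth, I have also been trying to sanity-check the profiler numbers against torch.cuda.max_memory_allocated, which reports the peak allocated memory per device. A rough sketch of that cross-check (model and input here are placeholders for my actual model and batch):

```python
import torch

def peak_memory_per_gpu():
    """Peak allocated CUDA memory in bytes for each visible GPU."""
    return {i: torch.cuda.max_memory_allocated(i)
            for i in range(torch.cuda.device_count())}

# Usage sketch (model/input are placeholders):
# for i in range(torch.cuda.device_count()):
#     torch.cuda.reset_peak_memory_stats(i)  # clear peaks before the pass
# out = model(input)
# print(peak_memory_per_gpu())
```

This at least gives one number per physical GPU, independent of how the profiler attributes memory across replicas.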