What’s the recommended method for GPU profiling? I installed the latest version of PyTorch with conda; torch.__version__ reports 0.3.0.post4, but when I try to call torch.autograd.profiler.profile(use_cuda=True) I get the error __init__() got an unexpected keyword argument 'use_cuda'. Is this feature only available in the version from the GitHub repo?
The use_cuda parameter is only available in versions newer than 0.3.0, yes. Even then it adds some overhead. The recommended approach appears to be the emit_nvtx function:
with torch.cuda.profiler.profile():
    model(x)  # Warmup CUDA memory allocator and profiler
    with torch.autograd.profiler.emit_nvtx():
        model(x)
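To expand on that: emit_nvtx only emits NVTX range markers, so the script has to be run under nvprof for anything to actually be recorded. A minimal sketch of the full workflow (the nvprof invocation and the CUDA-availability guard are my additions, not from the post above):

```python
import torch

# Run this script under nvprof so the NVTX ranges are captured, e.g.:
#   nvprof --profile-from-start off -o trace.prof python script.py
if torch.cuda.is_available():
    x = torch.randn(5, 5, device='cuda')
    with torch.cuda.profiler.profile():
        y = x ** 2  # warmup pass for the CUDA allocator and profiler
        with torch.autograd.profiler.emit_nvtx():
            y = x ** 2
```

The resulting trace.prof can then be opened in the NVIDIA Visual Profiler, where each autograd op shows up as a named NVTX range.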
Trying to run that code gives me an error about the use_cuda flag (with version 0.3.1). For example:
import torch
from torch.autograd import Variable

x = Variable(torch.randn(5, 5), requires_grad=True).cuda()
with torch.autograd.profiler.profile() as prof:
    y = x**2
with torch.autograd.profiler.emit_nvtx():
    y = x**2
print(prof)
Gives:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-c54fc33dff6e> in <module>()
      1 with torch.autograd.profiler.profile() as prof:
      2     y = x**2
----> 3 with torch.autograd.profiler.emit_nvtx():
      4     y = x**2
      5
~/.pyenv/versions/3.6.1/envs/phdnets2/lib/python3.6/site-packages/torch/autograd/profiler.py in __enter__(self)
    213         self.entered = True
    214         torch.cuda.synchronize()
--> 215         torch.autograd._enable_profiler(True)
    216         return self
    217
I tried running the script on 0.4.0 and it works fine with torch.autograd.profiler.profile(use_cuda=True). Upgrading to 0.4.0 should solve this problem.
import torch

cuda = torch.device('cuda')
x = torch.randn((1, 1), requires_grad=True)
print(x.device)
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    y = x ** 2
    y.backward()
print(prof)
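As a side note, the autograd profiler also works without CUDA, which is handy for checking the workflow on a machine with no GPU. A minimal CPU-only sketch using key_averages() to aggregate the recorded events into a summary table (the sort_by argument names a column of the profiler's table() output):

```python
import torch

x = torch.randn(5, 5, requires_grad=True)
with torch.autograd.profiler.profile() as prof:
    y = (x ** 2).sum()
    y.backward()

# Aggregate events with the same name and print a summary table,
# slowest ops (by total CPU time) first
print(prof.key_averages().table(sort_by='cpu_time_total'))
```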