Error when profiling with utils.bottleneck

I was trying to profile my code with instructions from

In general, the code runs without errors with CUDA on a single GPU.

However, whenever I try to profile with the command
python -m torch.utils.bottleneck /path/to/source/ [args]

I get the following error:

Traceback (most recent call last):
  File "/home/kowshik/anaconda3/lib/python3.6/", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kowshik/anaconda3/lib/python3.6/", line 85, in _run_code
    exec(code, run_globals)
  File "/home/kowshik/anaconda3/lib/python3.6/site-packages/torch/utils/bottleneck/", line 234, in <module>
  File "/home/kowshik/anaconda3/lib/python3.6/site-packages/torch/utils/bottleneck/", line 213, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/home/kowshik/anaconda3/lib/python3.6/site-packages/torch/utils/bottleneck/", line 107, in run_autograd_prof
  File "/home/kowshik/anaconda3/lib/python3.6/site-packages/torch/utils/bottleneck/", line 100, in run_prof
    with profiler.profile(use_cuda=use_cuda) as prof:
  File "/home/kowshik/anaconda3/lib/python3.6/site-packages/torch/autograd/", line 180, in __enter__
**RuntimeError: /opt/conda/conda-bld/pytorch_1544174967633/work/torch/csrc/autograd/profiler.h:72: all CUDA-capable devices are busy or unavailable**

How do I avoid this error?

Can someone help me with this, please?

Could this be the issue?

Thanks @ptrblck.

I have only one GPU available for compute. The other GPU is used for display purposes, and PyTorch doesn't support it.

Thanks for the information!
Let’s try to narrow down the source of the problem.

Are other processes working on the GPUs?
Are you able to create a tensor on all devices?
If so, is nn.DataParallel running successfully?
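The checks above can be sketched in a short diagnostic script. This is a minimal sketch, assuming PyTorch is installed; the model and tensor shapes are illustrative placeholders, not from the thread:

```python
# Minimal CUDA sanity check: try to allocate a tensor on every visible
# device, then smoke-test nn.DataParallel if more than one GPU is visible.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    # Allocating a small tensor forces CUDA context creation on device i.
    x = torch.randn(2, 2, device="cuda:{}".format(i))
    print("cuda:{} ({}): OK".format(i, torch.cuda.get_device_name(i)))

if torch.cuda.device_count() > 1:
    # Quick nn.DataParallel smoke test across all visible GPUs.
    model = torch.nn.DataParallel(torch.nn.Linear(4, 2)).cuda()
    out = model(torch.randn(8, 4).cuda())
    print("DataParallel output shape:", tuple(out.shape))
```

If tensor creation fails on a particular device, that narrows the problem down to that GPU rather than to the profiler.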

Thanks Peter. I have updated my response with a new image. I have only one GPU

That’s interesting. Maybe PyTorch tried to create the CUDA context on GPU0, which might fail.
Could you try to run your script from the terminal using:

CUDA_VISIBLE_DEVICES=1 python -m torch.utils.bottleneck args
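One detail worth knowing: `CUDA_VISIBLE_DEVICES` remaps device indices, so inside the process the first visible GPU is always `cuda:0`. A small sketch (the variable must be set before the first CUDA call for it to take effect):

```python
# Sketch: restrict the process to physical GPU 1 before importing torch.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only physical GPU 1

import torch
# Inside this process, the exposed GPU (if any) is renumbered as cuda:0.
print("visible devices:", torch.cuda.device_count())
```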

CUDA_VISIBLE_DEVICES=1 doesn't work for me. Instead, I have to use CUDA_VISIBLE_DEVICES=0 for it to run on the GPU.
However, when I now run CUDA_VISIBLE_DEVICES=1 python -m torch.utils.bottleneck args, I get the following error:
RuntimeError: /opt/conda/conda-bld/pytorch_1544174967633/work/torch/csrc/autograd/profiler.h:72: out of memory

Yeah, the order of your GPUs might differ from the one shown in nvidia-smi.
OK, so at least we now get an OOM error instead.
Could you create a dummy script with a low memory usage and try to run bottleneck with it just to see if the first issue is solved?
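Something like this would do as a dummy script; it is only a low-memory sketch (the file name and model are placeholders I chose, not from the thread):

```python
# dummy.py — tiny forward/backward pass with minimal GPU memory usage,
# just to check whether torch.utils.bottleneck can profile at all.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 1).to(device)
x = torch.randn(32, 10, device=device)

loss = model(x).sum()
loss.backward()  # give the autograd profiler something to record
print("loss:", loss.item())
```

Then run it with `python -m torch.utils.bottleneck dummy.py` and see whether the original RuntimeError is gone.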