A PTX JIT compilation failed

Hi everyone, I got this error on a remote computer. Can you please help me solve it?

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1523240155148/work/torch/lib/THC/generic/THCStorage.cu line=58 error=78 : a PTX JIT compilation failed
Traceback (most recent call last):
  File "/home/youcef/.vscode-server/extensions/ms-python.python-2019.10.41019/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/youcef/.vscode-server/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/youcef/.vscode-server/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/runpy.py", line 252, in run_path
    return _run_module_code(code, init_globals, run_name, path_name)
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/runpy.py", line 82, in _run_module_code
    mod_name, mod_fname, mod_loader, pkg_name)
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/youcef/hyps/HYMC/HSIGCN_11102019 (ssh)/main.py", line 88, in <module>
    model = model.cuda()
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 152, in _apply
    param.data = fn(param.data)
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/home/youcef/don/yes/envs/HYmenv2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 387, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at /opt/conda/conda-bld/pytorch_1523240155148/work/torch/lib/THC/generic/THCStorage.cu:58

I ran the same program on my desktop and it completed without any errors. Before posting, I checked the available answers on similar topics; none of them worked.
Thanks in advance

So the PTX JIT compilation only kicks in when the CUDA compute arch of your hardware isn’t covered by the binaries you are running, and the PTX JIT jumps in to bridge that gap.
So the questions would be:

  • What is the compute arch of your hardware? (If you don’t know, Wikipedia has the information for “sales name -> arch”.)
  • What are the compute archs included in your PyTorch?
  • Is something up with the CUDA installation that makes it fail?
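To answer the first question without leaving Python, you can query the device directly. This is a minimal sketch using `torch.cuda.get_device_capability`, which returns the compute capability as a `(major, minor)` tuple (for example `(6, 1)` for a GTX 1080 Ti):

```python
import torch

# Query the compute capability of the first visible GPU, if any.
# get_device_capability(i) returns a (major, minor) tuple, e.g. (6, 1).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("Compute capability: %d.%d" % (major, minor))
else:
    print("No CUDA device visible")
```

If CUDA initialization itself fails here, that already points at the third question (a broken CUDA installation) rather than a missing arch.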

Typically, official PyTorch binaries ship with binaries for all supported archs compiled in, while a self-compiled PyTorch by default only includes the arch of the hardware you compile on.
You can use cuobjdump /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so | grep 'arch' | sort | uniq to check what PyTorch has.
For example, on my GTX1080Ti, a self-compiled PyTorch will have only arch = 6.1 while the 1.2 wheel from the PyTorch site has 30,35,50,60,61,70,75.
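As an alternative to cuobjdump, newer PyTorch releases can report the compiled arch list from Python directly (note: this attribute does not exist in the old 0.4-era build shown in the traceback above, so the `hasattr` guard is deliberate):

```python
import torch

# Newer PyTorch versions expose the archs the binary was compiled for,
# e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75'].
# Older builds lack this API, in which case cuobjdump is the fallback.
if hasattr(torch.cuda, "get_arch_list"):
    print(torch.cuda.get_arch_list())
else:
    print("get_arch_list() not available; use cuobjdump on libtorch instead")
```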

The easiest remedy is likely to deploy a PyTorch that includes the right arch binaries.

Best regards

Thomas


Thanks for your explanation @tom, it is solved!

Hi, I have a similar error. Could you tell me your solution?


I was just missing some files on the remote computer; that was the problem.