Hello,
I just trained a model on 3 A100 cards using PyTorch 2.0.1. Training works fine, and running inference on the same 3 GPUs also works normally. However, when I try to run inference with PyTorch 2.0 on 1 card only, the following error occurs:
In file included from /tmp/tmpfjotpdpx/main.c:2:
/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/triton/third_party/cuda/include/cuda.h:55:10: fatal error: stdlib.h: No such file or directory
55 | #include <stdlib.h>
| ^~~~~~~~~~
compilation terminated.
[... the same `fatal error: stdlib.h: No such file or directory` from cuda.h:55 repeats for three more parallel compile workers (/tmp/tmpy4fwn19e, /tmp/tmptis__htf, /tmp/tmpajqnwuaa) ...]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 549, in _worker_compile
kernel.precompile(warm_cache_only_with_cc=cc)
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/torch/_inductor/triton_ops/autotune.py", line 69, in precompile
self.launchers = [
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/torch/_inductor/triton_ops/autotune.py", line 70, in <listcomp>
self._precompile_config(c, warm_cache_only_with_cc)
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/torch/_inductor/triton_ops/autotune.py", line 83, in _precompile_config
triton.compile(
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/triton/compiler.py", line 1587, in compile
so_path = make_stub(name, signature, constants)
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/triton/compiler.py", line 1476, in make_stub
so = _build(name, src_path, tmpdir)
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/triton/compiler.py", line 1391, in _build
ret = subprocess.check_call(cc_cmd)
File "/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc', '/tmp/tmp2haucfn0/main.c', '-O3', '-I/beegfs/userhome/gabrielpan/.conda/envs/torch2/lib/python3.8/site-packages/triton/third_party/cuda/include', '-I/beegfs/userhome/gabrielpan/.conda/envs/torch2/include/python3.8', '-I/tmp/tmp2haucfn0', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp2haucfn0/triton_.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib64']' returned non-zero exit status 1.
"""
I then tried running inference with the same model on PyTorch 1.13 and 1 GPU card, and it worked fine.
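One workaround I plan to try (not yet verified on my setup): the `_build` function in `triton/compiler.py` shown in the traceback reads the compiler from the `CC` environment variable before falling back to whatever `gcc`/`clang` is on `PATH`, so pointing `CC` at a compiler whose headers are intact might sidestep the broken OHPC toolchain. The `/usr/bin/gcc` path below is an assumption; use whichever gcc on your node can actually find `<stdlib.h>`:

```shell
# Hypothetical workaround: make Triton's stub compilation use the distro
# compiler instead of the OHPC module. The path is an assumption --
# adjust it for your cluster before running inference in this shell.
export CC=/usr/bin/gcc
echo "CC set to $CC"
```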
Has anyone run into the same issue, and could anyone help me with it?
Best Regards,
Gabriel