When I trained on 4 GPUs like this:
python -m torch.distributed.launch --nproc_per_node=4 train_net.py
I got the following error:
Traceback (most recent call last):
  File "/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/bin/python', '-u', 'Run.py', '--local_rank=3']' returned non-zero exit status 1.
Could anyone tell me what happened?
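The CalledProcessError above is raised by the launcher itself. It only reports that one worker (the one started with `--local_rank=3`) exited with status 1, and that worker's own traceback is usually printed earlier in the same console output. Below is a minimal sketch, assuming the training script uses `argparse`, of a worker entry point that accepts the `--local_rank=<N>` flag the launcher appends (as seen in the failing command) and tags any exception with its rank so the real failure is easier to spot. `parse_launch_args`, `run_with_rank_prefix` and the placeholder `main` are hypothetical names, not part of PyTorch; the `LOCAL_RANK` environment fallback is likewise an assumption about how the script may be started.

```python
import argparse
import os
import sys
import traceback


def parse_launch_args(argv=None):
    """Parse the --local_rank flag that torch.distributed.launch appends
    to each worker's command line (e.g. '--local_rank=3')."""
    parser = argparse.ArgumentParser()
    # Assumed fallback: use the LOCAL_RANK environment variable, then 0,
    # so the script also runs as a plain single process for debugging.
    parser.add_argument(
        "--local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # parse_known_args leaves any other script-specific flags untouched.
    args, _unknown = parser.parse_known_args(argv)
    return args


def run_with_rank_prefix(main, local_rank):
    """Call main(); if it raises, print the full traceback tagged with the
    worker's rank to stderr and exit with status 1."""
    try:
        return main()
    except Exception:
        sys.stderr.write(f"[local_rank {local_rank}] worker failed:\n")
        traceback.print_exc()  # writes to sys.stderr by default
        sys.exit(1)


if __name__ == "__main__":
    args = parse_launch_args()

    def main():
        # Hypothetical placeholder for the real training entry point.
        print(f"worker {args.local_rank} starting")

    run_with_rank_prefix(main, args.local_rank)
```

Running the script once without the launcher (for example with `--local_rank=0`) is a simple way to see the underlying exception on its own, without the other three workers' output mixed in.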