I am new to DDP, so I am trying to run an example provided by PyTorch: multigpu.py. My machine has 4 GPUs. PyTorch version: 2.0.1+cu117; OS: Ubuntu 20.04.4 LTS.
I run the code with the following command:

python multigpu.py 10 5 > output.tx &

It runs for several hours without printing any output or raising any errors. I know it is actually running because it created a new file named “checkpoint.pt”. It should finish within a minute and print something like:
[GPU 0 ...
[GPU 1 ...
[GPU 2 ...
[GPU 3 ...
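For reference, here is a minimal sketch of how I understand the two positional arguments `10 5`. It assumes my copy of multigpu.py matches the pytorch/examples DDP tutorial version, i.e. positional `total_epochs` and `save_every`, an optional `--batch_size` defaulting to 32, and GPU 0 writing checkpoint.pt after any epoch where `epoch % save_every == 0`. The helper names `parse_args` and `checkpoint_epochs` are my own, for illustration only.

```python
import argparse


def parse_args(argv):
    # Assumed to mirror the argument layout of the pytorch/examples
    # multigpu.py: two positional ints plus an optional per-device batch size.
    parser = argparse.ArgumentParser(description="simple distributed training job")
    parser.add_argument("total_epochs", type=int, help="Total epochs to train the model")
    parser.add_argument("save_every", type=int, help="How often to save a snapshot")
    parser.add_argument("--batch_size", default=32, type=int,
                        help="Input batch size on each device (default: 32)")
    return parser.parse_args(argv)


def checkpoint_epochs(total_epochs, save_every):
    # Assumed save rule: GPU 0 writes checkpoint.pt after epoch e whenever
    # e % save_every == 0, so epoch 0 already produces the file.
    return [e for e in range(total_epochs) if e % save_every == 0]


if __name__ == "__main__":
    args = parse_args(["10", "5"])  # same arguments as my command
    print(args.total_epochs, args.save_every, args.batch_size)  # 10 5 32
    print(checkpoint_epochs(args.total_epochs, args.save_every))  # [0, 5]
```

If that save rule holds, the existence of checkpoint.pt only shows that epoch 0 finished on GPU 0, not that the whole 10-epoch run completed.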
Could someone help me?