I am a new PyTorch user, and I ran into the following problem:
My training code gets stuck after a few dozen iteration steps (it makes no further progress even after hours of waiting).
Pressing Ctrl+C does not stop the training either.
Checking with nvidia-smi, the GPU is still occupied and busy with computation.
Does anyone know the reason?
Could you post a (small) executable code snippet so that we could debug the issue?
Also, are you using multiple workers in your DataLoader? If so, does your code run with num_workers=0?
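To illustrate the debugging suggestion above: running the DataLoader in the main process (num_workers=0) rules out hangs caused by worker processes. This is a minimal sketch with a hypothetical toy dataset standing in for the user's actual data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset (32 samples, 4 features, binary labels)
# standing in for the real training data.
data = TensorDataset(torch.randn(32, 4), torch.randint(0, 2, (32,)))

# Debugging tip: num_workers=0 loads batches in the main process, so a
# hang caused by worker subprocesses or inter-process communication
# either disappears or surfaces as a visible error instead of a deadlock.
loader = DataLoader(data, batch_size=8, num_workers=0)

for x, y in loader:
    pass  # training step would go here
```

If the code runs fine with num_workers=0 but hangs with multiple workers, the problem is likely in the data-loading pipeline rather than the model itself.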
Thanks for your reply.
My problem was solved by fixing a bug: I replaced a plain zero with torch.zeros_like() when initializing a tensor.
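A minimal sketch of the fix described above, using a hypothetical accumulator tensor (the original code is not shown, so the variable names here are assumptions):

```python
import torch

# Hypothetical reference tensor whose shape/dtype/device the
# accumulator should match.
ref = torch.randn(3, 4)

# Buggy pattern: initializing with the Python scalar 0 where a tensor
# is expected can interact badly with later tensor ops.
# acc = 0

# Fixed pattern: create a zero tensor with the same shape, dtype, and
# device as the reference tensor.
acc = torch.zeros_like(ref)
acc = acc + ref  # element-wise accumulation, stays on ref's device
```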
In my case, num_workers was 16, and I did not change it even after the problem was solved. I had tried setting it to zero, but the same problem still happened, so I think the hang was caused by another bug.
ptrblck via PyTorch Forums email@example.com wrote on Wednesday, April 10, 2019, at 4:19 PM: