I tried to train my model on multiple GPUs. However, when I launch the program, it hangs in the first iteration. Using nvidia-smi, I see that hundreds of MB of memory are consumed on each GPU. I guess this memory usage comes from model initialization on each GPU.
I am sharing 8 GPUs with others on the server, so I restrict my program to GPU 2 and GPU 3 with the following command.
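The way I set it inside the script is roughly like this (simplified sketch; the key point is the os.environ assignment):

```python
import os

# Restrict this process to GPU 2 and GPU 3.
# Note: this only takes effect if it runs before CUDA is initialized
# (i.e. before the first CUDA call in the process).
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
```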
Is the memory used on all GPUs or just the two you’ve selected?
Sometimes it can be problematic to set the env variable inside the python script, e.g. when CUDA was already initialized. The workaround would be to set the available GPUs before calling the script:
CUDA_VISIBLE_DEVICES=2,3 python script.py
How did you check your data is not fed into the model?
I’ve tested your suggestion, but nothing changes.
Honestly, I didn’t explicitly check whether the data is fed into the model, but the for-loop used for training just doesn’t iterate, so I guess the data is not being fed into the model.
The for-loop is not iterating. After consuming hundreds of MB of GPU memory, the program seems to be frozen.
I found another piece of information a few minutes ago, which might be helpful.
I know, it’s 2018, but the lab server is still running CUDA 7.5, while I’m using PyTorch 0.4.0 compiled with CUDA 8.0.
Please don’t ask me to switch back to PyTorch 0.3.0 with CUDA 7.5. I’ve tried, and there are a lot of horrible compatibility issues.
No, I wouldn’t switch back to an older version.
I think you should be fine, since the binaries ship with their own CUDA libs.
Could you check it with print(torch.version.cuda)?
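For example, something like this would show which CUDA version the binary was actually built with (as opposed to the system toolkit):

```python
import torch

print(torch.__version__)   # e.g. 0.4.0 in the setup above
print(torch.version.cuda)  # the CUDA version the binary ships with (should be 8.0 here)
```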
What do you mean by “the for-loop is not iterating”?
Does your code just exit or does it hang in the loop?
Are you able to get a single sample from your DataLoader?
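A quick way to check this is to pull one batch outside the training loop, e.g. with `num_workers=0` to rule out worker-process deadlocks (dummy dataset assumed here; substitute your own):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset; replace with your actual Dataset.
dataset = TensorDataset(torch.randn(8, 3), torch.zeros(8))
loader = DataLoader(dataset, batch_size=2, num_workers=0)

# Fetch a single batch; if this hangs, the DataLoader is the culprit.
x, y = next(iter(loader))
print(x.shape, y.shape)
```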
@ptrblck
My dataloaders work very well on a single GPU, so I guess there is nothing wrong with them.
My code hangs in the first iteration.
Here is part of my code.
In the first iteration, the “First checkpoint” is printed but the “Second checkpoint” is not.
The whole program hangs there.
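In simplified form (a hypothetical reconstruction with a dummy dataset standing in for mine; the model and optimizer code are omitted), the structure is:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset; in my code this is the real multi-GPU training data.
loader = DataLoader(TensorDataset(torch.randn(4, 2)), batch_size=2)

print("First checkpoint")       # this line is printed
for (batch,) in loader:         # in the failing run, fetching the first batch hangs here
    print("Second checkpoint")  # in the failing run, this is never reached
```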