Hi, I am currently running into a problem where performance degrades severely when I go from single-GPU to multi-GPU training. I have also noticed that the learning curves and losses are much noisier. The only difference between the two setups in the script is the value of the environment variable CUDA_VISIBLE_DEVICES. I am using torch.nn.DataParallel to parallelize the model, as in https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html.
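To make the setup concrete, my script looks roughly like this (a minimal sketch; the model is a placeholder for my actual network, and the commented-out environment variable is the only thing I change between runs):

```python
import os
import torch
import torch.nn as nn

# The only difference between the two runs is this variable:
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"     # single-GPU run (works fine)
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"   # multi-GPU run (degraded, noisy)

model = nn.Linear(10, 2)  # placeholder for my actual model

# DataParallel splits each input batch across all visible GPUs
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```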
What could be the reason for this puzzling difference between training with 1 GPU and more than 1 GPU? Or what can I do to isolate the source of the problem?
Thanks in advance