I set the batch size to 80 and use 4 GPUs to train the model; the loss over the first several steps follows the red line in the image.
However, when I train with 5 GPUs instead, the loss follows the green line.
My GPUs are 40GB A100s, my torch version is 1.10.0, and my loss function doesn't depend on other samples in the batch.
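For context, here is a minimal sketch of the effective-batch arithmetic I believe applies under DistributedDataParallel (assuming 80 is the per-GPU batch size — that is my assumption about my own launch script, not something confirmed above):

```python
# Hypothetical illustration: under torch.nn.parallel.DistributedDataParallel,
# each process draws its own mini-batch, so the effective (global) batch size
# scales with the number of GPUs.
PER_GPU_BATCH = 80  # assumption: 80 is the per-GPU batch size, not global

def effective_batch(per_gpu_batch: int, num_gpus: int) -> int:
    """Global batch size when every GPU processes per_gpu_batch samples."""
    return per_gpu_batch * num_gpus

print(effective_batch(PER_GPU_BATCH, 4))  # 320
print(effective_batch(PER_GPU_BATCH, 5))  # 400
```

If that assumption holds, the two runs are effectively training with different global batch sizes (320 vs. 400), which by itself changes the loss trajectory.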
I have also trained the model for more epochs with different GPU counts, and the gap in loss and performance grows even larger (5 GPUs is better than 4).
I would sincerely appreciate an answer and an explanation for this phenomenon. Thanks in advance.