I am facing a very weird issue. I am trying to train a pretrained ResNeXt model for a specific task. I removed the last few layers from ResNeXt and am feeding my images to the truncated model. Say the output of the truncated ResNeXt has shape (batch size, 1024, 14, 14). I am printing output[0][0][13], and this value changes with the batch size per GPU, even though I am feeding exactly the same images.
Batch size 1 (using one GPU):
Output: tensor([0.0000, 0.0000, 0.0931, 0.0179, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
Batch size 2 (using one GPU):
Output: tensor([0.0000, 0.0000, 0.2732, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
Batch size 2 (using two GPUs):
Output: tensor([0.0000, 0.0000, 0.0931, 0.0179, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
So the result changes with the number of images per GPU. Is there any explanation for this? This is just the first iteration; I have not updated any weights.
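
For reference, here is a minimal sketch of the comparison, not my actual code. It assumes a small hypothetical stand-in network (conv, BatchNorm, ReLU) instead of the truncated ResNeXt, and random tensors instead of my images. It runs the first image alone and as part of a batch of two, in both train() and eval() mode, since layers such as BatchNorm normalize with per-batch statistics in training mode and that seems like one thing worth ruling out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the truncated ResNeXt: conv -> batch norm -> ReLU.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Two random "images" (assumption: stand-ins for the real inputs).
images = torch.randn(2, 3, 14, 14)


def first_sample_outputs(net):
    """Run image 0 alone and inside a batch of 2; return both results for image 0."""
    with torch.no_grad():
        alone = net(images[:1])[0]
        batched = net(images)[0]
    return alone, batched


# Training mode: BatchNorm normalizes with statistics of the current batch.
model.train()
alone_train, batched_train = first_sample_outputs(model)
train_matches = torch.allclose(alone_train, batched_train, atol=1e-5)

# Eval mode: BatchNorm normalizes with its stored running statistics.
model.eval()
alone_eval, batched_eval = first_sample_outputs(model)
eval_matches = torch.allclose(alone_eval, batched_eval, atol=1e-5)

print("train mode, same output for image 0:", train_matches)
print("eval mode, same output for image 0:", eval_matches)
```

With this stand-in, I would expect the outputs for image 0 to differ in train mode and to match in eval mode. If the real model behaves the same way, that would also fit the two-GPU result, assuming the batch is split so that each GPU sees a single image.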