Hi,
How exactly does DataParallel work in the case of a single GPU?
I noticed that when I’m using a single GPU, I get a different loss and accuracy if I wrap the model in DataParallel:
model = torch.nn.DataParallel(model)
model = model.to("cuda:0")
compared to when I don’t use it:
model = model.to("cuda:0")
I thought that with a single GPU there should be no difference between wrapping the model in DataParallel and not wrapping it. Do you know what could cause this difference?
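For reference, here is a minimal sketch of a controlled comparison. It is not my actual training code: the toy model, the input shape, the seed, and the helper name `compare_outputs` are all made up for illustration. It copies the same weights into a plain model and a DataParallel-wrapped model, feeds both the same batch, and reports the largest difference in their outputs.

```python
import copy

import torch
import torch.nn as nn


def compare_outputs(seed: int = 0) -> float:
    """Return the max absolute output difference between a plain model
    and the same model wrapped in DataParallel (hypothetical toy setup)."""
    torch.manual_seed(seed)
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda:0" if use_cuda else "cpu")

    # Toy model, assumed for illustration only.
    base = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    # Both variants start from identical weights.
    plain = copy.deepcopy(base).to(device)
    wrapped = nn.DataParallel(
        copy.deepcopy(base),
        device_ids=[0] if use_cuda else None,  # restrict to a single GPU
    ).to(device)

    # eval() switches off dropout/batch-norm updates (none in this toy model,
    # but a real model may have them).
    plain.eval()
    wrapped.eval()

    x = torch.randn(8, 16, device=device)
    with torch.no_grad():
        return (plain(x) - wrapped(x)).abs().max().item()


max_diff = compare_outputs()
print(f"max abs difference: {max_diff:.3e}")
```

If the difference is essentially zero here, the wrapper itself is probably not the cause, and the gap in loss and accuracy may come from somewhere else in the two runs, for example random initialization, data shuffling, or dropout not being seeded identically.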
Thank you.