DataParallel with single GPU

Hi,
How exactly does DataParallel work in the case of a single GPU?

I noticed that when I'm using a single GPU, I get a different loss and accuracy if I wrap the model in DataParallel

model = torch.nn.DataParallel(model)
model = model.to("cuda:0")

compared to when I don’t use it

model = model.to("cuda:0")

I thought that if I'm using a single GPU, there should be no difference between using DataParallel and not using it. Do you know what could cause this difference?

Thank you.

Did you set a seed and enable deterministic operations? If you don't set a specific seed for your model, you will get a different loss and accuracy between runs anyway.
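
For reference, something along these lines is what I mean by seeding and enabling determinism (the exact cuDNN flags you need can depend on your PyTorch version, so treat this as a sketch):

import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Seed Python, NumPy, and PyTorch (CPU and all GPUs)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic kernels and disable autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False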

Yes, I did. I get the same loss and accuracy when rerunning the same setup (e.g. with DataParallel), but not between the two cases above.
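
To be concrete, this is roughly how I compare the two cases; build_model and train_one_epoch here are just placeholders for my actual model constructor and training loop:

# Case 1: model wrapped in DataParallel on a single GPU
seed_everything(0)
model_dp = torch.nn.DataParallel(build_model()).to("cuda:0")  # build_model is a placeholder
loss_dp = train_one_epoch(model_dp)                           # train_one_epoch is a placeholder

# Case 2: plain model on the same GPU
seed_everything(0)
model_plain = build_model().to("cuda:0")
loss_plain = train_one_epoch(model_plain)

print(loss_dp, loss_plain)  # these differ for me, even with identical seeds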