Loss doesn't decrease in DataParallel, works fine on single GPU

Just by wrapping my model in DataParallel with model = nn.DataParallel(model, device_ids=[0, 1]), I notice that my loss doesn’t decrease, as if no backward pass is happening.

When I remove DataParallel, everything works fine. Can you please tell me what the possible reasons might be?

Also, please note that apart from the above assignment, I made no other changes to move my model to a data parallel setup.
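
One way to narrow this down is to check whether the wrapped model's parameters receive gradients and actually change after an optimizer step. The following is only a minimal sketch under assumptions: the toy nn.Linear model, the MSE loss, the SGD optimizer, and the helper name params_changed_after_step are all hypothetical stand-ins, not the actual training code from this post. It wraps the model in nn.DataParallel only when at least two GPUs are visible, so it also runs on a single GPU or CPU.

```python
import torch
import torch.nn as nn


def params_changed_after_step(model, inputs, targets, lr=0.1):
    """Run one SGD step (hypothetical helper) and report whether every
    parameter received a gradient and whether any parameter value changed."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # Snapshot parameter values before the update.
    before = [p.detach().clone() for p in model.parameters()]

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    # If any grad is None here, the backward pass did not reach that parameter.
    has_grad = all(p.grad is not None for p in model.parameters())
    optimizer.step()

    changed = any(
        not torch.equal(b, p.detach())
        for b, p in zip(before, model.parameters())
    )
    return has_grad, changed


torch.manual_seed(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Toy stand-in for the real model (assumption, not the model from the post).
model = nn.Linear(4, 1).to(device)
if torch.cuda.device_count() >= 2:
    # Same wrapping as described in the post.
    model = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)

has_grad, changed = params_changed_after_step(model, x, y)
print("all parameters have gradients:", has_grad)
print("parameters changed after step:", changed)
```

If both checks pass for a toy model but fail for the real one, the difference likely lies in the model's forward pass or in how the loss and optimizer are set up, rather than in the DataParallel wrapping line itself.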