loss.backward(retain_graph=True) raises a RuntimeError with DistributedDataParallel

I’m training a model with DistributedDataParallel (DDP) in PyTorch. I have multiple loss functions, and calling backward with retain_graph=True raises the runtime error shown below:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256]] is at version 57; expected version 56 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
PyTorch version: 1.0.0
I also updated PyTorch to the latest version (1.4.0), but the same error still occurs.
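To make the question concrete, here is a minimal sketch of the kind of setup I mean; the network, losses, and data below are placeholders rather than my actual code:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group just so the sketch is self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

# Placeholder network; the BatchNorm width of 256 only mirrors the [256]
# tensor mentioned in the error, it is not my actual architecture.
net = nn.Sequential(nn.Linear(128, 256), nn.BatchNorm1d(256), nn.Linear(256, 10)).cuda()
model = DDP(net, device_ids=[0])

criterion1 = nn.CrossEntropyLoss()
criterion2 = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(10):
    x = torch.randn(16, 128).cuda()
    labels = torch.randint(0, 10, (16,)).cuda()
    targets = torch.randn(16, 10).cuda()

    optimizer.zero_grad()
    out = model(x)
    loss1 = criterion1(out, labels)
    loss2 = criterion2(out, targets)

    # First backward keeps the graph so the second loss can backprop through it.
    loss1.backward(retain_graph=True)
    # Second backward over the same graph -- this is where the RuntimeError is raised.
    loss2.backward()
    optimizer.step()
```

Is there a recommended way to handle multiple backward passes like this under DDP?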