Thanks for posting the question, @rajamohan_reddy. This looks like an optimizer problem rather than a DDP problem. As the note in the Embedding docs (PyTorch 1.11.0 documentation) points out, sparse gradients are supported by only certain optimizers, not all of them. Which optimizer did you specify? Is it one that does not support sparse gradients?
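
To check this outside of DDP, here is a minimal single-process sketch. It assumes the embedding was created with `sparse=True`; the layer sizes, the indices, and the `try_step` helper are made up for illustration and are not taken from your code. If I read the docs note correctly, it lists `optim.SGD`, `optim.SparseAdam` and `optim.Adagrad` (CPU) as the optimizers that accept sparse gradients, so plain `Adam` is used here as the likely failing case.

```python
import torch
import torch.nn as nn

# Hypothetical toy layer: sparse=True makes the gradient of emb.weight a sparse tensor.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4, sparse=True)
idx = torch.tensor([1, 3, 3])  # arbitrary example indices


def try_step(optimizer):
    """Run one backward pass and one optimizer step; report whether the step succeeded."""
    emb.weight.grad = None       # start from a clean gradient
    emb(idx).sum().backward()    # populates a sparse gradient on emb.weight
    try:
        optimizer.step()
        return "ok"
    except RuntimeError as err:  # optimizers without sparse support raise here
        return f"RuntimeError: {err}"


# Dense-only optimizer: expected to reject the sparse gradient.
adam_result = try_step(torch.optim.Adam(emb.parameters(), lr=0.01))
# Optimizers the docs note names as supporting sparse gradients.
sparse_adam_result = try_step(torch.optim.SparseAdam(emb.parameters(), lr=0.01))
sgd_result = try_step(torch.optim.SGD(emb.parameters(), lr=0.01))

print("Adam:      ", adam_result)
print("SparseAdam:", sparse_adam_result)
print("SGD:       ", sgd_result)
```

If the same error shows up here with your optimizer, then DDP is not involved, and switching to one of the supported optimizers (or setting `sparse=False` on the embedding) should be the fix.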