Hi, is there any way to apply clip_grad_norm_ to the gradient produced by each individual sample (rather than by the whole batch) when using DDP?
AFAIK, this is not possible out of the box, since the gradients are reduced over the batch (and across processes), so the per-sample information is no longer recoverable.
Could you tell us in what situation you want to do that?
You can probably do this via autograd hooks (torch.Tensor.register_hook — PyTorch 1.13 documentation) or the DDP communication hooks (DDP Communication Hooks — PyTorch 1.13 documentation).
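For the communication-hook route, here is a minimal sketch of a custom DDP comm hook that clips each replica's local gradient bucket before it is all-reduced. Note this clips the per-process gradient (already summed over that process's samples), not true per-sample gradients, and it clips each bucket's norm independently rather than the global norm that clip_grad_norm_ uses; the hook name and max_norm value are just illustrative, not part of the PyTorch API.

```python
import torch
import torch.distributed as dist


def clip_before_allreduce_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    # Illustrative threshold; not a PyTorch default.
    max_norm = 1.0

    # Flattened local gradients for this bucket, summed over this
    # process's samples by autograd before DDP communicates them.
    grad = bucket.buffer()

    # Clip this bucket's local norm before communication. Unlike
    # clip_grad_norm_, this is per bucket, not over all parameters.
    norm = grad.norm()
    if norm > max_norm:
        grad.mul_(max_norm / (norm + 1e-6))

    # Average across processes, mirroring the default allreduce hook.
    fut = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True).get_future()

    def div_by_world_size(fut):
        return fut.value()[0].div_(dist.get_world_size())

    return fut.then(div_by_world_size)


# Usage, assuming `ddp_model` is a torch.nn.parallel.DistributedDataParallel instance:
# ddp_model.register_comm_hook(state=None, hook=clip_before_allreduce_hook)
```

If you truly need per-sample clipping, you would still have to compute per-sample gradients yourself (e.g. with a batch size of 1 per process, or via per-parameter hooks) before any reduction happens.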