Suppose I have a tensor that requires gradients, and I want to clamp its values or its norm. What is the correct way to do this with DistributedDataParallel?

My guess is to use a barrier, modify the weights, then use a second barrier.

Is this tensor a model parameter? I'm not sure I see why a barrier would be needed, assuming the clamp has the same effect across all replicas. DDP only synchronizes the gradients of parameters; each replica can then apply the clamp after, e.g., the optimizer step without worrying about synchronization, because every other replica performs the same operation on its private copy of the same data.
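A minimal sketch of that idea, using a plain (non-distributed) model for illustration. The assumption is that this is exactly what each DDP replica would run locally after `optimizer.step()`: since the clamp is deterministic and every replica holds identical parameters after the gradient all-reduce, no barrier or extra communication is needed.

```python
import torch

# Stand-in for one DDP replica's local model and optimizer.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()
loss.backward()    # with DDP, gradients are all-reduced during backward
optimizer.step()

# Clamp parameter values in place, outside autograd tracking.
with torch.no_grad():
    for p in model.parameters():
        p.clamp_(-0.5, 0.5)

# Alternatively, cap the norm of each parameter tensor instead of its values.
max_norm = 1.0
with torch.no_grad():
    for p in model.parameters():
        n = p.norm()
        if n > max_norm:
            p.mul_(max_norm / n)
```

Because each replica starts from identical parameters and applies the same in-place clamp, the replicas stay in sync without any explicit barrier.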