I was running training with DDP. When I use the PyTorch pruning library, I see a small inconsistency in the pruning mask density across ranks. I used `global_unstructured`. Is it supposed to be non-deterministic, perhaps due to floating-point operations? As long as DDP keeps the weights in sync, shouldn't the pruning masks be identical at all times as well?
@thyeros Ideally, pruning the same weight and bias arrays should produce the same masks, since the L1 norms would be the same (in the case of `L1Unstructured`).
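A minimal sketch of that point, using a toy two-layer model (not your actual setup): given identical weights, `prune.global_unstructured` with `L1Unstructured` produces identical masks on every run.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def global_prune_mask(seed: int = 0) -> torch.Tensor:
    """Build a small model with seeded weights, apply global L1 pruning,
    and return the concatenated pruning mask."""
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))
    params = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
    # Globally remove the 50% of weights with the smallest absolute value.
    prune.global_unstructured(
        params, pruning_method=prune.L1Unstructured, amount=0.5
    )
    return torch.cat([m.weight_mask.flatten() for m, _ in params])

# Same weights in -> same mask out, every time.
mask_a = global_prune_mask(seed=0)
mask_b = global_prune_mask(seed=0)
assert torch.equal(mask_a, mask_b)
```

So if the masks disagree across ranks, the inputs to the pruning call (the weights themselves) are the first thing to suspect.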
Can you check whether the weights are indeed the same on all machines before you execute the pruning call?
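One cheap way to do that check (a hypothetical helper, not part of PyTorch) is to compute a scalar fingerprint of all parameters on each rank right before the pruning call, then compare the values via logging or `torch.distributed.all_gather`:

```python
import torch
import torch.nn as nn

def param_fingerprint(model: nn.Module) -> float:
    """Cheap scalar summary of all parameters. If this value differs
    across DDP ranks just before pruning, the weights have already
    diverged. Accumulate in float64 so the summary itself is stable."""
    with torch.no_grad():
        return float(
            sum(p.detach().double().abs().sum() for p in model.parameters())
        )

# Two identically initialized models agree ...
torch.manual_seed(0)
m1 = nn.Linear(16, 16)
torch.manual_seed(0)
m2 = nn.Linear(16, 16)
assert param_fingerprint(m1) == param_fingerprint(m2)

# ... and any in-place drift shows up immediately.
with torch.no_grad():
    m2.weight[0, 0] += 1e-3
assert param_fingerprint(m1) != param_fingerprint(m2)
```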
I found that the weights were indeed out of sync before pruning, due to an `index_add` op I used to massage them. Thanks for the advice.
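For anyone hitting the same symptom: scatter-style ops like `index_add_` can accumulate values in a different order on different ranks or backends, and floating-point addition is not associative, so the results can drift by a few ulps. A plain-Python sketch (the `f32` helper is just an illustration that simulates float32 rounding):

```python
import struct

def f32(x: float) -> float:
    # Round a Python float (float64) to float32 precision, to mimic
    # a float32 accumulator.
    return struct.unpack("f", struct.pack("f", x))[0]

a, b, c = 1e8, 1.0, -1e8

# Same three addends, different accumulation order:
order1 = f32(f32(a + b) + c)  # 1e8 + 1 rounds back to 1e8, so the 1 is lost
order2 = f32(f32(a + c) + b)  # the large terms cancel first, so the 1 survives

assert order1 == 0.0
assert order2 == 1.0
```

A tiny per-rank difference like this is enough to flip which weights land just above or below the global pruning threshold, which would explain the small mask-density mismatch.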