Because I will dynamically change the module to perform a kind of online network pruning, it’s impossible to apply the manipulations before wrapping in DataParallel.
Re-wrapping after the epoch should be alright.
However, I would recommend creating a dummy example and making sure the manipulation and re-wrapping really work, e.g. set all parameters to zero and check the parameters in the next iteration for these values.
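A minimal sketch of that sanity check, assuming a toy `nn.Linear` stands in for the real model and zeroing stands in for the pruning step:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 2).to(device)
parallel = nn.DataParallel(model)

# Manipulate the underlying module (here: zero all parameters,
# standing in for an online pruning step).
with torch.no_grad():
    for p in parallel.module.parameters():
        p.zero_()

# Re-wrap after the manipulation, as suggested above.
parallel = nn.DataParallel(parallel.module)

# Check in the next iteration that the manipulation survived.
assert all((p == 0).all() for p in parallel.parameters())
out = parallel(torch.randn(3, 4, device=device))
assert (out == 0).all()  # a fully zeroed Linear layer outputs zeros
print("manipulation survived re-wrapping")
```

If the assertions pass, the re-wrapped model really is using the manipulated parameters.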
Does the re-wrapping technique work well for you? I’d imagine it’s slow to use, as wrapping with DataParallel copies ALL of the weights to the other GPUs each time, rather than copying just the ones needed after pruning.
I have a similar question: I am changing the `requires_grad` of module parameters after wrapping with DataParallel. DistributedDataParallel, which is now recommended instead of DataParallel, has a warning that says “don’t do it!”, but I do not see the same warning for DataParallel. I agree that the proper way is to re-wrap after the change, but that is a little more code.
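For reference, the re-wrapping version of the `requires_grad` change is only a couple of lines; a sketch with a hypothetical toy model (whether skipping the re-wrap is actually safe for DataParallel is exactly the open question above):

```python
import torch.nn as nn

model = nn.Linear(8, 8)
parallel = nn.DataParallel(model)

# Freeze a parameter on the underlying module after wrapping...
parallel.module.bias.requires_grad_(False)

# ...then re-wrap so the wrapper is rebuilt from the updated module.
parallel = nn.DataParallel(parallel.module)

assert not parallel.module.bias.requires_grad
assert parallel.module.weight.requires_grad
```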