I am fully aware that I can access my network through nn.DataParallel.module. However, I need to make sure that my code runs without an AttributeError when accessing variables I defined on my network, whether the user is on a single GPU or multiple GPUs. Hence the DataParallelWrapper.
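(My actual wrapper code isn't shown here, but the idea is roughly the sketch below: subclass nn.DataParallel and fall back to the wrapped module's attributes. The exact body is illustrative.)

```python
import torch.nn as nn

class DataParallelWrapper(nn.DataParallel):
    """Transparently forward attribute access to the wrapped module,
    so `model.my_attr` works whether or not DataParallel is in use."""
    def __getattr__(self, name):
        try:
            # nn.Module.__getattr__ already handles parameters,
            # buffers, and submodules (including self.module itself)
            return super().__getattr__(name)
        except AttributeError:
            # Fall back to custom attributes defined on the wrapped network
            return getattr(self.module, name)
```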
The only problem is that I do not know whether this is a good approach (could it introduce a data race or other synchronization problem?).
This should be fine, I think. The main logic of DataParallel is implemented in its forward() function: it replicates the given module to the available devices, scatters the input across them, launches one thread per device to process the scattered chunks, and then joins the threads and gathers the outputs. So as long as you are not trying to modify module parameters or gradients concurrently while DataParallel.forward() is executing, I don't see an issue here.
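Schematically, forward() does something like this (a simplified sketch of the real implementation, omitting edge cases such as the single-device shortcut):

```python
# Simplified sketch of nn.DataParallel.forward(), not the actual source
def forward(self, *inputs, **kwargs):
    # Split the input batch across the configured device_ids
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
    # Copy the wrapped module onto each participating device
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
    # Run each replica on its chunk, one thread per device
    outputs = self.parallel_apply(replicas, inputs, kwargs)
    # Join the threads and collect the outputs on the output device
    return self.gather(outputs, self.output_device)
```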