@ptrblck Thank you for your comment!
In the meantime, I found another very useful thread that shares a beautiful illustration of how DataParallel actually works behind the scenes.
I think the code I shared above might not work properly, because I declared the optimizer as part of the model. Since I'm calling the optimizer through `module`, it will only update the default GPU's model weights. Even if the errors manage to flow back to both models through the linked computation graph, I do not see any routine that merges the gradients and makes one global update.
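For context, here is a minimal sketch of the pattern I mean (`NetWithOptimizer` and the layer sizes are made-up placeholders, not my actual code):

```python
import torch
import torch.nn as nn

# Sketch of the problematic pattern: the optimizer is created inside the
# model, so after wrapping with nn.DataParallel it can only be reached
# through `.module`.
class NetWithOptimizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)
        # optimizer bound to this module's parameters at construction time
        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.1)

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(NetWithOptimizer().cuda())
loss = model(torch.randn(8, 10).cuda()).sum()
loss.backward()
model.module.optimizer.step()  # called through `.module`, as described above
```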
However, if I instantiate the optimizer independently and call it on `model.parameters()` from `main()`, steps 5 and 6 (from the illustration) should run as intended.
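Something along these lines is what I have in mind (again just a sketch; `HypotheticalNet`, the shapes, and the SGD hyperparameters are made-up placeholders):

```python
import torch
import torch.nn as nn

class HypotheticalNet(nn.Module):  # placeholder model, not my actual code
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

def main():
    # Build the model, wrap it, and create the optimizer independently
    # over model.parameters() inside main().
    model = nn.DataParallel(HypotheticalNet().cuda())
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.MSELoss()

    inputs = torch.randn(8, 10).cuda()
    targets = torch.randn(8, 1).cuda()

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()   # gradients flow back onto the device-0 parameters
    optimizer.step()  # one global update over all of model.parameters()

if __name__ == "__main__":
    main()
```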
@rasbt Thank you so much for the illustration. Do you think my above remark is correct?