Will `backward()` consider all the models across various GPUs with `nn.DataParallel`?

@ptrblck Thank you for your comment!

Meanwhile, I found another very useful thread that shares a beautiful illustration of how DataParallel actually works behind the scenes.

I think the code I shared above might not work properly, because I declared the optimizer as part of the model. And since I'm calling the optimizer through `module`, it will only update the default GPU's model weights. Even if the errors manage to flow back to both models because of the linked computation graph, I do not see any routine that merges the gradients into one and makes a single global update.
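
To make it concrete, here is a minimal sketch of the pattern I'm worried about. The toy `Net`, shapes, and hyperparameters are just placeholders, not my actual code:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)
        # optimizer declared as part of the model itself
        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.01)

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(Net()).cuda()

x = torch.randn(8, 10).cuda()
loss = model(x).sum()
loss.backward()

# optimizer reached through .module, i.e. through the model on the default GPU
model.module.optimizer.step()
model.module.optimizer.zero_grad()
```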

However, if I instantiate the optimizer independently in main() and call it on model.parameters(), steps 5 and 6 (from the illustration) should run as intended.
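
Roughly like this, with the optimizer created in `main()` on the wrapper's parameters (again just a toy sketch with placeholder shapes, not my real training loop):

```python
import torch
import torch.nn as nn

def main():
    # plain placeholder model wrapped in DataParallel
    model = nn.DataParallel(nn.Linear(10, 2)).cuda()
    # optimizer instantiated independently, on model.parameters()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(8, 10).cuda()
    loss = model(x).sum()

    loss.backward()        # step 5: errors flow back through the replicas
    optimizer.step()       # step 6: one global update of the weights
    optimizer.zero_grad()

if __name__ == "__main__":
    main()
```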

@rasbt Thank you so much for the illustration. Do you think my remark above is correct?