Hi @mrshenli,
Thank you for confirming the 1st option and pointing to the related part of the DDP source code.
I checked the DDP implementation, and it seems that option 1 is the only possible way for now. `forward` is the only function for which DDP supports safe parallelization, so going for option 3 would be an adventure.
By the way, I’m not sure whether I can avoid function call patterns like `forward`, `forward`, `backward` that you mentioned.
Thank you very much and I will post here when I come up with a nice solution.
Best,
Seungjun