Hi all,
I’m new to PyTorch and struggling with distributed training. I’m trying to implement a GAN-like training strategy that consists of two stages:
- Fix the task network and train the discriminator. My workflow is as follows:
  src_data -> T() -> detach() -> D() -> loss(src_pred, src_label)
  tgt_data -> T() -> detach() -> D() -> loss(tgt_pred, tgt_label)
- Fix the discriminator and train the task network. My workflow is as follows:
  src_data -> T() -> supervised_loss
  tgt_data -> T() -> D() -> -1 * loss(tgt_pred, tgt_label)
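To make the two stages concrete, here is a minimal single-process sketch of what I mean. The DDP wrappers are left out so that it runs standalone, and the layer sizes, the BCE and cross-entropy losses, the optimizers, and the random data are all placeholders I made up for illustration, not my real setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the task network T() and the discriminator D().
T = nn.Linear(8, 4)
D = nn.Linear(4, 1)
opt_T = torch.optim.SGD(T.parameters(), lr=0.1)
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

# Placeholder data: domain label 1 = source, 0 = target.
src_data, tgt_data = torch.randn(16, 8), torch.randn(16, 8)
src_cls = torch.randint(0, 4, (16,))  # supervised labels for the source data
src_label = torch.ones(16, 1)
tgt_label = torch.zeros(16, 1)


def train_discriminator_step():
    """Stage 1: T is fixed (its outputs are detached), only D is updated."""
    opt_D.zero_grad()
    src_pred = D(T(src_data).detach())
    tgt_pred = D(T(tgt_data).detach())
    loss = bce(src_pred, src_label) + bce(tgt_pred, tgt_label)
    loss.backward()
    opt_D.step()
    return loss.item()


def train_task_step():
    """Stage 2: D is fixed (opt_D is never stepped), only T is updated."""
    opt_T.zero_grad()
    supervised_loss = ce(T(src_data), src_cls)
    tgt_pred = D(T(tgt_data))
    adv_loss = -1 * bce(tgt_pred, tgt_label)
    (supervised_loss + adv_loss).backward()
    opt_T.step()
    return supervised_loss.item(), adv_loss.item()
```

In this sketch D still accumulates gradients during stage 2; they are simply discarded by `opt_D.zero_grad()` at the start of the next stage 1 step.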
The task network T() and the discriminator network D() are each wrapped in DDP, and they are placed in different process groups. The task network is trained with a supervised loss on labeled data and fine-tuned with the adversarial loss on unlabeled data.
For this setting I have 2 questions:
- Is this the correct way to combine two DDP models? Or do I have to wrap them into one single module first and then place that module under DDP?
- While training the task network, I have to fix the discriminator’s parameters. Right now I set requires_grad to False for all parameters in the discriminator and turn it back to True after loss.backward() is called. Is there anything else that needs to be changed? I found that DDP doesn’t allow unused parameters now, but it seems okay to use a module in which no parameter requires gradients. Am I doing this correctly?
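For reference, the requires_grad toggling from the second question looks roughly like this. It is a single-process sketch without the DDP wrappers, so it says nothing about how DDP itself reacts to the toggling. The `frozen` helper, the layer sizes, and the BCE loss are hypothetical names and choices made up for illustration.

```python
import contextlib

import torch
import torch.nn as nn


@contextlib.contextmanager
def frozen(module):
    """Temporarily set requires_grad=False on every parameter of `module`."""
    params = list(module.parameters())
    previous = [p.requires_grad for p in params]
    for p in params:
        p.requires_grad_(False)
    try:
        yield module
    finally:
        # Restore the original flags, even if the body raised.
        for p, flag in zip(params, previous):
            p.requires_grad_(flag)


# Hypothetical stand-ins for the task network and the discriminator.
T = nn.Linear(8, 4)
D = nn.Linear(4, 1)
criterion = nn.BCEWithLogitsLoss()
tgt_data = torch.randn(16, 8)
tgt_label = torch.zeros(16, 1)

with frozen(D):
    # Gradients still flow through D back to T, but D's parameters get none.
    loss = -1 * criterion(D(T(tgt_data)), tgt_label)
    loss.backward()
```

After the `with` block, the parameters of D require gradients again and their `.grad` is still unset, while the parameters of T have received gradients from the adversarial loss.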
I’d appreciate it if somebody could tell me the best practice for implementing multiple models for adversarial training. Thanks in advance!