How to train this model on multiple GPUs?

Instead of creating two models, you can create just one model like this. Then you can simply wrap the model with nn.DataParallel.
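Here is a minimal sketch of the "one model, wrapped in nn.DataParallel" idea, assuming PyTorch. `TinyNet`, `build_model` and `train_step` are hypothetical stand-ins, not code from this project. Swap in your own single model. When more than one GPU is visible, the model is wrapped, and DataParallel splits each input batch along dimension 0 across the GPUs. On a CPU-only or single-GPU machine the model runs unwrapped.

```python
import torch
import torch.nn as nn


# Hypothetical stand-in for "this model"; replace with your own single model.
class TinyNet(nn.Module):
    def __init__(self, in_features: int = 16, num_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def build_model() -> nn.Module:
    # Create ONE model, then wrap it only if several GPUs are visible.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TinyNet()
    if torch.cuda.device_count() > 1:
        # Replicates the module on each visible GPU and scatters the batch along dim 0.
        model = nn.DataParallel(model)
    return model.to(device)


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    # Inputs go to the device holding the parameters (cuda:0 when wrapped).
    device = next(model.parameters()).device
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    logits = model(x)  # outputs are gathered back onto the first device
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = build_model()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))
    for step in range(3):
        print(f"step {step}: loss={train_step(model, optimizer, x, y):.4f}")
```

When the model is wrapped, the original network is available as `model.module`. Saving `model.module.state_dict()` therefore keeps checkpoint keys free of the `module.` prefix. The PyTorch documentation also recommends `DistributedDataParallel` over `DataParallel`, even on a single machine, so consider it if DataParallel turns out to be a bottleneck.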