Training multiple models on one GPU simultaneously

You do call optimizer[i].zero_grad() (or the equivalent model[i].zero_grad()) at every iteration of your epoch, right? If you don't, you can check this discussion: Why do we need to set the gradients manually to zero in pytorch?
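
For reference, here is a minimal sketch of what I mean, assuming the models and optimizers are kept in lists as your optimizer[i] indexing suggests (the model sizes, learning rate, and data below are just placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical setup: two small models trained side by side on the same GPU.
device = torch.device("cuda")
models = [nn.Linear(10, 1).to(device) for _ in range(2)]
optimizers = [torch.optim.SGD(m.parameters(), lr=0.01) for m in models]
criterion = nn.MSELoss()

for epoch in range(5):
    for _ in range(100):  # batches per epoch (dummy data here)
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)
        for model, optimizer in zip(models, optimizers):
            optimizer.zero_grad()            # clear gradients left over from the previous step
            loss = criterion(model(x), y)
            loss.backward()                  # accumulate fresh gradients for this model only
            optimizer.step()                 # update this model's parameters
```

The key point is that each optimizer only sees its own model's parameters, so zeroing and stepping them independently inside the loop keeps the gradients of the different models from accumulating across iterations.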

Otherwise, it looks good!