Distributed Parallel: one machine, multi-GPU, multi-process?

@Shiro

If each process runs the same number of iterations, with each iteration consuming the same amount of data, using 2 GPUs might actually take longer, because there is additional communication overhead between the two GPUs. But in that case your model is actually trained on 2× the number of batches.
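For concreteness, here is a minimal sketch of that setup (one DDP process per GPU on a single machine). The toy model, the `worker` function, and the constants are illustrative, not from the thread. Each process runs the same loop, so with 2 processes every step consumes 2× the data, and the gradient all-reduce inside `backward()` is where the extra communication cost shows up:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size, iters_per_process, batch_size):
    # One process per GPU; each process runs the same training loop,
    # so the model effectively sees world_size * batch_size samples per step.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(10, 1).cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(iters_per_process):
        # Stand-in for one shard of a real data loader.
        x = torch.randn(batch_size, 10, device=rank)
        loss = model(x).sum()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across GPUs here (the communication overhead)
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size, 100, 32), nprocs=world_size)
```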

Reducing the number of iterations should work, or you can instead reduce the per-process batch size. One thing to note is that this might also call for additional tuning of the learning rate or other configs. Some relevant discussions are available here.
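A quick sketch of those two options, with purely hypothetical baseline numbers for illustration:

```python
# Hypothetical single-GPU baseline (values are illustrative, not from the thread).
base_batch_size = 64
base_iters = 1000
base_lr = 0.1

world_size = 2  # number of DDP processes / GPUs

# Option 1: keep the per-GPU batch size, run fewer iterations per process,
# so the total number of samples seen matches the single-GPU run.
iters_per_process = base_iters // world_size

# Option 2: keep the iteration count, shrink the per-GPU batch size,
# so the effective (global) batch per step matches the single-GPU run.
per_gpu_batch_size = base_batch_size // world_size

# If you instead keep the larger effective batch (world_size * base_batch_size),
# a common heuristic is to scale the learning rate linearly with it,
# though it still usually needs re-tuning.
scaled_lr = base_lr * world_size
```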