nn.DataParallel allows you to replicate a model and parallelize its execution by sharding the input over the batch dimension.
This assumes the model fits in a single GPU's memory. Is there a natural way in PyTorch to run a single model across multiple GPUs?
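For context, here is a minimal sketch of what splitting a single model across GPUs can look like: each stage lives on its own device and activations are moved by hand between them. The module and layer sizes are made up for illustration, and the sketch falls back to CPU when two GPUs are not available.

```python
import torch
import torch.nn as nn

# Use two GPUs when available, otherwise fall back to CPU for both stages.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0") if two_gpus else torch.device("cpu")
dev1 = torch.device("cuda:1") if two_gpus else torch.device("cpu")

class TwoStageNet(nn.Module):
    """Hypothetical two-stage model, one stage per device."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(128, 256).to(dev0)
        self.stage2 = nn.Linear(256, 10).to(dev1)

    def forward(self, x):
        x = torch.relu(self.stage1(x.to(dev0)))
        # Move the intermediate activations to the second device by hand.
        return self.stage2(x.to(dev1))

model = TwoStageNet()
out = model(torch.randn(32, 128))
print(out.shape)  # torch.Size([32, 10])
```

Note this only splits the model's memory footprint; the two stages still run one after the other.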
On a similar topic: given a GAN setting with a generator, a discriminator, and two GPUs, what is the recommended way to speed up the computation, given the dependency between the discriminator and the generator?
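One straightforward arrangement (a sketch with made-up tiny networks, not necessarily the fastest option) is to put the generator on one GPU and the discriminator on the other; note that the sequential dependency between them remains, since the discriminator consumes the generator's output. The sketch falls back to CPU when two GPUs are not available.

```python
import torch
import torch.nn as nn

two_gpus = torch.cuda.device_count() >= 2
dev_g = torch.device("cuda:0") if two_gpus else torch.device("cpu")
dev_d = torch.device("cuda:1") if two_gpus else torch.device("cpu")

# Hypothetical tiny generator and discriminator, one per device.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32)).to(dev_g)
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).to(dev_d)

z = torch.randn(8, 16, device=dev_g)
fake = G(z)                # runs on the generator's device
score = D(fake.to(dev_d))  # activations moved to the discriminator's device
print(score.shape)  # torch.Size([8, 1])
```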
This is interesting, but it does not really run them in parallel. While you're running module 1, module 2 (or rather, its GPU) is idling, and vice versa. I was looking for a way to keep both GPUs busy at all times.
This will inherently involve some idling because of the sequential nature of the forward + backward passes. This is model parallelism (as opposed to data parallelism). For example, see here
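One way to reduce (though not eliminate) the idling is to split the batch into micro-batches and pipeline them: while the second stage works on one micro-batch, the first stage can already start the next, since CUDA kernels are launched asynchronously. A minimal sketch, with made-up layer sizes and a CPU fallback when two GPUs are not available:

```python
import torch
import torch.nn as nn

two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0") if two_gpus else torch.device("cpu")
dev1 = torch.device("cuda:1") if two_gpus else torch.device("cpu")

stage1 = nn.Linear(128, 256).to(dev0)
stage2 = nn.Linear(256, 10).to(dev1)

def pipelined_forward(x, n_micro=4):
    """Feed micro-batches through the two stages in a pipelined fashion."""
    outputs = []
    prev = None
    for chunk in x.chunk(n_micro):
        if prev is not None:
            # Stage 2 consumes the previous micro-batch's activations
            # while stage 1 below can already start on the next one.
            outputs.append(stage2(prev.to(dev1)))
        prev = torch.relu(stage1(chunk.to(dev0)))
    outputs.append(stage2(prev.to(dev1)))  # drain the last micro-batch
    return torch.cat(outputs)

out = pipelined_forward(torch.randn(32, 128))
print(out.shape)  # torch.Size([32, 10])
```

Only the startup and drain of the pipeline leave a GPU idle; in between, both devices have work queued.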