Sharding model across GPUs

nn.DataParallel lets you replicate a model and parallelize its execution by sharding the batch across devices.
This assumes the model fits in a single GPU's memory. Is there a natural way in PyTorch to run a single model across multiple GPUs?
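For context, the batch-sharded case I mean looks roughly like this (the model and tensor shapes are placeholders, just for illustration):

import torch
import torch.nn as nn

# Toy model; DataParallel replicates it on every visible GPU
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model = nn.DataParallel(model).cuda()

x = torch.randn(64, 128).cuda()  # the batch of 64 is scattered across GPUs
out = model(x)                   # outputs are gathered back on the default device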

On a similar topic, given a GAN setting with a generator, a discriminator, and two GPUs, what is the recommendation to speed up the computation, given the dependency between discriminator and generator?


Yes, you can split a single model across multiple GPUs in PyTorch with minimal fuss. Here is an example from @apaszke:

class MyModel(nn.Module):
    def __init__(self, split_gpus):
        super(MyModel, self).__init__()
        self.large_submodule1 = ...
        self.large_submodule2 = ...

        self.split_gpus = split_gpus
        if split_gpus:
            # place each half of the model on its own device
            self.large_submodule1.cuda(0)
            self.large_submodule2.cuda(1)

    def forward(self, x):
        x = self.large_submodule1(x)
        if self.split_gpus:
            x = x.cuda(1)  # P2P GPU transfer of the intermediate activation
        return self.large_submodule2(x)

One caveat (to the minimal fuss) is that you probably want to try several split points to balance GPU memory consumption across the devices! [Here](https://gist.github.com/ajdroid/420624cdd6643c397b3a62c68904791c) is a more fleshed out example with VGG-16 :slight_smile:
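As a rough usage sketch (assuming the `...` submodules above have been filled in; the shapes and loss are placeholders, not part of the original snippet), the input lives on GPU 0 and the output, target, and loss on GPU 1:

import torch
import torch.nn.functional as F

model = MyModel(split_gpus=True)

x = torch.randn(16, 3, 224, 224).cuda(0)       # input starts on GPU 0
target = torch.randint(0, 10, (16,)).cuda(1)   # output/loss live on GPU 1

out = model(x)
loss = F.cross_entropy(out, target)
loss.backward()  # gradients flow back across the GPU boundary automatically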


This is interesting, but it does not really run the modules in parallel. While you’re running module 1, module 2 (or rather, its GPU) is idling, and vice versa. I was looking for a way to keep both GPUs busy at all times. :slight_smile:

Hi,

nn.DataParallel is built exactly for what you want :slight_smile:

@claudiomartella

This will inherently involve some idling because of the sequential nature of the forward + backward passes. This is model parallelism (as opposed to data parallelism). For an example, see here