I’m interested in learning about good practices for parallelizing PyTorch models over multiple GPUs; more specifically, I have two questions.
My first question concerns `nn.DataParallel` and a class that inherits from it, `DataParallelPassthrough`. More specifically, in this work a pre-trained GAN generator, `G`, which has been set to evaluation mode, is used for generating images. This model, an instance of a specific class, is wrapped as:

```python
G = DataParallelPassthrough(G)
```

where `DataParallelPassthrough` is defined as follows:
```python
from torch import nn

class DataParallelPassthrough(nn.DataParallel):
    def __getattr__(self, name):
        try:
            # First look on the DataParallel wrapper itself
            return super(DataParallelPassthrough, self).__getattr__(name)
        except AttributeError:
            # Fall back to the attributes of the wrapped module
            return getattr(self.module, name)
```
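To make the question concrete, here is a minimal toy example of what I see (the `ToyGenerator` class and its `dim_z` attribute are hypothetical stand-ins, and everything runs on CPU just for illustration): plain `nn.DataParallel` hides the wrapped module's custom attributes, while the passthrough forwards them.

```python
import torch
from torch import nn

class DataParallelPassthrough(nn.DataParallel):
    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)

# Hypothetical toy generator with a custom attribute (for illustration only)
class ToyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.dim_z = 128  # custom attribute, not registered by nn.Module
        self.fc = nn.Linear(self.dim_z, 4)

    def forward(self, z):
        return self.fc(z)

g = ToyGenerator()

plain = nn.DataParallel(g)
try:
    plain.dim_z  # nn.Module.__getattr__ only checks parameters/buffers/submodules
except AttributeError:
    print("nn.DataParallel: AttributeError for custom attribute")

wrapped = DataParallelPassthrough(g)
print(wrapped.dim_z)  # 128 -- forwarded to the underlying module
```

So the only difference I can see is this attribute forwarding; is that the whole motivation, or is there more to it?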
How is `DataParallelPassthrough` different from standard `nn.DataParallel`? Why should one prefer the former over the latter? I have seen it used in some repos, but I couldn’t find a good explanation of why. Could you help me understand?
Now, besides the aforementioned generator `G`, there is another model (an instance of a different class) that takes `G`'s generated images as input; you may think of it as a ResNet-like model. However, this second model is not wrapped in `DataParallelPassthrough`, and this causes issues: for instance, while the whole pipeline fits in a single 32GB Tesla V100, it does not fit in a pair of 16GB V100s.
My second question is: how should I parallelize both models? Should I wrap both of them in `nn.DataParallel` (or `DataParallelPassthrough`)? I would like to avoid manually pinning each model to a specific GPU device (e.g., using `.to()`).
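For concreteness, here is a minimal sketch of the arrangement I have in mind (the `ToyGenerator` and `ToyEncoder` classes are hypothetical stand-ins for the real models, and I'm assuming both should simply be wrapped the same way):

```python
import torch
from torch import nn

# Hypothetical stand-in for the pre-trained generator G
class ToyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 3 * 8 * 8)

    def forward(self, z):
        return self.net(z).view(-1, 3, 8, 8)

# Hypothetical stand-in for the "ResNet-like" model consuming G's images
class ToyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return self.conv(x).mean(dim=(1, 2, 3))

G = ToyGenerator().eval()
E = ToyEncoder().eval()

# Wrap BOTH models, so each forward pass scatters its batch across the GPUs
# (is this the recommended pattern, or is there a better way?)
G = nn.DataParallel(G)
E = nn.DataParallel(E)

with torch.no_grad():
    z = torch.randn(4, 16)
    imgs = G(z)       # replicas run per-GPU, outputs gathered on the default device
    scores = E(imgs)  # second scatter/gather for the downstream model
print(scores.shape)   # torch.Size([4])
```

My worry is that wrapping the models separately like this gathers `G`'s outputs to one device before `E` scatters them again, and I'm not sure whether that is what keeps the memory from being split evenly across the two 16GB cards.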
Thank you!