How to do parallel training, but use only 1 GPU at test time

Hi,

I want to use multiple GPUs at training time only. At test time, I want to use a single GPU. How can I do that?

My model looks like:

import torch.nn as nn

class DataParallelModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)

        # wrap block2 in DataParallel
        self.block2 = nn.Linear(20, 20)
        self.block2 = nn.DataParallel(self.block2)

        self.block3 = nn.Linear(20, 20)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        return x

Instead of wrapping block2 in DataParallel ahead of time, should I wrap it inside the forward function at run time, like this:

class DataParallelModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)

        # do not wrap block2 in DataParallel here
        self.block2 = nn.Linear(20, 20)
        self.block3 = nn.Linear(20, 20)

    def forward(self, x, num_gpus):
        x = self.block1(x)
        if num_gpus > 1:
            x = nn.DataParallel(self.block2)(x)  # wrap it here instead
        x = self.block3(x)
        return x

We actually provide a functional version, nn.parallel.data_parallel, for exactly this case:

x = nn.parallel.data_parallel(self.block2, x)

It’s not as efficient as nn.DataParallel, but the performance penalty should be small.
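Putting the two ideas together, here is a minimal sketch of a model that uses the functional data_parallel only when more than one GPU is requested, and otherwise calls the submodule directly. The num_gpus parameter name mirrors the Num_GPUs argument from the question and is just an illustration; you could equally key off torch.cuda.device_count() inside forward.

```python
import torch
import torch.nn as nn

class DataParallelModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)
        self.block2 = nn.Linear(20, 20)  # plain module, not wrapped
        self.block3 = nn.Linear(20, 20)

    def forward(self, x, num_gpus=1):
        x = self.block1(x)
        if num_gpus > 1:
            # training on multiple GPUs: scatter the batch, replicate
            # block2 across devices, gather the results
            x = nn.parallel.data_parallel(self.block2, x)
        else:
            # test time (or single-GPU training): ordinary forward pass
            x = self.block2(x)
        x = self.block3(x)
        return x

model = DataParallelModel()
out = model(torch.randn(4, 10))  # default num_gpus=1, runs anywhere
# out.shape == torch.Size([4, 20])
```

Because the state dict contains no DataParallel wrapper, the same checkpoint loads unchanged for single-GPU inference.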