DataParallel over GRU Model only using 1 GPU?

Hello, I was wondering if I could get some pointers on applying nn.DataParallel to a model that I've written using a GRU network. A snippet of the model is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class GRUClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, layers_dim, batch_size):
        super(GRUClassifier, self).__init__()
        self.hidden_dim = hidden_dim
        self.layers_dim = layers_dim
        self.batch_size = batch_size
        # Seq-first input expected: (seq_len, batch, input_dim)
        self.gru = nn.GRU(input_dim, hidden_dim, layers_dim)
        self.hidden = self.init_hidden()
        self.fcn = nn.Linear(self.hidden_dim * self.layers_dim, 1)

    def init_hidden(self):
        # Fresh hidden state of shape (layers, batch, hidden) on the GPU.
        return (Variable(torch.randn(self.layers_dim, self.batch_size, self.hidden_dim))).cuda()

    def forward(self, input):
        self.batch_size = input.size()[1]
        self.hidden = self.init_hidden()
        gru_out, self.hidden = self.gru(input, self.hidden)
        # Reshape the final hidden state (layers, batch, hidden) into (batch, layers * hidden)
        # and feed it to the linear classifier.
        hidden_out = self.hidden.view(self.batch_size, self.layers_dim * self.hidden_dim)
        x = self.fcn(hidden_out)
        x = F.sigmoid(x)
        return x

I am applying DataParallel as follows:

SEQ_LENGTH = 2501
MELS = 256
HIDDEN_DIM = 128

model = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=64)
nn.DataParallel(model.cuda())

However, according to nvidia-smi, only one GPU is actually being utilised. My machine has two Titan X cards and I'm running PyTorch in an nvidia-docker container. I've tried multi-GPU training on a convolutional neural network using a similar method, and that seems to work correctly.


I think it might be because you are missing the device ids:
nn.DataParallel(model.cuda(), device_ids=[0, 1])
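
For reference, here is a minimal sketch of what I mean (assuming both GPUs are visible inside the container as devices 0 and 1). Note that DataParallel returns a wrapper module, and it's that wrapper you want to keep and call afterwards:

# Keep and use the wrapper returned by DataParallel; wrapping without assigning
# the result leaves the plain single-GPU model in place.
model = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=64)
model = nn.DataParallel(model.cuda(), device_ids=[0, 1])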

I tried that, but it's still only using GPU 0. Might it be that GRUs have not been optimized for DataParallel?

I doubt that. DataParallel just copies your model across the GPUs, and each GPU processes a fraction of your minibatch.
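
Roughly, it behaves like this toy sketch (not your model, just a stand-in to show the minibatch being split along dim 0 across two visible GPUs):

import torch
import torch.nn as nn
from torch.autograd import Variable

class Toy(nn.Module):
    # Tiny stand-in module that reports the slice of the batch each replica receives.
    def __init__(self):
        super(Toy, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        print('replica received a chunk of size', x.size(0))
        return self.fc(x)

toy = nn.DataParallel(Toy().cuda(), device_ids=[0, 1])
out = toy(Variable(torch.randn(64, 10)).cuda())  # each GPU should report roughly 32
print(out.size())  # the per-GPU outputs are gathered back into a (64, 1) result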

Are you able to run the model on each of the two GPUs separately?

Yes, though I’m not sure how to do it so that each mini-batch is synchronised.

I've also experimented with wrapping DataParallel around just the nn.GRU, but then it complains about incorrect hidden dimensions.

Edit: Is there an alternative way to run the model on two GPUs simultaneously by hand, without going through DataParallel? Something like the rough sketch below is what I have in mind.
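
The idea would be two copies of the model, one per device, each fed half of the minibatch (the names and sizes here are just illustrative, and I haven't benchmarked this):

# Manual data parallelism over two GPUs, without nn.DataParallel.
model_a = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=32).cuda(0)
model_b = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=32).cuda(1)
model_b.load_state_dict(model_a.state_dict())  # keep the two copies in sync

batch = torch.randn(SEQ_LENGTH, 64, MELS)      # seq-first, as the model expects
half_a = Variable(batch[:, :32]).cuda(0)       # first half of the minibatch on GPU 0
half_b = Variable(batch[:, 32:]).cuda(1)       # second half on GPU 1

out_a = model_a(half_a)
with torch.cuda.device(1):                     # so init_hidden()'s .cuda() lands on GPU 1
    out_b = model_b(half_b)

preds = torch.cat([out_a, out_b.cuda(0)], 0)   # gather the predictions back on GPU 0
# For training, the gradients of the two copies would also have to be summed
# (or averaged) before each optimizer step.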

Have you tried model = torch.nn.DataParallel(model).cuda()?

All good - this issue is resolved in 0.2. Great work, guys! 🙂

Hello, I'm running into the same problem. Could you tell me how you solved it, or are there any links? Thank you very much! 😊
