DataParallel over GRU Model only using 1 GPU?

Hello, I was wondering if I could get some pointers on applying nn.DataParallel to a model that I've written using a GRU network. A snippet of the model is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class GRUClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, layers_dim, batch_size):
        super(GRUClassifier, self).__init__()
        self.hidden_dim = hidden_dim
        self.layers_dim = layers_dim
        self.batch_size = batch_size
        # Seq-first input expected: (seq_len, batch, input_dim)
        self.gru = nn.GRU(input_dim, hidden_dim, layers_dim)
        self.hidden = self.init_hidden()
        self.fcn = nn.Linear(self.hidden_dim * self.layers_dim, 1)

    def init_hidden(self):
        # Fresh hidden state of shape (layers, batch, hidden) on the GPU.
        return (Variable(torch.randn(self.layers_dim, self.batch_size, self.hidden_dim))).cuda()

    def forward(self, input):
        self.batch_size = input.size()[1]
        self.hidden = self.init_hidden()
        gru_out, self.hidden = self.gru(input, self.hidden)
        # Reshape the final hidden state (layers, batch, hidden) into (batch, layers * hidden)
        # and feed it to the linear classifier.
        hidden_out = self.hidden.view(self.batch_size, self.layers_dim * self.hidden_dim)
        x = self.fcn(hidden_out)
        x = F.sigmoid(x)
        return x

I am applying DataParallel as follows:

SEQ_LENGTH = 2501
MELS = 256
HIDDEN_DIM = 128

model = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=64)
nn.DataParallel(model.cuda())

However, according to nvidia-smi, only one GPU is actually being utilised. My machine has two Titan X cards and I'm running PyTorch in an nvidia-docker container. I've tried multi-GPU training on a convolutional neural network using a similar method, and that seems to work correctly.


I think it might be because you are missing the device ids:
nn.DataParallel(model.cuda(), device_ids=[0, 1])
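
For reference, here is a minimal sketch of what I mean (assuming both GPUs are visible inside the container as devices 0 and 1). Note that DataParallel returns a wrapper module, and it's that wrapper you want to keep and call afterwards:

# Keep and use the wrapper returned by DataParallel; wrapping without assigning
# the result leaves the plain single-GPU model in place.
model = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=64)
model = nn.DataParallel(model.cuda(), device_ids=[0, 1])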

I tried that, but it's still only using GPU 0. Might it be that GRUs have not been optimized for DataParallel?

I doubt that. DataParallel just copies your model across the GPUs, and each GPU processes a fraction of your minibatch.
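
Roughly, it behaves like this toy sketch (not your model, just a stand-in to show the minibatch being split along dim 0 across two visible GPUs):

import torch
import torch.nn as nn
from torch.autograd import Variable

class Toy(nn.Module):
    # Tiny stand-in module that reports the slice of the batch each replica receives.
    def __init__(self):
        super(Toy, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        print('replica received a chunk of size', x.size(0))
        return self.fc(x)

toy = nn.DataParallel(Toy().cuda(), device_ids=[0, 1])
out = toy(Variable(torch.randn(64, 10)).cuda())  # each GPU should report roughly 32
print(out.size())  # the per-GPU outputs are gathered back into a (64, 1) result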

Are you able to run the model on each of the two GPUs separately?

Yes, though I’m not sure how to do it so that each mini-batch is synchronised.

I've also experimented with wrapping DataParallel around just the nn.GRU, but then it complains about incorrect hidden dimensions.

Edit: Is there an alternative way to run the model on two GPUs simultaneously by hand, without going through DataParallel? Something like the rough sketch below is what I have in mind.
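
The idea would be two copies of the model, one per device, each fed half of the minibatch (the names and sizes here are just illustrative, and I haven't benchmarked this):

# Manual data parallelism over two GPUs, without nn.DataParallel.
model_a = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=32).cuda(0)
model_b = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=32).cuda(1)
model_b.load_state_dict(model_a.state_dict())  # keep the two copies in sync

batch = torch.randn(SEQ_LENGTH, 64, MELS)      # seq-first, as the model expects
half_a = Variable(batch[:, :32]).cuda(0)       # first half of the minibatch on GPU 0
half_b = Variable(batch[:, 32:]).cuda(1)       # second half on GPU 1

out_a = model_a(half_a)
with torch.cuda.device(1):                     # so init_hidden()'s .cuda() lands on GPU 1
    out_b = model_b(half_b)

preds = torch.cat([out_a, out_b.cuda(0)], 0)   # gather the predictions back on GPU 0
# For training, the gradients of the two copies would also have to be summed
# (or averaged) before each optimizer step.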

Have you tried model = torch.nn.DataParallel(model).cuda()?

All good - this issue is resolved in 0.2. Great work, guys! 🙂

Hello, I'm running into the same problem. Could you tell me how you solved it, or are there any links? Thank you very much! 😊
