@ptrblck Hi, should I add any other details? Could this be related to the DataLoader? I also tried a very large batch size (e.g. 2048 * 4), but the second GPU is still not utilized.
Thanks for the code!
I’ve just tried to run it on our machine and can see that all GPUs are being used:
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, vocab_size):
        super(Net, self).__init__()
        self.vocab_size = vocab_size
        self.embed_size = 300
        self.hidden_size = 300
        self.linear = nn.Linear(1024, 300)
        self.dropout_rate = 0.5
        self.embedding = nn.Embedding(self.vocab_size, self.embed_size)
        self.dropout = nn.Dropout(self.dropout_rate)
        self.LSTM = nn.LSTM(self.embed_size, self.hidden_size, bidirectional=True)
        self.multimodal_linear = nn.Linear(600, 2)

    def forward(self, s, c):
        r = self.linear(c)
        self.LSTM.flatten_parameters()
        embedded = self.embedding(s)
        embedded = self.dropout(embedded)
        # Each batch has the same maxlen, how to make data loader with custom maxlen?
        input_lengths = torch.tensor([10] * s.size(0)).long()  # [sent.shape[1]] * sent.shape[0]
        packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths, batch_first=True)
        output, hidden = self.LSTM(packed, None)
        output, _ = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
        bi_text = hidden[0][0, :, :].squeeze()
        o = self.multimodal_linear(torch.cat((bi_text, r), dim=1))
        return o


device = 'cuda'
N = 64 * 64
model = Net(100).to(device)
model = nn.DataParallel(model)  # splits each batch across all visible GPUs

# dummy inputs
s = torch.randint(0, 100, (N, 10)).to(device)
c = torch.randn(N, 1024).to(device)

for _ in range(100):
    out = model(s, c)
print(out)
I had to fix some minor issues to run the code (self.m_linear should probably be self.multimodal_linear in the forward).
Could you run my code and check if all GPUs are utilized?
That’s a bit strange. Could you post exactly the code you’ve executed so that I could run it on a machine? As explained, I had to manipulate your last code snippet a bit in order to run it, and I’m afraid we might not be comparing the same code.
The difference is that my wrapper (model object) is not a subclass of nn.Module. So my model is model.net.module, which I wrap with the DataParallel API.
If you don’t want to derive from nn.Module, you might have to implement the parallel calls manually using the functional API, as the vanilla nn.DataParallel uses the nn.Module.forward method to chunk the data, etc.
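To illustrate what those manual calls could look like, here is a minimal sketch using the functional helpers torch.nn.parallel.scatter / replicate / parallel_apply / gather (the same primitives nn.DataParallel builds on). The helper name data_parallel_forward and the two-GPU device_ids are placeholders for this example, and it still assumes the inner network being replicated is an nn.Module that lives on the first listed GPU:

import torch
from torch.nn.parallel import replicate, scatter, parallel_apply, gather


def data_parallel_forward(net, s, c, device_ids=(0, 1), output_device=0):
    # Split the inputs along the batch dimension, one chunk per GPU.
    # scatter on a tuple returns a list like [(s_0, c_0), (s_1, c_1), ...].
    inputs = scatter((s, c), device_ids)
    # Copy the module onto every GPU that received a chunk.
    replicas = replicate(net, device_ids[:len(inputs)])
    # Run each replica on its own chunk (one thread per GPU).
    outputs = parallel_apply(replicas, inputs)
    # Collect the per-GPU outputs back on a single device.
    return gather(outputs, output_device)


# hypothetical usage with the objects from above:
# out = data_parallel_forward(model.net.module, s, c)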