I am having the same imbalance issue, but the problem is that my GPU 1, not GPU 0, is going out of memory. Both GPUs have 32 GB of memory. With nvidia-smi I see that GPU 0 is only using 6 GB of memory, whereas GPU 1 goes up to 32 GB.
I could have understood it the other way around, with GPU 0 going out of memory, but this is weird.
I only pass my model to DataParallel, so it's using the default values.
Also, if I use only one GPU, I don't get any out-of-memory issues, which is also strange to me.
Any help would be appreciated.
P.S. I was getting a warning about the RNN parameters not being in contiguous memory, so I added a flatten_parameters() call in the forward of the LSTM as well.
import torch

cudaID = str(torch.cuda.current_device())
device = torch.device("cuda:" + cudaID)
print('device = ', device)  # this prints cuda:0

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    encoder = torch.nn.DataParallel(encoder)
    lstm_model = torch.nn.DataParallel(lstm_model)

encoder.to(device)
lstm_model.to(device)
# forward method of the LSTM module (tn is torch.nn.utils.rnn)
def forward(self, inputs, mode='train'):
    packed = tn.pack_sequence(inputs, enforce_sorted=False)
    # pass the device of packed so that packed and self.hidden end up on the
    # same device; self.hidden is re-created on every call because I'm using
    # multiple GPUs
    self.hidden = self.init_hidden(len(inputs), packed.data.device)
    self.lstm.flatten_parameters()
    if mode == 'eval' or mode == 'test':
        with torch.no_grad():
            packed_out, self.hidden = self.lstm(packed, self.hidden)
    else:
        packed_out, self.hidden = self.lstm(packed, self.hidden)
    outputs, lens = tn.pad_packed_sequence(packed_out, batch_first=True)
    return outputs
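In case it helps to compare numbers: besides nvidia-smi, this is a minimal sketch (not part of my training script) of how I check the per-device usage from inside PyTorch, using torch.cuda.memory_allocated and torch.cuda.memory_reserved, which report bytes of tensor-allocated and cached memory per GPU:

```python
import torch

def report_gpu_memory():
    # Print tensor-allocated and cached (reserved) memory for every visible GPU.
    # Loops over torch.cuda.device_count(), so it is a no-op on CPU-only machines.
    for i in range(torch.cuda.device_count()):
        alloc_gib = torch.cuda.memory_allocated(i) / 1024 ** 3
        reserved_gib = torch.cuda.memory_reserved(i) / 1024 ** 3
        print(f"cuda:{i}: {alloc_gib:.2f} GiB allocated, {reserved_gib:.2f} GiB reserved")

report_gpu_memory()
```

Calling this right before the forward pass that OOMs shows the same skew: cuda:1 far above cuda:0.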