Yes, it works with a single GPU, but only when training with the resnet18 architecture and a smaller batch size, not with resnet152 and the original batch size described by the author.
Yes, I meant exactly this line of code.
Could you try that? Although, your error seems a bit strange, as resnet18 runs while resnet152 gets stuck.
class Model(nn.Module):
    def sample(self, input):
        input = input.cuda()
        output = self.other_function(input)
        return output
if len(opt.gpus) > 1:
    print("Let's use", len(opt.gpus), "GPUs!")
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(str(x) for x in opt.gpus)
else:
    torch.cuda.set_device(opt.gpus[0])
torch.cuda.manual_seed(opt.seed)

model = Model()
if len(opt.gpus) > 1:
    model = nn.DataParallel(model, device_ids=opt.gpus, dim=0)
if use_cuda:
    model = model.cuda()
model.train()

out = model.module.sample(data)
data.shape : (Batch_size, time_step)
However, the model only uses GPU 0 when I run my code.
I don't know why.
Could you explain it?
Thanks.
How large is your batch size?
Using nn.DataParallel, your data will be split in dim0, and each chunk will be sent to a device.
If the batch size is too small, some GPUs won't get a data chunk.
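As a rough sketch (the toy module and sizes here are made up, not your model), this is the behavior you should see with dim=0: each replica's forward gets one chunk of the batch.

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 4)

    def forward(self, x):
        # for a batch of 8 on two GPUs, this prints torch.Size([4, 16]) on each replica
        print("chunk on", x.device, ":", x.shape)
        return self.linear(x)

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(Toy().cuda(), device_ids=[0, 1], dim=0)
    data = torch.randn(8, 16).cuda()  # batch of 8 is scattered as 4 + 4
    out = model(data)                 # the call goes through the DataParallel wrapper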
Thank you for the quick reply (appreciated).
My batch_size is 8, because each train_data sample is very long.
I get an out-of-memory error when I use only one GPU, so I plan to use multiple GPUs to train my model.
However, the model only uses GPU 0 when gpus = [0, 1].
In general, what should batch_size be set to?
Could you give me some advice? Thank you.
If your batch size is 8, each GPU should get 4 samples.
Do you see the GPU1 completely empty without any utilization in nvidia-smi or a similar tool?
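If it helps, a quick sanity check independent of nvidia-smi (just a suggestion) is to ask PyTorch directly:

import torch

print(torch.cuda.device_count())  # should print 2 for gpus = [0, 1]
for i in range(torch.cuda.device_count()):
    # memory_allocated stays at 0 on a GPU that never receives any data
    print(i, torch.cuda.get_device_name(i), torch.cuda.memory_allocated(i))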
I run watch -n 1 -d nvidia-smi while my model is running, and I see that GPU1 is completely empty.
I am confused about why this happens.
Can you explain it from your experience?
Thanks.
Besides, I print the data shape, (time_step, Batch_size_data, embedding_dim), before out = model.module.sample(data),
and I also print the output shape, (time_step, Batch_size_output, hidden_dim), after the other_function.
However, Batch_size_data == Batch_size_output.
In theory, I think Batch_size_data should be equal to 2 * Batch_size_output when using GPU0 and GPU1.
Note that I modified the parameter setting to:
model = nn.DataParallel(model, device_ids=opt.gpus, dim=1)
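For what it's worth, here is a minimal sketch (the GRU, the sizes, and the names are only illustrative, not your model) of how the dim=1 split can be observed; nn.DataParallel only scatters inputs that go through the wrapper's forward, so the per-replica shape has to be printed inside forward itself:

import torch
import torch.nn as nn

class SeqModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(input_size=32, hidden_size=64)

    def forward(self, x):
        # with dim=1, an input of shape (time_step, 8, 32) should arrive here
        # as (time_step, 4, 32) on each of the two replicas
        print("replica input on", x.device, ":", x.shape)
        out, _ = self.rnn(x)
        return out

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(SeqModel().cuda(), device_ids=[0, 1], dim=1)
    data = torch.randn(100, 8, 32).cuda()  # (time_step, batch_size, embedding_dim)
    out = model(data)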