I have two GPUs and I am trying to use torch.nn.DataParallel to wrap my model so that it can run on both GPUs. I have done this successfully many times before, but this time something strange happened.
When the model is training, I print the input tensor inside the model’s forward function and only half of the batch is there. For example, with batch_size = 16, the forward only receives 8 samples during training.
When the model is in eval mode, I also print the input tensor inside forward. Sometimes it prints twice (each with a batch of 8), as expected.
Any ideas?
Here is my code:
...
device = torch.device("cuda:0")
model.to(device)
model = torch.nn.DataParallel(model, device_ids=[0, 1])
# Epochs
for _ in range(int(num_train_epochs)):
    for step, batch in enumerate(epoch_iterator):
        model.train()
        loss = model(**inputs)  # I compute the loss inside forward for balancing GPU memory.
        ...
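In case it helps, here is a minimal, self-contained sketch of how I print the shapes inside forward. The toy model, layer sizes, and random data are just placeholders for illustration, not my real code:

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        # Under DataParallel each replica receives a slice of the batch,
        # so this prints the per-GPU chunk size, not the full batch size.
        print("forward sees:", x.shape, "on", x.device)
        return self.fc(x).mean()  # per-replica loss computed inside forward

device = torch.device("cuda:0")
model = ToyModel().to(device)
model = torch.nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(16, 10, device=device)
model.train()
loss = model(x)  # with 2 GPUs I expect this to print twice, each with shape [8, 10]

With two GPUs I would expect the print to fire twice per call, each with a batch of 8, but in training mode I only ever see one chunk of 8.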