I have two GPUs, and I am trying to wrap my model with
torch.nn.DataParallel so that it runs on both of them. I have done this successfully many times before, but this time something strange happened.
When the model is training, I tried to print the input tensor inside the model’s forward function, and only half of the batch is there. For example, with batch_size = 16, the model only sees 8 samples during training.
When the model is in eval mode, I also printed the input tensor inside forward. Sometimes it prints twice (each with a batch of 8), as expected.
Here is my code:
...
device = torch.device("cuda:0")
model.to(device)
model = torch.nn.DataParallel(model, device_ids=[0, 1])

# Epochs
for _ in range(int(num_train_epochs)):
    for step, batch in enumerate(epoch_iterator):
        model.train()
        loss = model(**inputs)  # I compute the loss inside forward to balance GPU memory.
        ....
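For reference, here is a minimal, self-contained sketch of what I expect DataParallel to do (ToyModel, the tensor sizes, and the print statements are placeholders, not my real model): with a batch of 16 and two GPUs, forward should run once per replica, each seeing 8 samples, and the output should be gathered back to 16 on cuda:0.

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        # With DataParallel, each GPU replica should see half of the batch here.
        print("forward on", x.device, "batch size", x.size(0))
        return self.linear(x)

device = torch.device("cuda:0")
model = ToyModel().to(device)
model = nn.DataParallel(model, device_ids=[0, 1])

inputs = torch.randn(16, 10, device=device)
out = model(inputs)                      # expect two prints, each with batch size 8
print("output batch size", out.size(0))  # gathered back to 16 on cuda:0

In my real code the prints inside forward do not behave like this during training, which is what confuses me.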