Unbalanced memory usage when using Data Parallelism

When I use data parallelism, I get a CUDA out-of-memory error, even though one GPU's memory is about 70 percent occupied while the other's is only about 10 percent. I think the main reason is the way I move tensors to the device, since the default device is only one of the GPUs. I use the following code for data parallelism and for moving tensors to the device:

import torch
import torch.nn as nn

# "cuda" refers to the current default device (cuda:0), so everything
# moved with .to(dev) lands on GPU 0
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(model)
model = model.to(dev)

How can I make use of both GPUs' memory and fix this problem?

That's a known disadvantage of DataParallel besides the communication overhead: the model is replicated from, and the outputs are gathered back onto, the default device (usually cuda:0), so that GPU carries extra memory. This is why we recommend using DistributedDataParallel, as it won't suffer from these issues.
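As a starting point, here is a minimal sketch of the DistributedDataParallel setup with one process per GPU. The nn.Linear model, the port number, and the random inputs are placeholders for illustration; swap in your own model and data loading:

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run(rank, world_size):
    # One process per GPU; each rank only ever touches its own device.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"  # any free port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}")
    model = nn.Linear(10, 10).to(device)  # placeholder for your model
    ddp_model = DDP(model, device_ids=[rank])

    # Move inputs to this rank's device, not to a single default device.
    inputs = torch.randn(20, 10, device=device)
    loss = ddp_model(inputs).sum()
    loss.backward()  # gradients are all-reduced across ranks

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)

Since each process allocates only on its own GPU and gradients are synchronized via all-reduce rather than gathered onto one device, memory usage stays balanced across GPUs.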