Consider the memory usage of 4 GPUs while training my models using nn.DataParallel. We can see that cuda:0 generally acts as the master node and needs more memory. Is there any way to distribute memory uniformly among all the GPUs?
That’s a known limitation of nn.DataParallel, which is one reason we recommend using DistributedDataParallel, besides the better performance of the latter approach.
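For reference, a minimal sketch of what the switch to DistributedDataParallel could look like on a single machine with 4 GPUs, launched via torchrun. MyModel and the training loop details are placeholders for your own setup:

```python
# Launch with: torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each spawned process; one process per GPU,
    # so no single device has to gather the whole batch like cuda:0 does in DataParallel.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = MyModel().to(local_rank)              # hypothetical model class
    model = DDP(model, device_ids=[local_rank])

    # Each process builds its own optimizer and loads its own data shard,
    # typically via torch.utils.data.distributed.DistributedSampler.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # ... training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Since each process only holds its own replica, activations, and gradient buckets, the memory usage should be roughly balanced across the 4 devices.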