My model fits on a single GPU and works nicely, but it is slow. To scale the computation, I want to parallelise the model so that I can use multiple GPUs.
I have done
model = torch.nn.DataParallel(model).cuda()
in place of model.cuda()
When I run this with 2 GPUs, it works with batch-size=1 but only uses a single GPU, and when I increase the batch size it runs out of memory.
Maybe you are running out of memory on the default device, which gathers and scatters some parameters and thus usually uses a bit more memory than the other devices.
@ptrblck I am monitoring the GPU memory usage with nvidia-smi. Only one of the GPUs is using ~11 GB of memory before the out-of-memory error is raised, while the other GPU sits at a baseline usage of 2 MB.
I hope that should not be the case, as I understand from the blog post.
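As a quick cross-check from within PyTorch (a rough sketch; the device indices depend on the setup), the per-device allocation can also be printed directly:

import torch

# print the memory currently allocated by tensors on each visible GPU
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**2
    print(f'cuda:{i}: {allocated:.1f} MB allocated')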
Just for the sake of completeness:
Based on a small chat, it seems this code base is used.
Currently, the Trainer class provides convenient methods to train the model. However, skimming through the code, it looks like some refactoring would be needed to make it work with nn.DataParallel, e.g. since the optimizer seems to be embedded in the Trainer class.
I'm also not sure how these lines of code would be handled by nn.DataParallel, since no GPU id is passed to the cuda calls. It's currently a guess, but I think this might also cause the OOM issue in this case.
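To illustrate the point about the missing GPU id (a toy example, not the linked code):

import torch

x = torch.randn(2, 2)
print(x.cuda().device)        # no id: always lands on the current (default) device, usually cuda:0
print(x.cuda(1).device)       # explicit id: cuda:1 (assuming at least two GPUs are visible)
print(x.to('cuda:1').device)  # equivalent, using a device string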
No, DataParallel in its basic form is just applied to the model, such that the input batch will be split in dim0 and each specified GPU will get a chunk. The forward and backward passes are executed in parallel and all necessary gradients etc. are finally gathered on the default device.
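A small sketch to demonstrate the splitting (ToyModel is just a placeholder; on 2 GPUs each replica sees half of the batch):

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # each replica only receives its chunk of the input batch
        print('forward on', x.device, 'with input shape', x.shape)
        return self.fc(x)

model = nn.DataParallel(ToyModel()).cuda()
out = model(torch.randn(8, 10).cuda())  # with 2 GPUs: two prints, each with shape [4, 10]
print(out.shape)                        # output is gathered on the default device: [8, 2]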
Your training routine should not change if you are using DataParallel.
However, in the mentioned code base, the Trainer class is handling the models, optimizers etc.
Using DataParallel by just wrapping your model in it and leaving all other code snippets as they were is thus most likely not possible and would need some code refactoring.
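As a rough sketch of what I mean (MyModel, loader etc. are placeholders for the objects in the linked code base), the model would be wrapped once and the optimizer created from the wrapped model's parameters, while the loop itself stays unchanged:

import torch
import torch.nn as nn

device = 'cuda:0'  # default device, which will gather the outputs
model = nn.DataParallel(MyModel()).to(device)             # MyModel: placeholder for the actual model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # create the optimizer after wrapping
criterion = nn.CrossEntropyLoss()

for data, target in loader:                               # loader: placeholder DataLoader
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)              # the batch is split across all visible GPUs
    loss = criterion(output, target)  # the loss is computed on the gathered output on cuda:0
    loss.backward()
    optimizer.step()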
Is there a way to split the model into several sub-models and parallelise them gradually (a rough sketch of what I mean is below)? Handling a complex model as a whole is an almost impossible task. Thanks.
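To make the question more concrete, something along these lines (just a toy sketch, ShardedModel is hypothetical), where sub-models live on different GPUs and the activations are moved between them:

import torch
import torch.nn as nn

class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # hypothetical split of a larger model into two sub-models on different GPUs
        self.part1 = nn.Sequential(nn.Linear(10, 64), nn.ReLU()).to('cuda:0')
        self.part2 = nn.Linear(64, 2).to('cuda:1')

    def forward(self, x):
        x = self.part1(x.to('cuda:0'))
        x = self.part2(x.to('cuda:1'))  # move the activation to the second GPU
        return x

model = ShardedModel()
out = model(torch.randn(8, 10))
print(out.device)  # cuda:1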