I am having some trouble with data parallelism on a system with two GPUs. After reading the tutorial it seems easy, but I cannot get it running.
The thing is, I have two models, say model1 and model2, and I want to train both on both GPUs. As I understand it, PyTorch takes a minibatch, let's say 100 samples, splits it equally across both GPUs and then averages the weights. What I do is:
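In essence, I wrap both models in DataParallel, roughly like this (a rough sketch of the idea, not my exact code):

import torch.nn as nn

# replicate each model on both GPUs; a batch of 100 gets split 50/50
model1 = nn.DataParallel(model1, device_ids=[0, 1]).cuda()
model2 = nn.DataParallel(model2, device_ids=[0, 1]).cuda()

for data in trainloader:
    data = data.cuda()
    output = model2(model1(data))   # each call scatters its input across the GPUs and gathers the outputs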
I don’t think DataParallel will give you any performance advantage if your models work in a sequential way (input → model1 → model2 → output).
That’s what I would do. Do you get an error message using this approach?
If so, try:
model1 = model1.cuda(0)   # model1 stays on GPU0
model2 = model2.cuda(1)   # model2 stays on GPU1

for data in trainloader:
    data = data.cuda(0)
    data = Variable(data)
    output = model1(data)
    output = output.cuda(1)   # transfer it to GPU1
    output = model2(output)
I will try it and report the results. For the moment it is better to have them on the same GPU rather than transfer the output of one model to the input of the other.
@jmaronas I don’t think you can do a forward pass on both models at the same time, since the input of model2 depends on the output of model1.
@tjoseph If both models fit on one GPU, you could do it, and it seems to be a good approach.
I assumed both GPUs are more or less fully occupied with one model each.
Probably I was wrong.
The point is that what gets parallelized is the data, so if we have a copy of model1 and model2 on the same GPU we can do a forward pass regardless of whether one feeds the other. Half of the batch goes to one GPU and half to the other. That is how I have understood it works.
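In other words, something like this (a sketch, assuming model1 feeds model2; the Combined wrapper is made up, and each GPU gets a replica of it holding both models):

import torch.nn as nn

class Combined(nn.Module):
    def __init__(self, model1, model2):
        super().__init__()
        self.model1 = model1
        self.model2 = model2

    def forward(self, x):
        return self.model2(self.model1(x))

combined = nn.DataParallel(Combined(model1, model2), device_ids=[0, 1]).cuda()

for data in trainloader:
    output = combined(data.cuda())   # half of the batch runs on GPU0, half on GPU1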
Yes, I did something similar. The point is that one of my models first copies the parameters of another pretrained model, so I pass the already constructed model to the init of the model to be trained. I will investigate and report.
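What I mean by passing the constructed model is roughly this (a sketch; StudentNet and the layer sizes are made up):

import copy
import torch.nn as nn

class StudentNet(nn.Module):   # hypothetical name for the model to be trained
    def __init__(self, pretrained):
        super().__init__()
        # copy the parameters of the already constructed, pretrained model
        self.backbone = copy.deepcopy(pretrained)
        self.head = nn.Linear(10, 2)   # placeholder layer on top

    def forward(self, x):
        return self.head(self.backbone(x))

model2 = StudentNet(model1)   # model1 was built and trained beforehand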