Why not give the whole model to DataParallel in the imagenet example?

I have seen this question raised many times in different threads about DataParallel, but no one has given an explicit answer. So I am asking it again in a new topic, hoping someone can answer it. Does this operation have any special purpose?
The wording below is borrowed from @trypag: https://discuss.pytorch.org/t/dataparallel-and-cuda-with-multiple-inputs/272/3

Could anyone explain this code, extracted from the imagenet example:

if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
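    # alexnet/vgg: data-parallelize only the convolutional features;
    # the parameter-heavy fully connected classifier stays on one GPU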
    model.features = torch.nn.DataParallel(model.features)
    model.cuda()
else:
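    # everything else (e.g. resnet): data-parallelize the whole model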
    model = torch.nn.DataParallel(model).cuda()

Is there a specific reason to separate the classifier and the features in the alexnet and vgg models?
Why not give the whole model to DataParallel, as is done for the resnet models?


The answer is in "One weird trick for parallelizing convolutional neural networks" by Alex Krizhevsky.
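For intuition: in AlexNet/VGG the convolutional features hold only a small fraction of the parameters (but most of the compute), while the fully connected classifier holds the bulk of the weights. Since DataParallel replicates the wrapped module's parameters onto every GPU on each forward pass, wrapping only model.features avoids shipping the huge classifier around. A minimal sketch, assuming a torchvision install (exact counts depend on the version):

import torchvision.models as models

# Compare how many parameters live in `model.features` (the
# convolutional part) versus the whole network.
for name in ['alexnet', 'vgg16']:
    model = models.__dict__[name]()
    total = sum(p.numel() for p in model.parameters())
    feat = sum(p.numel() for p in model.features.parameters())
    print('{}: {:.1%} of {:,} parameters are in features'.format(
        name, feat / total, total))

On typical builds this prints roughly 4% for alexnet and 11% for vgg16. Resnets, by contrast, are almost entirely convolutional, so replicating the whole model is cheap.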


Oh! Thank you very much!

I have read the paper, but I still feel confused.
Does that mean "model.features = torch.nn.DataParallel(model.features)" corresponds to model parallelism, and "torch.nn.DataParallel(model)" corresponds to data parallelism?
But why do model parallelism when "args.arch.startswith('alexnet') or args.arch.startswith('vgg')", and data parallelism otherwise?
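To make my confusion concrete, here is a toy sketch of the two branches (a hypothetical Net, not the real imagenet models; it needs a CUDA machine to run). As far as I can tell from the docs, DataParallel splits the input batch across GPUs in both variants:

import torch
import torch.nn as nn

class Net(nn.Module):
    # Hypothetical stand-in for alexnet/vgg: a features part plus a classifier.
    def __init__(self):
        super(Net, self).__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                      nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(8, 10)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.view(x.size(0), -1))

# Branch 1 (alexnet/vgg): only the features are replicated across GPUs;
# the classifier stays on the default GPU.
m1 = Net()
m1.features = nn.DataParallel(m1.features)
m1.cuda()

# Branch 2 (resnet etc.): the whole model is replicated.
m2 = nn.DataParallel(Net()).cuda()

x = torch.randn(16, 3, 32, 32).cuda()
print(m1(x).shape, m2(x).shape)  # both: torch.Size([16, 10])

So is the alexnet/vgg branch really model parallelism, or is it just data parallelism applied to the features only?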