I have seen this question come up many times in different threads about DataParallel, but no one has given an explicit answer. So I am asking it again in a new topic, in the hope that someone can answer it. Does this operation serve any special purpose?
The wording below is borrowed from @trypag: https://discuss.pytorch.org/t/dataparallel-and-cuda-with-multiple-inputs/272/3
Could anyone explain this code, taken from the imagenet example:
    if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
        model.features = torch.nn.DataParallel(model.features)
        model.cuda()
    else:
        model = torch.nn.DataParallel(model).cuda()
Is there a specific reason to separate the classifier from the features in the alexnet and vgg models?
Why not give the whole model to DataParallel, as is done for the resnet model?
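To make the question concrete, here is a minimal sketch of the two wrapping patterns side by side. `TinyNet` is a hypothetical toy model I made up with the same `features` / `classifier` split as alexnet and vgg; it is not taken from the imagenet example, and the layer sizes are arbitrary. I use `.to(device)` instead of `.cuda()` only so the snippet also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for alexnet/vgg: conv `features` + linear `classifier`."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(8 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 8, 4, 4) -> (N, 128)
        return self.classifier(x)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pattern A (the alexnet/vgg branch): only `features` is wrapped,
# the classifier stays a plain module on the default device.
partial = TinyNet()
partial.features = nn.DataParallel(partial.features)
partial = partial.to(device)

# Pattern B (the else branch, e.g. resnet): the whole model is wrapped.
whole = nn.DataParallel(TinyNet()).to(device)

x = torch.randn(4, 3, 32, 32, device=device)
out_partial = partial(x)
out_whole = whole(x)
print(out_partial.shape, out_whole.shape)
```

Both variants accept the same input and return an output of the same shape, so the difference is only in which part of the forward pass gets replicated across GPUs. That is exactly the part I would like explained.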