Multiple parameter optimization in Multi GPU

Hello,

I am using a pre-trained VGGNet-16 model in which the convolutional layers (everything except the FC part) are wrapped in torch.nn.DataParallel.

The optimizer I used is:

optimizer = optim.SGD([{'params': model.pretrained_model[0][24].parameters()},
                       {'params': model.pretrained_model[0][26].parameters()},
                       {'params': model.pretrained_model[0][28].parameters()},
                       {'params': model.regressor[0][1].parameters()},
                       {'params': model.regressor[0][4].parameters()}],
                      lr=0.001, momentum=0.9)

which gives me a TypeError: 'DataParallel' object does not support indexing.

pretrained_model contains only the CONV layers and regressor contains only the FC layers.

There is no error if I use model.regressor.parameters(), but I also need to update the parameters of the last few layers in pretrained_model. How do I fix this?


It seems that model.regressor is a torch.nn.DataParallel instance, and DataParallel does not support indexing. You can extract the wrapped module through an additional .model attribute: model.regressor.model[0][1].

Oh, I forgot to mention that it is model.pretrained_model that is wrapped in torch.nn.DataParallel, not model.regressor. So, out of all the layers in model.pretrained_model, I will be updating only the last three Conv2d layers (indices 24, 26, and 28).

Following is the model summary:

Sequential (
  (0): Sequential (
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU (inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU (inplace)
    (4): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU (inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU (inplace)
    (9): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU (inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU (inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU (inplace)
    (16): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU (inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU (inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU (inplace)
    (23): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU (inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU (inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU (inplace)
    (30): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (1): Sequential (
    (0): Dropout (p = 0.5)
    (1): Linear (25088 -> 4096)
    (2): ReLU (inplace)
    (3): Dropout (p = 0.5)
    (4): Linear (4096 -> 4096)
    (5): ReLU (inplace)
    (6): Linear (4096 -> 1000)
  )
)

and the class for model creation:

class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        super(MyModel, self).__init__()
        # drop the FC part of VGG and parallelize the conv layers
        self.pretrained_model = nn.Sequential(*list(pretrained_model.children())[:-1])
        self.pretrained_model = torch.nn.DataParallel(self.pretrained_model)
        # net1 (defined elsewhere) holds the FC regressor layers
        self.regressor = nn.Sequential(net1)

    def forward(self, x):
        x = self.pretrained_model(x)
        x = x.view(-1, 35840)    # flatten conv features
        x = self.regressor(x)
        x = x.view(-1, 57, 77)   # reshape to the target map
        return x

Also, neither model.regressor nor model.pretrained_model has a model attribute.
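For context, net1 is never shown in the thread. A hypothetical stand-in consistent with the rest of the posts (trainable Linear layers at indices 1 and 4, so that model.regressor[0][1] and model.regressor[0][4] resolve as in the optimizer, an input of 35840 features, and an output that reshapes to (-1, 57, 77), i.e. 4389 values) could look like this; the layer sizes here are assumptions, not code from the thread:

```python
import torch
import torch.nn as nn

# Hypothetical net1: a VGG-style FC head. The Linear layers sit at
# indices 1 and 4, matching the optimizer's model.regressor[0][1]
# and model.regressor[0][4] lookups.
net1 = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(35840, 4096),    # index 1
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 57 * 77),  # index 4
)

regressor = nn.Sequential(net1)
out = regressor(torch.randn(2, 35840)).view(-1, 57, 77)
print(out.shape)  # torch.Size([2, 57, 77])
```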

I don’t understand what you ultimately want to index. Basically, if you have a module wrapped in torch.nn.DataParallel, you should use its .module attribute to extract it.
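A minimal sketch of that access pattern, using a small stand-in Sequential instead of the full VGG feature stack (the structure here is illustrative, only the .module access matters):

```python
import torch.nn as nn

# Stand-in for the wrapped conv stack: a Sequential nested one level
# deep, like model.pretrained_model in the thread.
features = nn.Sequential(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()))
wrapped = nn.DataParallel(features)

# wrapped[0] would raise a TypeError: indexing is defined on the
# wrapped Sequential, not on the DataParallel wrapper.
inner = wrapped.module       # the original nn.Sequential
conv = wrapped.module[0][0]  # same indexing as before wrapping
print(inner is features, type(conv).__name__)  # True Conv2d
```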


I think I was not able to frame my question properly. In the SGD optimizer I was accessing the parameters of model.pretrained_model as

model.pretrained_model[0][24].parameters()

which gave the 'DataParallel' object does not support indexing error. But if I change it to

model.pretrained_model.module[0][24].parameters()

there is no indexing error anymore.
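Putting the thread together, a sketch of the corrected optimizer: go through .module only on the DataParallel-wrapped part. The model below is a stand-in with the same attribute and index structure as in the thread (filler Conv2d and Linear layers, not the real VGG weights):

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in with the thread's structure: a DataParallel-wrapped conv
# stack (indices 24, 26, 28 exist) and a plain Sequential regressor.
conv_stack = nn.Sequential(*[nn.Conv2d(4, 4, 1) for _ in range(31)])
pretrained_model = nn.DataParallel(nn.Sequential(conv_stack))
regressor = nn.Sequential(nn.Sequential(
    nn.Dropout(), nn.Linear(8, 8), nn.ReLU(), nn.Dropout(), nn.Linear(8, 8)))

# The fix: .module unwraps the DataParallel; the regressor is not
# wrapped, so it is indexed directly.
optimizer = optim.SGD(
    [{'params': pretrained_model.module[0][24].parameters()},
     {'params': pretrained_model.module[0][26].parameters()},
     {'params': pretrained_model.module[0][28].parameters()},
     {'params': regressor[0][1].parameters()},
     {'params': regressor[0][4].parameters()}],
    lr=0.001, momentum=0.9)
print(len(optimizer.param_groups))  # 5
```

Only the parameters passed to the optimizer are updated, so the earlier conv layers stay fixed during training even though their gradients may still be computed.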

Thank you.
