How to copy pretrained model to parallel model?

Hello all, I have trained my model on my dataset. At that time, the model was:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)  # flatten before the fully connected layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

net = LeNet()
net.train()

The network has been trained and the model saved to lenet.pth. Now I have changed the code to support multiple GPUs, and I want to load the pre-trained lenet.pth. The new version is:

net = LeNet()
net = torch.nn.DataParallel(net, device_ids=[0, 1])  # wrap for multi-GPU training
ckpt = torch.load('./lenet.pth', map_location=lambda storage, loc: storage)
net.load_state_dict(ckpt['checkpoints'])  # this call fails
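For reference, the checkpoint was presumably written along these lines (assuming the state_dict was stored under a 'checkpoints' key, which is what the load call above expects):

torch.save({'checkpoints': net.state_dict()}, './lenet.pth')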

However, it fails to load the state dict. How can I fix this? Thanks

Could you try to restore the state_dict before wrapping the model in nn.DataParallel?
Also, what kind of error are you getting?
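Something like this minimal sketch should work, assuming the checkpoint stores the state_dict under the 'checkpoints' key as in your snippet:

net = LeNet()
# Load the checkpoint onto the CPU first.
ckpt = torch.load('./lenet.pth', map_location=lambda storage, loc: storage)
# Restore the weights while the model is still a plain LeNet,
# so the parameter names match the checkpoint (no 'module.' prefix yet).
net.load_state_dict(ckpt['checkpoints'])
# Only then wrap it for multi-GPU training.
net = torch.nn.DataParallel(net, device_ids=[0, 1]).cuda()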

No, I load the state dict after wrapping in nn.DataParallel. I guess we should use net.module.load_state_dict instead of net.load_state_dict. Am I right?

That would be one approach, or you could load the state dict before wrapping the model in nn.DataParallel.
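For completeness, a minimal sketch of the first option: nn.DataParallel keeps the wrapped network in its .module attribute, whose parameter names match a checkpoint saved from the plain model:

net = LeNet()
net = torch.nn.DataParallel(net, device_ids=[0, 1]).cuda()
ckpt = torch.load('./lenet.pth', map_location=lambda storage, loc: storage)
# net.module is the original LeNet inside the DataParallel wrapper,
# so loading into it avoids the 'module.' prefix mismatch.
net.module.load_state_dict(ckpt['checkpoints'])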