Which way to optimize is correct?

I'm confused about how autograd works for a custom network. Here are two examples:

Example 1

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(5, 10)
        self.linear2 = nn.Linear(10, 5)

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        return x

net = Net()
net1.net = net
net1.fc = nn.Linear(5, 10)

Then I create the optimizer; the parameters of both net1.net and net1.fc are passed in, along the lines of the sketch below.
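A minimal sketch of what I mean (the optimizer type and learning rate are just placeholders):

import itertools
import torch.optim as optim

# Hand the parameters of both submodules to a single optimizer
params = itertools.chain(net1.net.parameters(), net1.fc.parameters())
optimizer = optim.SGD(params, lr=0.01)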

Example 2

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(5, 10)
        self.linear2 = nn.Linear(10, 5)
        self.fc = nn.Linear(5,10)

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.fc(x)
        return x

net = Net()

Then I create the optimizer the same way, as sketched below.
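Here a single net.parameters() call covers all three layers (again, optimizer type and learning rate are placeholders):

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)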

Which way is correct? It seems that if you don't call a module in the forward function, you get wrong gradients.

In the first example your definition of net1 is missing.
It could probably look like this:

class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.net = nn.Linear(5, 5)
        self.fc = nn.Linear(5, 10)

    def forward(self, x):
        x = self.net(x)
        x = self.fc(x)
        return x

If this is the case, both models will work just fine.
Your first example looks more like a pre-trained model, where you would like to change the last linear layer to match your number of classes.
The second example is just a vanilla model.

As a small side note: you are missing the non-linearities. In PyTorch, layers do not include any activation functions, so you would have to add them to your models yourself.
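For example, Net from above with a ReLU in between (the choice of activation is just an example):

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(5, 10)
        self.linear2 = nn.Linear(10, 5)

    def forward(self, x):
        x = F.relu(self.linear1(x))  # non-linearity between the two layers
        x = self.linear2(x)
        return x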

@ptrblck Yes, I just want to use a pretrained model and replace its final layer with my own layer. My question is: if I modify this model outside of the model class (as in Example 1), then the output is calculated as:

output = net1.fc(net1(input))

However, the fc layer is not in the forward function. Will this influence the backward pass?

If you are modifying a pre-trained model using model.fc = ..., then this layer should be in the forward method.
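For instance, with a torchvision ResNet the fc attribute you replace is exactly the one its existing forward already calls, so the swap is transparent (resnet18 and the class count here are just examples):

import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)  # forward() still calls self.fc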
Could you post the model definition or a link to it?

@ptrblck This is the model, using a pretrained AlexNet. I just tried to reimplement the network in this figure:
Deep Domain Confusion

import torch
import torch.nn as nn
from torchvision.models import alexnet

class AlexNetFc(nn.Module):

    def __init__(self, pretrained=False, num_classes=1000):
        super(AlexNetFc, self).__init__()
        model_alexnet = alexnet(pretrained=pretrained)
        self.features = model_alexnet.features
        # Copy everything except the last linear layer of the pretrained classifier
        self.classifier = nn.Sequential()
        for i in range(6):
            self.classifier.add_module("classifier" + str(i), model_alexnet.classifier[i])
        self.__in_features = model_alexnet.classifier[6].in_features
        self.nfc = nn.Linear(4096, num_classes)

    def forward(self, x, y):
        x = self.features(x)
        y = self.features(y)
        dist = self.distance_function(x, y)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        x = self.nfc(x)
        y = y.view(y.size(0), 256 * 6 * 6)
        y = self.classifier(y)
        y = self.nfc(y)
        return x, y, dist

    def output_num(self):
        return self.__in_features

    def distance_function(self, x, y):
        # Some distance, e.g. the Euclidean distance between the two feature batches
        return torch.norm(x - y, p=2)

The code looks good.
From the image you've provided, it looks like the model is shared between the two inputs, so it should work this way.
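A rough sketch of how such a model could be trained; the loss composition, the weight lambda_, and the dummy data are my assumptions, not taken from the paper's code:

import torch
import torch.nn as nn
import torch.optim as optim

model = AlexNetFc(pretrained=True, num_classes=10)  # num_classes is an example
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
lambda_ = 0.25  # hypothetical weight on the distance term

source_batch = torch.randn(8, 3, 224, 224)   # dummy source images
target_batch = torch.randn(8, 3, 224, 224)   # dummy target images
source_labels = torch.randint(0, 10, (8,))   # dummy source labels

out_x, out_y, dist = model(source_batch, target_batch)
loss = criterion(out_x, source_labels) + lambda_ * dist  # classification + distance term
optimizer.zero_grad()
loss.backward()
optimizer.step()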

Thank you! I just want to know whether it is still correct if I move the distance_function out of the model class and use the model as in Example 1. I get really confused about this.