Need help understanding the forward method

I am writing an MLP in PyTorch using a sequential model, but I am not sure whether the model is actually updating its weights when I call:
```python
optimizer.zero_grad()
scores = model(data)
loss = criterion(scores, targets)
# backward
loss.backward()
# gradient descent or Adam step
optimizer.step()
```

My model is as below:
```python
import torch
import torch.nn as nn

class Feedforward(nn.Module):
    def __init__(self, input_size, out_size):
        super(Feedforward, self).__init__()
        self.layer1 = nn.Sequential()
        self.layer1.add_module("fc1", torch.nn.Linear(input_size, 65))
        self.layer1.add_module("bn1", nn.BatchNorm1d(num_features=65, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        self.layer1.add_module("Relu1", torch.nn.ReLU())
        self.layer1.add_module("dropout", nn.Dropout(p=0.2))
        self.layer1.add_module("fc2", torch.nn.Linear(65, 60))
        self.layer1.add_module("bn2", nn.BatchNorm1d(num_features=60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        self.layer1.add_module("Relu2", torch.nn.ReLU())
        self.layer1.add_module("dropout2", nn.Dropout(p=0.2))
        self.layer1.add_module("fc4", torch.nn.Linear(60, out_size))
        self.layer1.add_module("Softmax", torch.nn.Softmax(dim=1))

    def forward(self, x):
        x = self.layer1(x)
        return self.fc.forward(x)

    def initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight)
```

The code should update the model parameters, assuming you've previously passed them to the optimizer.
You can print a specific parameter before and after the `optimizer.step()` call and compare the values to make sure it's working as intended.
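
For instance, a minimal sketch of that check (assuming `model`, `optimizer`, `criterion`, `data`, and `targets` are defined as in your snippet, and that the forward pass runs without errors) could look like this:

```python
# Snapshot one weight tensor, run a single training step, and compare.
before = model.layer1.fc1.weight.detach().clone()

optimizer.zero_grad()
loss = criterion(model(data), targets)
loss.backward()
optimizer.step()

after = model.layer1.fc1.weight.detach()
print(torch.equal(before, after))  # False means the weights were updated
```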

PS: you can post code snippets by wrapping them into three backticks ```, which would make debugging easier.

Well, the issue is that I am not sure if it's working. I have a Keras program with the same number of layers and the same hyperparameters, and it gives 92% accuracy, but the PyTorch model gives 20% accuracy on the same data. Can you please explain the difference between `return x` and `return self.fc.forward(x)` in the forward function?

Sorry for the interruption, but I don't see `self.fc` being initialized anywhere in your code.
With regard to the last question: `return x` after `x = self.layer1(x)` returns the result of passing your data through the whole of `layer1`. `return self.fc.forward(x)` would take that result and pass it through an additional module `self.fc`; since `self.fc` is never defined in your model, that line will raise an error.
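
In other words, since `layer1` already chains all of your modules, the forward method only needs to run it and return the result:

```python
def forward(self, x):
    # layer1 already chains fc1 -> bn1 -> ReLU -> ... -> Softmax,
    # so its output is the final output of the network
    return self.layer1(x)
```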

It is also worth mentioning that you have to check whether the loss function you are using expects a 'softmaxed' input or raw logits (the output of the last linear layer).
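
For example, if you happen to be training with `nn.CrossEntropyLoss` (a common choice for classification, though your snippet doesn't show which criterion you use), it applies log-softmax internally and expects raw logits, so the final `Softmax` module should be removed:

```python
# nn.CrossEntropyLoss combines LogSoftmax and NLLLoss, so the model
# should output raw logits; applying Softmax first can hurt training.
criterion = nn.CrossEntropyLoss()

# In the model above, drop the last add_module line:
# self.layer1.add_module("Softmax", torch.nn.Softmax(dim=1))
scores = model(data)               # raw logits, shape (batch, out_size)
loss = criterion(scores, targets)  # targets: class indices, shape (batch,)
```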