Relationship between the net structure and forward

I am trying to understand the structure of a network but got confused. What determines the structure of a network, the __init__ function or forward()?

In the tutorial, I saw a network can be defined as

class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(2):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred

In DynamicNet, middle_linear is applied twice in the loop, so I guess there are three hidden layers in total, but do the repeated applications have the same weights?

And what if I define a network like this:

class Net(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(Net, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.extra_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred

I have a module extra_linear in __init__, but it is not used in forward. Will it have any effect on the network? Will the parameters of that layer be updated during backpropagation?

In summary, I feel the structure of the network is defined in forward, but when I print the network, what shows up is what is defined in __init__. What is the relationship between __init__ and forward?

I hope I expressed myself clearly…

Thank you!

In the first use case the middle_linear layer will be applied multiple times with the same weights, since its output is fed back into the same layer.
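You can see the weight sharing by listing the model's parameters: the loop in forward reuses the same middle_linear module, so there is only one set of middle weights no matter how many times it is applied. A minimal sketch, assuming the DynamicNet class from the question is defined (the sizes 10, 5, 2 are just example values):

import torch

model = DynamicNet(D_in=10, H=5, D_out=2)
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# input_linear.weight  (5, 10)
# input_linear.bias    (5,)
# middle_linear.weight (5, 5)   <- one set of weights, reused on every loop iteration
# middle_linear.bias   (5,)
# output_linear.weight (2, 5)
# output_linear.bias   (2,)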

In the second example, extra_linear will be registered as a submodule of the model (its parameters show up in model.parameters()), but it won't be updated, since the forward method doesn't use it. If you look at its gradients after a backward pass, you'll see they are None.
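A minimal sketch of checking this, assuming the Net class from the question (again with example sizes 10, 5, 2):

import torch

model = Net(D_in=10, H=5, D_out=2)
x = torch.randn(4, 10)
loss = model(x).sum()
loss.backward()

print(model.middle_linear.weight.grad is None)  # False: used in forward, so it receives a gradient
print(model.extra_linear.weight.grad is None)   # True: never used, so its grad stays None and it won't be updated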

The forward method defines the computation graph every time the model is called, and this graph is then used in the backward pass. Since the graph is created anew on each forward pass, you can define your computation using conditions, loops, etc., so your model architecture can even change from call to call.
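For example, a forward pass can decide at call time how often to reuse a layer. This sketch is a variant of the question's DynamicNet (with a hypothetical name) that picks a random depth on every call, so the computation graph can differ between forward passes:

import random
import torch

class RandomDepthNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(RandomDepthNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        # 0 to 3 reuses of the same middle layer, decided per forward pass
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        return self.output_linear(h_relu)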

Usually you register all modules (and thus their parameters) in the __init__ method and then use them in the forward method.
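That is also why print(model) shows what __init__ registered rather than what forward does. A sketch, assuming the Net class from the question:

model = Net(D_in=10, H=5, D_out=2)
print(model)
# Net(
#   (input_linear): Linear(in_features=10, out_features=5, bias=True)
#   (middle_linear): Linear(in_features=5, out_features=5, bias=True)
#   (extra_linear): Linear(in_features=5, out_features=5, bias=True)
#   (output_linear): Linear(in_features=5, out_features=2, bias=True)
# )
# extra_linear is listed even though forward never calls it.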


Thank you! I think I got the idea now.