Multilabel classifier's last layer is not tracked by PyTorch

Hello. I created a multilabel classifier for tabular data where the number of targets (outputs) can grow over time, so I cannot hardcode the output-layer part; it must stay flexible. The code of the model is shown below:

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class MultiHeadBinaryModel(nn.Module):
    def __init__(self, input_layer, output_layer):
        super(MultiHeadBinaryModel, self).__init__()
        self.input_layer = input_layer
        self.output_layer = output_layer
        self.fc1 = nn.Linear(input_layer, 290)  # input_layer = number of input features
        self.bn1 = nn.BatchNorm1d(290)
        self.fc2 = nn.Linear(290, 320)
        #self.bn2 = nn.BatchNorm1d(320)
        
        self.fc3 = nn.Linear(320, 370)
        #self.bn3 = nn.BatchNorm1d(370)
        
        self.fc4 = nn.Linear(370, 420)
        #self.bn4 = nn.BatchNorm1d(420)
        self.dropout = nn.Dropout(0.15)
        # we will treat each head as a binary classifier ...
        # ... so the output features will be 1
        self.pre_out = []
        for i in range(self.output_layer):
            self.pre_out.append(nn.Linear(420, 1).to(device))
            
    def update_output_layer(self, new_output_layer):
        if new_output_layer > self.output_layer:
            diff_layer = new_output_layer - self.output_layer 
            for _ in range(diff_layer):
                self.pre_out.append(nn.Linear(420, 1).to(device))
            self.output_layer = new_output_layer       
        
        
    def forward(self, x):
        x = F.relu(self.bn1(self.fc1(x)))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.dropout(x)
        x = F.relu(self.fc4(x))
        
        # each binary classifier head will have its own output
        out = [None] * self.output_layer
        for i in range(self.output_layer):
            out[i] = torch.sigmoid(self.pre_out[i](x))       
        return torch.cat(out, dim=1)

Initially I have 23 categories that can be recognized at once. The problem appears when I try to inspect the architecture of my model, say with print(model), which should print all the layers, but the output layers are not seen by PyTorch:

MultiHeadBinaryModel(
  (fc1): Linear(in_features=216, out_features=290, bias=True)
  (bn1): BatchNorm1d(290, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc2): Linear(in_features=290, out_features=320, bias=True)
  (fc3): Linear(in_features=320, out_features=370, bias=True)
  (fc4): Linear(in_features=370, out_features=420, bias=True)
  (dropout): Dropout(p=0.15, inplace=False)
)

How do I change the code so that the last layers are visible in the output? And I'm not sure: if they are not tracked, does that mean their gradients are not computed? Thank you.

Hi, depending on what kind of instance or object input_layer and output_layer are, they may not be listed or registered in your nn.Module.

In your code snippet, I assume that self.output_layer is just an integer and not a layer, which means it is not part of the nn.Module and hence not listed.

Otherwise, if by output layers you mean self.pre_out, it is not listed in your module either: a plain Python list is invisible to nn.Module. Initializing self.pre_out = nn.ModuleList() instead registers the layers inside self.pre_out as part of the nn.Module.
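As a minimal illustration of the registration rule (a toy module just for demonstration, not your full model):

import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # registered: shows up in print() and in parameters()
        self.tracked = nn.ModuleList([nn.Linear(4, 1)])
        # plain Python list: invisible to nn.Module
        self.untracked = [nn.Linear(4, 1)]

print(Toy())  # only (tracked) appears in the printed architecture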

Yes, input_layer and output_layer are integers that represent the number of input features and output targets. By "output layers" I meant the nn.Linear layers inside the self.pre_out list; later, I just compute the output of each head. How do I add them to autograd with nn.ModuleList()?

Instead of initializing self.pre_out with a normal list, initialize it with nn.ModuleList().
But I'm not sure I understand what you mean by adding the output to autograd.

Do you want to make sure that your outputs are tracked by autograd? In that case they already are: they are produced by differentiable operations in your forward() call, so they carry a grad_fn.

Or do you want to make sure that your trainable parameters are listed?
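For reference, a quick check of both (a sketch, assuming the MultiHeadBinaryModel above with input_layer=216 and output_layer=23):

model = MultiHeadBinaryModel(216, 23).to(device)

# the weights are leaf tensors with requires_grad=True
print(model.fc1.weight.is_leaf)        # True
print(model.fc1.weight.requires_grad)  # True

# the output of forward() is an intermediate node of the graph:
# not a leaf, but tracked via its grad_fn
out = model(torch.randn(4, 216).to(device))
print(out.is_leaf)   # False
print(out.grad_fn)   # e.g. <CatBackward0 object at ...>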

Yes, I want the outputs to be tracked by autograd. It's like the model code from this website, Deep Learning Architectures for Multi-Label Classification using PyTorch - DebuggerCafe, but I don't want to hardcode 23 output layers; instead, I used a list, but the last output layers are not tracked.

I missed the second question in your post, sorry :)
The gradients are still computed: the head weights are leaf tensors with requires_grad=True, and out is the result of operations on them inside your forward() call, so everything is tracked in the graph. The catch is that model.parameters() only returns parameters of registered submodules, so an optimizer built from it never sees (and never updates) the heads kept in a plain list.
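One way to verify this yourself (a sketch against the original class with the plain list):

model = MultiHeadBinaryModel(216, 23)

# named_parameters() only traverses registered submodules,
# so with a plain list no "pre_out" entries appear
names = [name for name, _ in model.named_parameters()]
print(any("pre_out" in name for name in names))  # False with a plain list

# an optimizer built this way would therefore skip the heads
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)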

Yes, exactly. And I don't know how to add them to the common architecture. It's like a multichannel part, but only for the output. Imagine you have the code from this website, Deep Learning Architectures for Multi-Label Classification using PyTorch - DebuggerCafe: how would you change it to work with any number of outputs?

It's exactly what you did, with one change. As I mentioned before, if you replace self.pre_out = [] with self.pre_out = nn.ModuleList(), you will be able to see the layers, and model.parameters() will include their weights.
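Concretely, the only change is in __init__ (a sketch; everything else stays as in your post):

# in __init__: a registered container instead of a plain list
self.pre_out = nn.ModuleList()
for i in range(self.output_layer):
    self.pre_out.append(nn.Linear(420, 1).to(device))

nn.ModuleList also supports append(), so your update_output_layer() method keeps working unchanged, and the newly added heads are registered too.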

Example output for model = MultiHeadBinaryModel(1, 1):

MultiHeadBinaryModel(
  (fc1): Linear(in_features=1, out_features=290, bias=True)
  (bn1): BatchNorm1d(290, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc2): Linear(in_features=290, out_features=320, bias=True)
  (fc3): Linear(in_features=320, out_features=370, bias=True)
  (fc4): Linear(in_features=370, out_features=420, bias=True)
  (dropout): Dropout(p=0.15, inplace=False)
  (pre_out): ModuleList(
    (0): Linear(in_features=420, out_features=1, bias=True)
  )
)

Thank you, I will check it :)

Thank you very much. Now it works :)