How to make `torch.max` search only a few columns in the output of the model?

I have designed a model whose number of output nodes (400 in my case) is not equal to the number of classes (5 in my case). When I train this model, only the first 5 columns (indices 0 to 4) of the output are non-zero; the remaining columns (indices 5 to 399) are zero.
So when I use `_, pred_labels = torch.max(y_predict, 1)`, it produces predicted labels between 0 and 4.
That is how I am able to get an accuracy of 97%.

How can I make `torch.max` search only columns 0 to 4 of each row of the model output? And is this a valid way of training?

You can slice the output via `y_predict[:, :5]` to use only the first 5 columns.
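
For example, a minimal sketch with a random tensor standing in for the model output (the shapes here are placeholders):

import torch

# stand-in for the model output: shape [batch_size, 400],
# where only the first 5 columns carry real class logits
y_predict = torch.randn(8, 400)

# restrict torch.max to columns 0 to 4
_, pred_labels = torch.max(y_predict[:, :5], 1)
print(pred_labels)  # predicted labels are now in 0..4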

Thank you. Is it still a valid way of training if I apply `y_predict[:, :5]` after `y_predict = model(images.to(device))` in the training stage?

Yes, you would still waste compute calculating the unused logits, but I don’t see other issues with this approach.
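
For example, a self-contained sketch of such a training step (the nn.Linear model, shapes, criterion, and optimizer here are placeholders for your own setup, assuming labels in 0 to 4):

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(32, 400).to(device)  # any model with 400 outputs
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(8, 32)
labels = torch.randint(0, 5, (8,))

optimizer.zero_grad()
y_predict = model(images.to(device))                   # shape [8, 400]
loss = criterion(y_predict[:, :5], labels.to(device))  # loss over the first 5 logits only
loss.backward()
optimizer.step()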


Thank you. I have tried setting the number of output nodes of the model to 5 instead of 400, but it affected the results. So if I want to reduce computation, I think I would have to change the whole architecture, which is again a burden. I will slice `y_predict` instead, since that does not violate anything in the training stage.

One more question, just to gain knowledge: does PyTorch have a torch.nn module, or another torch module, that performs this slicing (or otherwise selects the first 5 columns), so that I can put it into the module architecture itself?

I’m not sure why this should be the case, as the unused parameters won’t receive any gradients (they were never used), as seen in this small example:

import torch
import torch.nn as nn

lin = nn.Linear(10, 10, bias=False)
x = torch.randn(1, 10)
out = lin(x)

# backpropagate through the first 5 output units only
out[:, :5].mean().backward()
print(lin.weight.grad)
# tensor([[ 0.0022, -0.0048, -0.0104, -0.1546,  0.1455,  0.1668,  0.0132,  0.2682,
#           0.0032, -0.0224],
#         [ 0.0022, -0.0048, -0.0104, -0.1546,  0.1455,  0.1668,  0.0132,  0.2682,
#           0.0032, -0.0224],
#         [ 0.0022, -0.0048, -0.0104, -0.1546,  0.1455,  0.1668,  0.0132,  0.2682,
#           0.0032, -0.0224],
#         [ 0.0022, -0.0048, -0.0104, -0.1546,  0.1455,  0.1668,  0.0132,  0.2682,
#           0.0032, -0.0224],
#         [ 0.0022, -0.0048, -0.0104, -0.1546,  0.1455,  0.1668,  0.0132,  0.2682,
#           0.0032, -0.0224],
#         [ 0.0000, -0.0000, -0.0000, -0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
#           0.0000, -0.0000],
#         [ 0.0000, -0.0000, -0.0000, -0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
#           0.0000, -0.0000],
#         [ 0.0000, -0.0000, -0.0000, -0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
#           0.0000, -0.0000],
#         [ 0.0000, -0.0000, -0.0000, -0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
#           0.0000, -0.0000],
#         [ 0.0000, -0.0000, -0.0000, -0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
#           0.0000, -0.0000]])

You could write a custom nn.Module as seen here:

class Slice(nn.Module):
    def __init__(self, index):
        super().__init__()
        self.index = index

    def forward(self, x):
        # keep only the first self.index columns
        return x[:, :self.index]

model = nn.Sequential(
    nn.Linear(10, 10),
    Slice(5),
    nn.Linear(5, 20)
)

x = torch.randn(1, 10)
out = model(x)
print(out.shape)
# torch.Size([1, 20])

Maybe it’s because the last stage of the model is AdaptiveAvgPooling(1), followed by Dropout and Flatten layers. I have tried changing from 400 to 5 before this last stage, as I thought that is where the extra 395 values are produced.
Can adaptive average pooling reduce the incoming 400 channels to 5 (with 1x1 feature maps), or is there another pooling technique?

Your currently used pooling layer seems to reduce the spatial size of the activation to 1, so I assume the 400 values are in the channel dimension? If so, you could use e.g. an nn.Linear layer to reduce the number of activations to 5. Alternatively, you could use an nn.Conv2d layer with a kernel size of 1x1, which would be equivalent to the linear layer.
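
A minimal sketch of both options (assuming the pooled activation has shape [batch_size, 400, 1, 1]):

import torch
import torch.nn as nn

x = torch.randn(8, 400, 1, 1)  # pooled activation: [batch_size, 400, 1, 1]

# option 1: flatten, then reduce 400 -> 5 with a linear layer
linear_head = nn.Sequential(nn.Flatten(), nn.Linear(400, 5))
print(linear_head(x).shape)  # torch.Size([8, 5])

# option 2: 1x1 convolution over the channel dimension, then flatten
conv_head = nn.Sequential(nn.Conv2d(400, 5, kernel_size=1), nn.Flatten())
print(conv_head(x).shape)  # torch.Size([8, 5])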

Yes, 400 values are in the channel dimension. Thank you.