Mismatch between output dimensions and what's reported by the model summary

eyeris · October 14, 2020, 3:00pm

I am struggling to determine the cause of the following behavior in the 3-class classification code below. The classifier takes a 2D matrix as a tensor of torch.Size([1, 1, 512, 512]). The output is torch.Size([256, 3]).

import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(4096, 1024)
        self.fc2 = nn.Linear(1024, 3)
        
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(-1, 4096)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

The problem is that the same model reports the following dimensions when the model summary is printed. I need to compare the output with a tensor of torch.Size([1]) that holds the class labels. Everything indicates that the model is set up for a 3-class classification problem, but I cannot determine where the value 256 in the output tensor comes from.

summary(activity_recognizer,(1,512,512))

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Sequential: 1-1                        [-1, 32, 256, 256]        --
|    └─Conv2d: 2-1                       [-1, 32, 512, 512]        832
|    └─ReLU: 2-2                         [-1, 32, 512, 512]        --
|    └─MaxPool2d: 2-3                    [-1, 32, 256, 256]        --
├─Sequential: 1-2                        [-1, 64, 128, 128]        --
|    └─Conv2d: 2-4                       [-1, 64, 256, 256]        51,264
|    └─ReLU: 2-5                         [-1, 64, 256, 256]        --
|    └─MaxPool2d: 2-6                    [-1, 64, 128, 128]        --
├─Dropout: 1-3                           [-1, 4096]                --
├─Linear: 1-4                            [-1, 1024]                4,195,328
├─Linear: 1-5                            [-1, 3]                   3,075
==========================================================================================
Total params: 4,250,499
Trainable params: 4,250,499
Non-trainable params: 0
Total mult-adds (G): 3.57
==========================================================================================
Input size (MB): 1.00
Forward/backward pass size (MB): 96.01
Params size (MB): 16.21
Estimated Total Size (MB): 113.22
==========================================================================================

I spent a few hours on this but seem to have hit a dead end. I would appreciate any help.

ptrblck · October 15, 2020, 8:53am

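The view operation is wrong and will change the batch size: view(-1, 4096) splits the 64*128*128 activations of a single sample into 256 rows of 4096 values each.
Use out = out.view(out.size(0), -1) instead and set the in_features of self.fc1 to 128*128*64.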
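
To see where the 256 comes from, here is a minimal sketch of the shape arithmetic in plain Python (no PyTorch required). The helper names conv_out and pool_out are hypothetical and only illustrate the usual output-size formula for the kernel, stride and padding values used in the model above; it assumes a batch of one 512x512 single-channel input, as in the question.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv layer (hypothetical helper, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a max-pool layer (hypothetical helper, no padding)."""
    return (size - kernel) // stride + 1


batch = 1    # one sample, as in torch.Size([1, 1, 512, 512])
side = 512   # input height and width

# layer1: Conv2d(k=5, s=1, p=2) keeps the size, MaxPool2d(2, 2) halves it
side = pool_out(conv_out(side, kernel=5, stride=1, padding=2), kernel=2, stride=2)
# layer2: same pattern again
side = pool_out(conv_out(side, kernel=5, stride=1, padding=2), kernel=2, stride=2)

channels = 64  # out_channels of the second conv layer
features_per_sample = channels * side * side
total_elements = batch * features_per_sample

# view(-1, 4096) fixes the second dim and infers the first from the element count
rows_with_wrong_view = total_elements // 4096
# view(out.size(0), -1) fixes the first dim to the batch size instead
rows_with_fixed_view = batch

print("spatial size after layer2:", side)
print("features per sample:", features_per_sample)
print("rows from view(-1, 4096):", rows_with_wrong_view)
print("rows from view(out.size(0), -1):", rows_with_fixed_view)
```

The features-per-sample value is what in_features of self.fc1 has to match once the batch dimension is preserved. The summary output most likely looked correct because it prints the batch dimension as -1, so the changed batch size does not show up there.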

eyeris · October 19, 2020, 1:01am

Hi ptrblck, thanks for the solution. This worked perfectly.