VGG with 1x1 convolution

Hi,

given the first 5 convolutional blocks of a VGG network:

Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

How can I add a layer that classifies each activation into n_classes? I am trying a 1x1 convolution from 512 channels (the number of filters in the last layer) to n_classes, something like this:

nn.Sequential(
nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1),
nn.Softmax(dim=1)
)
This network gives outputs of shape (batch_size, n_classes, H // 32, W // 32). However, all outputs for a given class are spatially constant: for a given image B and class C, every value in the C-th channel is the same.
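A minimal sketch of the head described above, checking the claimed output shape (the 512-channel feature map and n_classes = 10 are illustrative assumptions, not values from the original post):

```python
import torch
import torch.nn as nn

n_classes = 10
head = nn.Sequential(
    nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1),
    nn.Softmax(dim=1),
)

# A 224x224 input halved by 5 max-pools gives 224 // 32 = 7 spatial positions.
features = torch.randn(4, 512, 7, 7)
out = head(features)
print(out.shape)  # torch.Size([4, 10, 7, 7])
```

The 1x1 convolution preserves the spatial dimensions, so each of the 7x7 positions gets its own per-class score, and the softmax over dim=1 normalizes the class scores at every position independently.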

I have tried training with CrossEntropyLoss, and even converting to logits and applying BCELoss, but without success. I have also tried other networks in my training loop and this does not happen.

Any clue?

Thanks

You can use nn.AdaptiveAvgPool2d to reduce the spatial dimensions to 1x1 (HxW), and then a 1x1 convolution to map the 512 filters to num_classes. Use this after the last conv output from VGG.

self.classifier = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1),
    nn.ReLU(inplace=True)
)

Given what you have done above, I am not sure how you were able to calculate the loss, since the output will be (num_classes, H // 32, W // 32). One important thing to note: do not use nn.Softmax at the end of the network when using nn.CrossEntropyLoss (it combines LogSoftmax and NLLLoss). That can cause problems.
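A sketch of the suggested head used with nn.CrossEntropyLoss: the network outputs raw logits and the loss applies log-softmax internally (the batch size, n_classes, and feature shape below are illustrative assumptions):

```python
import torch
import torch.nn as nn

n_classes = 10
classifier = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),                 # (N, 512, H, W) -> (N, 512, 1, 1)
    nn.Conv2d(512, n_classes, kernel_size=1),     # (N, 512, 1, 1) -> (N, n_classes, 1, 1)
)

features = torch.randn(4, 512, 7, 7)
logits = classifier(features).flatten(1)          # (N, n_classes), no Softmax here
targets = torch.randint(0, n_classes, (4,))
loss = nn.CrossEntropyLoss()(logits, targets)     # log-softmax + NLL applied internally
```

Note there is deliberately no nn.Softmax (or nn.ReLU) after the final conv: nn.CrossEntropyLoss expects unnormalized logits.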

Hi,

thanks for the tip, I have removed the Softmax from the classification layer. I also found my mistake: it was a problem with the weight initialization of the last 1x1 convolution layer. Now the network is learning properly.
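The post does not say which initialization fixed it; as a hypothetical example, re-initializing the final 1x1 conv with torch.nn.init (Kaiming-normal weights and zero bias are a common choice) might look like:

```python
import torch.nn as nn

# Hypothetical re-initialization of the final 1x1 classification conv;
# the specific scheme used by the original poster is not stated.
n_classes = 10
conv = nn.Conv2d(512, n_classes, kernel_size=1)
nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")
nn.init.zeros_(conv.bias)
```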

many thanks