VGG with 1x1 convolution

Hi,

given the first 5 convolutional blocks of a VGG network:

Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

How can I add a layer that classifies each activation into n_classes? I am trying a 1x1 convolution from 512 channels (the number of filters in the last layer) to n_classes, something like this:

nn.Sequential(
nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1),
nn.Softmax(dim=1)
)
This network gives outputs of shape (batch_size, n_classes, H // 32, W // 32). However, all outputs for a given class are spatially constant: for a given image B and class C, every value in the C-th channel is the same.
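A minimal sketch of the head described above, checking the claimed output shape (the 512-channel feature map and n_classes = 10 are illustrative assumptions, not values from the original post):

```python
import torch
import torch.nn as nn

n_classes = 10
head = nn.Sequential(
    nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1),
    nn.Softmax(dim=1),
)

# A 224x224 input halved by 5 max-pools gives 224 // 32 = 7 spatial positions.
features = torch.randn(4, 512, 7, 7)
out = head(features)
print(out.shape)  # torch.Size([4, 10, 7, 7])
```

The 1x1 convolution preserves the spatial dimensions, so each of the 7x7 positions gets its own per-class score, and the softmax over dim=1 normalizes the class scores at every position independently.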

I have tried training with CrossEntropyLoss, and even converting to logits and applying BCELoss, but without success. I have also tried other networks in my training loop and this does not happen.

Any clue?

Thanks

You can use nn.AdaptiveAvgPool2d to reduce the spatial dimensions to 1x1 (HxW), and then a 1x1 convolution to map the 512 filters to num_classes. Use this after the last conv output from VGG.

self.classifier = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1),
    nn.ReLU(inplace=True)
)

Given what you have done above, I am not sure how you were able to calculate the loss, since the output will be (num_classes, H // 32, W // 32). One important thing to note: do not use nn.Softmax at the end of the network when using nn.CrossEntropyLoss (it combines LogSoftmax and NLLLoss). That can cause problems.
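A sketch of the suggested head used with nn.CrossEntropyLoss: the network outputs raw logits and the loss applies log-softmax internally (the batch size, n_classes, and feature shape below are illustrative assumptions):

```python
import torch
import torch.nn as nn

n_classes = 10
classifier = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),                 # (N, 512, H, W) -> (N, 512, 1, 1)
    nn.Conv2d(512, n_classes, kernel_size=1),     # (N, 512, 1, 1) -> (N, n_classes, 1, 1)
)

features = torch.randn(4, 512, 7, 7)
logits = classifier(features).flatten(1)          # (N, n_classes), no Softmax here
targets = torch.randint(0, n_classes, (4,))
loss = nn.CrossEntropyLoss()(logits, targets)     # log-softmax + NLL applied internally
```

Note there is deliberately no nn.Softmax (or nn.ReLU) after the final conv: nn.CrossEntropyLoss expects unnormalized logits.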

Hi,

thanks for the tip, I have removed the Softmax from the classification layer. I also found my mistake: it was a problem with the weight initialization of the last 1x1 convolution layer. Now the network is learning properly.
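The post does not say which initialization fixed it; as a hypothetical example, re-initializing the final 1x1 conv with torch.nn.init (Kaiming-normal weights and zero bias are a common choice) might look like:

```python
import torch.nn as nn

# Hypothetical re-initialization of the final 1x1 classification conv;
# the specific scheme used by the original poster is not stated.
n_classes = 10
conv = nn.Conv2d(512, n_classes, kernel_size=1)
nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")
nn.init.zeros_(conv.bias)
```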

many thanks