I was trying to re-implement MobileNet-V2 according to the paper. Based on Table 2 (page 5) of the paper, the sequence of inverted residual blocks should be followed by two more regular convolutional layers. But in PyTorch's implementation, the second regular convolutional layer (the last row in Table 2) has been removed. What is the reason for this modification?
I am not sure, but it seems that everything is correct. The last row, conv2d 1x1, is just the classifier. You can use a convolution layer with a 1x1 filter instead of a Linear layer; from a mathematical point of view, they are the same operation, so the output will be the same. I suppose the authors used that format in the table to keep the conventions consistent, without introducing a new type of layer.
import torch

number_of_classes = 1000
batch_size = 512
i = torch.rand(batch_size, 1280, 7, 7)

class NetworkA(torch.nn.Module):
    """Classifier head built from a Linear layer."""
    def __init__(self):
        super(NetworkA, self).__init__()
        self.linear = torch.nn.Linear(1280, number_of_classes)

    def forward(self, x):
        # Global average pool to (B, 1280, 1, 1), then flatten to (B, 1280)
        x = torch.nn.functional.adaptive_avg_pool2d(x, (1, 1)).reshape(x.shape[0], -1)
        return self.linear(x)

class NetworkB(torch.nn.Module):
    """Classifier head built from a 1x1 convolution."""
    def __init__(self):
        super(NetworkB, self).__init__()
        self.conv = torch.nn.Conv2d(1280, number_of_classes, kernel_size=1)

    def forward(self, x):
        # Global average pool to (B, 1280, 1, 1), classify, then flatten to (B, 1000)
        x = torch.nn.functional.adaptive_avg_pool2d(x, (1, 1))
        return self.conv(x).reshape(x.shape[0], -1)

netA = NetworkA()
print(netA(i).shape)
netB = NetworkB()
print(netB(i).shape)
torch.Size([512, 1000])
torch.Size([512, 1000])
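To make the equivalence concrete rather than just a matching of output shapes, here is a minimal sketch (my addition, building on the two networks above) that copies the Linear weights into the 1x1 convolution and checks that the two heads then produce numerically identical outputs; the allclose tolerance is an arbitrary choice:

with torch.no_grad():
    # Conv2d(1280, 1000, kernel_size=1) stores weights as (1000, 1280, 1, 1);
    # Linear(1280, 1000) stores them as (1000, 1280), so a reshape suffices.
    netB.conv.weight.copy_(netA.linear.weight.reshape(number_of_classes, 1280, 1, 1))
    netB.conv.bias.copy_(netA.linear.bias)

# With shared weights the two heads should agree up to floating-point noise.
print(torch.allclose(netA(i), netB(i), atol=1e-6))  # should print True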