I was trying to re-implement MobileNet-V2 according to the paper. Based on Table 2 (page 5) of the paper, the sequence of inverted residual blocks should be followed by two more regular convolutional layers. But in PyTorch's implementation, the second regular convolutional layer (the last row in Table 2) has been removed. What is the reason for this modification?
I am not sure, but it seems that everything is correct. The last row, conv2d 1x1, is just a classifier. You can use a convolution layer with a 1x1 filter instead of the Linear layer. On a 1x1 feature map (which is what the average pooling produces), the two compute the same thing: from a mathematical point of view, they are the same operation, and with identical weights and biases they give identical outputs. I suppose the authors used that format in the table to keep the same conventions, without introducing a new type of layer.
See example:
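import torch

number_of_classes = 1000
batch_size = 512
# Dummy feature map with the shape produced by the last 1280-channel layer
i = torch.rand(batch_size, 1280, 7, 7)


class NetworkA(torch.nn.Module):
    """Classifier head built from a Linear layer."""

    def __init__(self):
        super(NetworkA, self).__init__()
        self.linear = torch.nn.Linear(1280, number_of_classes)

    def forward(self, x):
        # Global average pool to 1x1, then flatten to (batch, 1280)
        x = torch.nn.functional.adaptive_avg_pool2d(x, (1, 1)).reshape(x.shape[0], -1)
        return self.linear(x)


class NetworkB(torch.nn.Module):
    """Classifier head built from a 1x1 convolution."""

    def __init__(self):
        super(NetworkB, self).__init__()
        self.conv = torch.nn.Conv2d(1280, number_of_classes, kernel_size=1)

    def forward(self, x):
        # Global average pool to 1x1, keeping the (batch, 1280, 1, 1) layout
        x = torch.nn.functional.adaptive_avg_pool2d(x, (1, 1))
        # Flatten the (batch, classes, 1, 1) output to (batch, classes)
        return self.conv(x).reshape(x.shape[0], -1)


netA = NetworkA()
print(netA(i).shape)
netB = NetworkB()
print(netB(i).shape)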
The result:
torch.Size([512, 1000])
torch.Size([512, 1000])
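Matching shapes alone do not show that the values agree, because the two networks above are initialized independently. As a rough sketch of the equivalence, the snippet below copies the Linear weights into a 1x1 convolution and compares the outputs. The variable names (`linear`, `conv`, `pooled`, `outputs_match`), the smaller batch size of 8, and the `1e-5` tolerance are assumptions made for this illustration, not part of the paper or of PyTorch's implementation.

```python
import torch

torch.manual_seed(0)

num_classes = 1000
# Smaller batch than above, chosen only to keep the check quick (assumption)
x = torch.rand(8, 1280, 7, 7)

linear = torch.nn.Linear(1280, num_classes)
conv = torch.nn.Conv2d(1280, num_classes, kernel_size=1)

# Give the 1x1 convolution exactly the same parameters as the Linear layer.
# Linear weight has shape (out, in); Conv2d weight has shape (out, in, 1, 1).
with torch.no_grad():
    conv.weight.copy_(linear.weight.reshape(num_classes, 1280, 1, 1))
    conv.bias.copy_(linear.bias)

# Shared global average pooling, output shape (8, 1280, 1, 1)
pooled = torch.nn.functional.adaptive_avg_pool2d(x, (1, 1))

out_linear = linear(pooled.flatten(1))  # (8, 1000)
out_conv = conv(pooled).flatten(1)      # (8, 1000)

# Largest element-wise difference; only floating-point rounding is expected
max_abs_diff = (out_linear - out_conv).abs().max().item()
outputs_match = torch.allclose(out_linear, out_conv, atol=1e-5)
print(outputs_match, max_abs_diff)
```

If the two heads really are the same operation, `outputs_match` should be true and `max_abs_diff` should be at the level of floating-point rounding error.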