Where are these numbers coming from?

Hi all,

I’m trying to work with and modify VGG for a different task, but I’m struggling to figure out where these values are coming from and how I can determine the ones that I need when I make modifications to the network layers.

import torch.nn as nn

class VGG(nn.Module):

    def __init__(self, features, num_classes=1000):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.Linear(4096, 4096),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

What is “nn.Linear(512 * 7 * 7, 4096)”? Where do we get these values from?

I’m attempting to turn VGG into an encoder-decoder network, so I am trying to learn how to properly use ConvTranspose2d as a deconvolutional layer. I understand that the Linear layer wants to accept 512 as the input value, but I do not understand 512 * 7 * 7 or where 4096 is coming from. Any change (for example, using “512 * 5 * 5”) predictably gives me a size mismatch error such as

Traceback (most recent call last):
  File "smallEDNetwork2.py", line 218, in <module>
  File "smallEDNetwork2.py", line 165, in train_model
    outputs = model(inputs)
  File "/home/nic/.conda/envs/my_root/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "smallEDNetwork2.py", line 75, in forward
    x = self.classifier(x)
  File "/home/nic/.conda/envs/my_root/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nic/.conda/envs/my_root/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/nic/.conda/envs/my_root/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nic/.conda/envs/my_root/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/nic/.conda/envs/my_root/lib/python3.6/site-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

Thanks for your help!

512 * 7 * 7 comes from the output of the very last convolutional block (the final conv layers followed by max-pooling). 512 is the number of channels in that output and 7x7 is its spatial size for the standard 224x224 input. You cannot change these values without also changing either the input image size or the parameters of another layer (e.g., the stride), so that the output of the last convolutional block becomes 512x5x5.
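To make the arithmetic concrete, here is a rough sketch in plain Python (no PyTorch needed) that walks an input size through the conv/pool stack. It assumes the usual VGG16 layout (3x3 convs with padding 1, 2x2 max-pools with stride 2); the names VGG16_CFG, conv_out, pool_out, features_output_shape and flattened_size are made up for this illustration and are not part of torchvision.

```python
# Assumed VGG16 layout: integers are the output channels of a 3x3 conv
# (stride 1, padding 1); "M" marks a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max-pooling without padding."""
    return (size - kernel) // stride + 1


def features_output_shape(input_size, cfg=VGG16_CFG):
    """Return (channels, height, width) after the conv/pool stack."""
    channels, size = 3, input_size
    for layer in cfg:
        if layer == "M":
            size = pool_out(size)
        else:
            channels, size = layer, conv_out(size)
    return channels, size, size


def flattened_size(input_size, cfg=VGG16_CFG):
    """Number of input features the first nn.Linear has to accept."""
    c, h, w = features_output_shape(input_size, cfg)
    return c * h * w


print(features_output_shape(224))  # (512, 7, 7)
print(flattened_size(224))         # 25088, i.e. 512 * 7 * 7
print(features_output_shape(160))  # (512, 5, 5)
```

Under these assumptions, each of the five pooling layers halves the spatial size, so 224 becomes 7, and a 160x160 input would be one way to end up with 512 * 5 * 5.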


Note that in the forward method, the input is first passed to features and then to classifier, so the input dimension of classifier is the output dimension of features (x.view() is used to flatten it). Why 512 x 7 x 7? Because the final output of VGG's conv layers has 512 channels with a spatial size of 7x7.
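Since the question mentions ConvTranspose2d for the decoder side, the same kind of bookkeeping works in reverse. The sketch below uses the output-size formula given in the PyTorch ConvTranspose2d documentation; the helper name conv_transpose_out and the choice of five kernel=2, stride=2 layers are only assumptions for illustration, not a recommended decoder design.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0,
                       output_padding=0, dilation=1):
    """Spatial output size of a transposed convolution along one dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# Start from the 7x7 feature map and apply five hypothetical upsampling layers.
size = 7
sizes = [size]
for _ in range(5):
    size = conv_transpose_out(size, kernel=2, stride=2)
    sizes.append(size)

print(sizes)  # [7, 14, 28, 56, 112, 224]
```

With kernel=2 and stride=2 each layer doubles the spatial size, mirroring one max-pool in the encoder. Other settings, such as kernel=4, stride=2, padding=1, also double the size according to the same formula.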


Great, that answers my question perfectly. Thanks guys.