How is it possible to get the output size of `n` consecutive convolutional layers?

anubhav4sachan · June 29, 2020, 11:45am

Given a network architecture, what are the possible ways to define the fully connected layer fc1 with a generalized structure such as nn.Linear($size_of_previous_layer$, 50)?

The main issue arises from x = F.relu(self.fc1(x)) in the forward function. After flattening, I need to add numerous dense layers. But, to my understanding, self.fc1 must be initialized in __init__ and hence needs an input size (to be calculated from the previous layers). How can I declare the self.fc1 layer in a generalized manner?

My Thought:
To get the size, I can calculate the output size of each convolutional layer, and since I have just 3, that is feasible. But in the case of n layers, how can you get the output size of the final convolutional layer?

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.conv1 = nn.Conv2d(3, 10, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3, padding=1)
        self.conv2_drop = nn.Dropout2d(0.4)
        self.conv3 = nn.Conv2d(20, 40, kernel_size=3, padding=1)
        self.conv3_drop = nn.Dropout2d(0.4)

        # Hard-coded for a [3, 32, 32] input: each conv (k=3, padding=1) keeps
        # the spatial size and each 2x2 max-pool halves it (32 -> 16 -> 8 -> 4),
        # so the flattened size is 40 * 4 * 4 = 640.
        # Goal: self.fc1 = nn.Linear($size_of_previous_layer$, 50)
        self.fc1 = nn.Linear(40 * 4 * 4, 50)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = F.relu(F.max_pool2d(self.conv3_drop(self.conv3(x)), 2))

        x = x.flatten(1)  # keep the batch dimension, flatten C x H x W

        x = F.relu(self.fc1(x))
        return F.log_softmax(x, dim=1)

The input to the above architecture can be assumed to be [3, 32, 32] (num_of_channels, height, width).
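For this particular input the size can be worked out by hand. A sketch using the Conv2d/MaxPool2d output-size formula from the PyTorch docs (default floor mode, applied per spatial dimension), with k=3, p=1, s=1, d=1 for each conv and a 2x2 pool with stride 2; ReLU and Dropout2d do not change the shape:

```latex
\begin{aligned}
H_{\text{out}} &= \left\lfloor \frac{H_{\text{in}} + 2p - d\,(k-1) - 1}{s} \right\rfloor + 1 \\
\text{conv } (k=3,\ p=1,\ s=1,\ d=1):\quad H_{\text{out}} &= (H_{\text{in}} + 2 - 2 - 1) + 1 = H_{\text{in}} \\
\text{pool } (k=2,\ p=0,\ s=2,\ d=1):\quad H_{\text{out}} &= \left\lfloor \frac{H_{\text{in}} - 2}{2} \right\rfloor + 1 = \left\lfloor \frac{H_{\text{in}}}{2} \right\rfloor \\
32 \to 32 \to 16 \to 16 \to 8 \to 8 \to 4, &\quad \texttt{in\_features} = 40 \cdot 4 \cdot 4 = 640
\end{aligned}
```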

@ptrblck, could you help me?

PS:

For a single convolutional layer, it is quite easy. The question is about the case where you have n convolutional layers.

lfolle · June 29, 2020, 4:26pm

You could put the kernel sizes that will be used to initialize the Conv layers in a list.
Then you could write a small function that calculates the output size given that list and the input size. The number of channels is given by the last Conv layer's number of output channels (out_channels).
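A minimal sketch of that idea, assuming square inputs, stride 1, no padding and no pooling, so each conv shrinks the side by k - 1. The class name ListConvNet and its parameters (kernel_sizes, channels, in_size) are hypothetical, chosen for illustration; the same lists drive both the layer construction and the size calculation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ListConvNet(nn.Module):
    """Hypothetical example: conv stack built from lists, fc1 sized from the same lists."""

    def __init__(self, in_channels, in_size, kernel_sizes, channels, hidden=50):
        super().__init__()
        self.convs = nn.ModuleList()
        prev_c = in_channels
        size = in_size
        for k, c in zip(kernel_sizes, channels):
            self.convs.append(nn.Conv2d(prev_c, c, kernel_size=k))
            size = size - k + 1  # stride 1, padding 0: side shrinks by k - 1
            prev_c = c
        # Channel count comes from the last conv layer's out_channels
        out_c = self.convs[-1].out_channels
        self.fc1 = nn.Linear(out_c * size * size, hidden)

    def forward(self, x):
        for conv in self.convs:
            x = F.relu(conv(x))
        return F.relu(self.fc1(x.flatten(1)))


net = ListConvNet(in_channels=3, in_size=32, kernel_sizes=[3, 5, 3], channels=[10, 20, 40])
print(net.fc1.in_features)  # 40 * 24 * 24 = 23040
print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 50])
```

Adding a layer only means appending to the two lists; fc1 follows automatically.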

anubhav4sachan · June 29, 2020, 5:23pm

@lfolle That would work for a fixed number of convolutional layers, but I would ultimately have to calculate it for every layer, which is what I want to avoid.

The code needs to be robust, so I need a general solution that can be applied to any number of convolutional layers.

aponcedeleonch · June 30, 2020, 9:54am

Maybe I am missing something here, but wouldn't @lfolle's solution work for n convolutional layers? You put the n kernel sizes, n paddings, n dilations and n strides in lists and then calculate in a function what the output size would be after the n layers. Then you can just call that function in __init__:

def __init__(self):
    ...  # conv layers and the per-layer lists defined here
    out_h, out_w = self.calc_out_conv_layers(in_h, in_w, kernels,
                                             paddings, dilations, strides)
    # out_c: out_channels of the last conv layer
    self.fc1 = nn.Linear(out_h * out_w * out_c, 50)

anubhav4sachan · June 30, 2020, 1:04pm

Can you show me the contents of the function calc_out_conv_layers?

To my understanding, the function calculates the output size from the immediately preceding convolutional layer. This means the function self.calc_out_conv_layers has to be called whenever I declare a convolutional layer. This is not what I want.

In addition, the forward function has x = F.relu(F.max_pool2d(self.conv1(x), 2)), so the calc_out_conv_layers function would need to be adapted manually depending on whether or not I'm using pooling.

aponcedeleonch · June 30, 2020, 1:51pm

I haven't tested it, but I think it could be something like this:

def calc_out_conv_layers(self, in_h, in_w, kernels, paddings, dilations, strides):
    out_h = in_h
    out_w = in_w
    for ker, pad, dil, stri in zip(kernels, paddings, dilations, strides):
        # Conv2d output-size formula; floor division keeps the sizes as ints
        out_h = (out_h + 2*pad - dil*(ker - 1) - 1) // stri + 1
        out_w = (out_w + 2*pad - dil*(ker - 1) - 1) // stri + 1

    return out_h, out_w

You would call this function only once you have finished adding all the kernels, paddings, ... to the appropriate lists. But yes, you would need to modify this function to also take pooling into account. Something more robust is probably this: instead of calling F.max_pool2d in the forward function, add the pooling as modules in __init__ with nn.MaxPool2d. Then you can have a piece of code that iterates over all the registered modules and makes the necessary calculations.

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        pass  # do some calculations with m.kernel_size, m.padding, m.dilation, m.stride
    elif isinstance(m, nn.MaxPool2d):
        pass  # do more calculations with the pooling parameters
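One possible runnable version of that loop, as a sketch under a few assumptions: the feature extractor is an nn.Sequential whose registration order matches the forward order, it contains only Conv2d, MaxPool2d and shape-preserving modules (ReLU, Dropout2d), conv padding is numeric (not the string form), and MaxPool2d keeps its default ceil_mode=False so it follows the same floor formula as Conv2d. The helper names feature_output_size and _as_pair are hypothetical:

```python
import torch
import torch.nn as nn


def _as_pair(v):
    # Conv2d stores tuples; MaxPool2d may store plain ints
    return v if isinstance(v, tuple) else (v, v)


def feature_output_size(features, in_c, in_h, in_w):
    """Hypothetical helper: walk the modules and apply the output-size formula."""
    c, h, w = in_c, in_h, in_w
    for m in features.modules():
        if isinstance(m, (nn.Conv2d, nn.MaxPool2d)):
            k, s = _as_pair(m.kernel_size), _as_pair(m.stride)
            p, d = _as_pair(m.padding), _as_pair(m.dilation)
            h = (h + 2 * p[0] - d[0] * (k[0] - 1) - 1) // s[0] + 1
            w = (w + 2 * p[1] - d[1] * (k[1] - 1) - 1) // s[1] + 1
            if isinstance(m, nn.Conv2d):
                c = m.out_channels  # pooling keeps the channel count
    return c, h, w


# The question's architecture, with pooling registered as modules
features = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=3, padding=1), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(10, 20, kernel_size=3, padding=1), nn.Dropout2d(0.4), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(20, 40, kernel_size=3, padding=1), nn.Dropout2d(0.4), nn.MaxPool2d(2), nn.ReLU(),
)
c, h, w = feature_output_size(features, 3, 32, 32)
fc1 = nn.Linear(c * h * w, 50)
print(c * h * w)  # 640
```

Because the loop reads the parameters back from the modules, adding or removing a layer in the Sequential needs no other change.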

I think what you are looking for is a function that does not exist. There's no function that automatically tells you the output shape of the last layer of a sequence of convolutional layers, at least not one that I know of. You always have to do the calculations on the side.

anubhav4sachan · June 30, 2020, 4:44pm

Looks good. This was my first thought for doing the calculation, but I anticipated that there might be a better, dynamic way to do it. Now I reckon I have to keep treating the function as a layer, as suggested.

I am still in doubt and would love to ask you whether there is a mathematical/conceptual way to do this (quoted), or do we just have to calculate the output size layer by layer?

And yes, thanks for the suggestion.

aponcedeleonch · June 30, 2020, 8:09pm

As far as I know, the mathematical/conceptual way of doing it is layer by layer. This is because different input image sizes produce different output shapes, i.e. the output shape for an input of size (3, 128, 128) will differ from that for an input of size (3, 1024, 1024). There is no generalization because the input size is always a variable. But if you find a way, I would also like to know it.
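Written out, the layer-by-layer view is a recurrence over the n layers, per spatial dimension, where layer i has kernel k_i, padding p_i, stride s_i and dilation d_i (max-pooling layers with the default floor mode fit the same formula, and W is handled the same way). For the question's three conv+pool blocks, each block maps H to floor(H/2), which makes the dependence on the input size explicit:

```latex
\begin{aligned}
H_0 &= H_{\text{in}}, \qquad
H_i = \left\lfloor \frac{H_{i-1} + 2p_i - d_i\,(k_i - 1) - 1}{s_i} \right\rfloor + 1, \quad i = 1, \dots, n \\
\texttt{in\_features} &= C_n \cdot H_n \cdot W_n \quad (C_n = \text{out\_channels of the last conv}) \\
H_{\text{in}} = 128:&\quad 128 \to 64 \to 32 \to 16, \quad 40 \cdot 16 \cdot 16 = 10240 \\
H_{\text{in}} = 1024:&\quad 1024 \to 512 \to 256 \to 128, \quad 40 \cdot 128 \cdot 128 = 655360
\end{aligned}
```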

lfolle · July 1, 2020, 6:45am

@aponcedeleonch's solution is probably the best way to do this.
As another approach, you could also pass an example tensor through the network and print the shape of the intermediate tensor before it is passed to the fully connected layer.
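A minimal sketch of the example-tensor approach, assuming the conv part is wrapped in an nn.Sequential (called features here, a name chosen for illustration) and that the input shape is known when the model is built. Instead of printing, the flattened size is read off directly and used for fc1; the one-off pass runs under torch.no_grad() so no graph is built:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, input_shape=(3, 32, 32)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3, padding=1), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(10, 20, kernel_size=3, padding=1), nn.Dropout2d(0.4), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(20, 40, kernel_size=3, padding=1), nn.Dropout2d(0.4), nn.MaxPool2d(2), nn.ReLU(),
        )
        # One dummy forward pass (batch of 1) to measure the flattened size
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, *input_shape)).flatten(1).shape[1]
        self.fc1 = nn.Linear(n_flat, 50)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return F.log_softmax(F.relu(self.fc1(x)), dim=1)


net = Net()
print(net.fc1.in_features)  # 640
```

The trade-off is one extra forward pass at construction time, in exchange for never writing a size formula; it also picks up any layer type, not just Conv2d and MaxPool2d.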

Egor_Kraev · July 1, 2020, 1:56pm

My perhaps inefficient but general (and correct by construction :) ) way of doing it is to start with an example input and, when building up the model in __init__(), feed that input through each layer in turn, get the size of the output, use it to initialize the next layer, and repeat until done.
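A rough sketch of building a model that way, assuming a hypothetical spec list of (out_channels, kernel_size) pairs with "same" padding (k // 2, odd kernels) and a 2x2 max-pool after each conv; the helper name build_incrementally is made up for illustration. Each new layer is sized from the shape of the running example tensor, so no formula is written by hand:

```python
import torch
import torch.nn as nn


def build_incrementally(example, conv_specs, hidden=50):
    """Hypothetical helper: size every layer from the output of the previous one."""
    layers = []
    x = example
    for out_c, k in conv_specs:
        block = nn.Sequential(
            nn.Conv2d(x.shape[1], out_c, kernel_size=k, padding=k // 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        with torch.no_grad():
            x = block(x)  # feed the example through the new block
        layers.append(block)
    layers.append(nn.Flatten())
    x = layers[-1](x)
    layers.append(nn.Linear(x.shape[1], hidden))  # sized from the flattened example
    return nn.Sequential(*layers)


model = build_incrementally(torch.zeros(1, 3, 32, 32), [(10, 3), (20, 3), (40, 3)])
print(model[-1].in_features)  # 640
```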

anubhav4sachan · July 2, 2020, 1:50am

Nah, that would be way too much manual work and quite infeasible.

anubhav4sachan · July 2, 2020, 1:53am

Yes, this was the question: getting the output size in a generalized manner, and @aponcedeleonch's solution, i.e. calculating layer by layer, is the best known method right now.

MR_Rational · February 23, 2021, 10:30am

Can a PyTorch insider please advise why this function is not built into the torch.nn layer classes, or at least implemented in a container such as Sequential? In TensorFlow, you only need to specify the input shape of the first layer; TF then automatically figures out the output shape of each layer and passes that information to the input of the next layer. This feature is critical for training a complex network, as otherwise you would have to check/update each layer whenever you change the feature vector dimension. Thanks.
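For reference, later PyTorch releases (1.8 and newer) added lazy modules such as nn.LazyLinear, which take only out_features and infer in_features from the first input they receive, which is close to the TF behaviour described above. A minimal sketch, assuming such a version; the model needs one dummy forward pass to materialize the lazy parameters before, for example, the optimizer is created:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(10, 20, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(20, 40, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(50),  # in_features is inferred on the first forward pass
)

# Dummy pass to initialize the lazy layer, then build the optimizer
model(torch.zeros(1, 3, 32, 32))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
print(model[-1].in_features)  # 640
```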