Infer the size of the output of an intermediate layer considering the input size

Hello, is there an efficient way to determine the size of the output of an intermediate layer given the input size? For instance, in the following code snippet the output of the final pooling step (after conv3_) has shape N x 256 x 8 x 8 for an input of size N x 1 x 64 x 64, so the in_features of fc1 must be 256 x 8 x 8. Instead of manually calculating this value for each particular input dimension, is there any way to automate it? Thank you for any help.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
        self.conv1_ = nn.Conv2d(64, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv2_ = nn.Conv2d(128, 128, 3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv3_ = nn.Conv2d(256, 256, 3, padding=1)
        self.flat = nn.Flatten()
        # in_features computed by hand for a 1 x 64 x 64 input
        self.fc1 = nn.Linear(256 * 8 * 8, 15)

    def forward(self, x):
        x = F.leaky_relu(self.conv1(x))
        x = self.pool(F.leaky_relu(self.conv1_(x)))   # -> N x 64 x 32 x 32
        x = F.leaky_relu(self.conv2(x))
        x = self.pool(F.leaky_relu(self.conv2_(x)))   # -> N x 128 x 16 x 16
        x = F.leaky_relu(self.conv3(x))
        x = self.pool(F.leaky_relu(self.conv3_(x)))   # -> N x 256 x 8 x 8
        x = self.flat(x)
        x = self.fc1(x)
        return x

You can send a test tensor through the conv / pooling layers before declaring your linear layer, check its output shape, and then use that shape to compute the flat in_features of the linear layer.

For example, it’s done like that here: https://github.com/yjlolo/vae-audio/blob/3a43e3122da55daf89fffaf5bbf11fdb805a597f/model/model.py#L180
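Here is a minimal sketch of that idea applied to the network above. The helper name _forward_features and the input_size argument are my additions; the point is just to run a dummy tensor through the conv stack once in __init__ and read off the flattened size:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input_size=(1, 64, 64)):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
        self.conv1_ = nn.Conv2d(64, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv2_ = nn.Conv2d(128, 128, 3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv3_ = nn.Conv2d(256, 256, 3, padding=1)
        self.flat = nn.Flatten()
        # Infer in_features by sending a dummy tensor through the conv
        # stack instead of computing 256 * 8 * 8 by hand.
        with torch.no_grad():
            dummy = torch.zeros(1, *input_size)
            flat_size = self.flat(self._forward_features(dummy)).shape[1]
        self.fc1 = nn.Linear(flat_size, 15)

    def _forward_features(self, x):
        x = F.leaky_relu(self.conv1(x))
        x = self.pool(F.leaky_relu(self.conv1_(x)))
        x = F.leaky_relu(self.conv2(x))
        x = self.pool(F.leaky_relu(self.conv2_(x)))
        x = F.leaky_relu(self.conv3(x))
        x = self.pool(F.leaky_relu(self.conv3_(x)))
        return x

    def forward(self, x):
        x = self.flat(self._forward_features(x))
        return self.fc1(x)

With this, changing the input resolution only requires passing a different input_size, e.g. Net(input_size=(1, 128, 128)), and fc1 adapts automatically.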


Hi, that sounds like a neat way to handle this issue. Thank you for the suggestion.