Utility function for calculating the shape of a conv output

Hello!
Is there some utility function hidden somewhere for calculating the shape of the output tensor that would result from passing a given input tensor to, for example, an nn.Conv2d module?

To me this seems basic, though, so I may be misunderstanding something about how PyTorch is supposed to be used.

Use case:
You have a (non-convolutional) custom module that needs to know the shape of its input in order to define its nn.Parameters. I realize fully convolutional architectures do not have this need, but I have a module that does. Your code is built so that the Variable passed to this module is the result of applying a Conv2d module to a tensor of known (constant) size. Given all the arguments to the Conv2d module (stride, kernel size, etc.), the size of the output can be computed, but to my knowledge this function isn’t in PyTorch.

Is this function just not in PyTorch? (In which case I can implement it, either for personal use or for submission.) Or is my question based on a false premise?

Conv2d — PyTorch master documentation describes the output size.
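
For reference, the formula given there for the height dimension is H_out = floor((H_in + 2 × padding[0] − dilation[0] × (kernel_size[0] − 1) − 1) / stride[0] + 1), and analogously for the width.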

I imagine if you wanted a function you could generate a random input, pass it to conv2d, and check the size of the output (this is definitely not efficient but works as a sanity check)
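
A minimal sketch of that idea (the layer hyperparameters and the 224×224 input size below are just placeholders):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=2, padding=1)
dummy = torch.randn(1, 3, 224, 224)   # batch of one with the known, constant input size
with torch.no_grad():
    out = conv(dummy)                 # run the layer once just to observe the shape
print(out.shape)                      # torch.Size([1, 16, 111, 111])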

Sure, so it’s not implemented yet then. I almost feel like it would be nice if this were just a method of the conv module: pass it an input shape, and it returns the result of the formula above.

In case anyone else ends up here…

"""
Utility function for computing output of convolutions
takes a tuple of (h,w) and returns a tuple of (h,w)
"""
def conv_output_shape(h_w, kernel_size=1, stride=1, pad=0, dilation=1):
    from math import floor
    if type(kernel_size) is not tuple:
        kernel_size = (kernel_size, kernel_size)
    h = floor( ((h_w[0] + (2 * pad) - ( dilation * (kernel_size[0] - 1) ) - 1 )/ stride) + 1)
    w = floor( ((h_w[1] + (2 * pad) - ( dilation * (kernel_size[1] - 1) ) - 1 )/ stride) + 1)
    return h, w
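
For example, a quick check with a 5×5 kernel, stride 2, and padding 1 on a 28×28 input:

conv_output_shape((28, 28), kernel_size=5, stride=2, pad=1)  # -> (13, 13)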

Thank you very much for the code snippet!

I adapted it a little bit and included a version for transposed convolutions:

def conv_output_shape(h_w, kernel_size=1, stride=1, pad=0, dilation=1):
    """
    Utility function for computing output of convolutions
    takes a tuple of (h,w) and returns a tuple of (h,w)
    """
    
    if type(h_w) is not tuple:
        h_w = (h_w, h_w)
    
    if type(kernel_size) is not tuple:
        kernel_size = (kernel_size, kernel_size)
    
    if type(stride) is not tuple:
        stride = (stride, stride)
    
    if type(pad) is not tuple:
        pad = (pad, pad)
    
    h = (h_w[0] + (2 * pad[0]) - (dilation * (kernel_size[0] - 1)) - 1)// stride[0] + 1
    w = (h_w[1] + (2 * pad[1]) - (dilation * (kernel_size[1] - 1)) - 1)// stride[1] + 1
    
    return h, w


def convtransp_output_shape(h_w, kernel_size=1, stride=1, pad=0, dilation=1):
    """
    Utility function for computing output of transposed convolutions
    takes a tuple of (h,w) and returns a tuple of (h,w)
    """
    
    if type(h_w) is not tuple:
        h_w = (h_w, h_w)
    
    if type(kernel_size) is not tuple:
        kernel_size = (kernel_size, kernel_size)
    
    if type(stride) is not tuple:
        stride = (stride, stride)
    
    if type(pad) is not tuple:
        pad = (pad, pad)
        
    h = (h_w[0] - 1) * stride[0] - 2 * pad[0] + dilation * (kernel_size[0] - 1) + 1
    w = (h_w[1] - 1) * stride[1] - 2 * pad[1] + dilation * (kernel_size[1] - 1) + 1
    
    return h, w
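
For example, a quick round trip (the off-by-one is exactly why ConvTranspose2d has an output_padding argument):

conv_output_shape((64, 64), kernel_size=3, stride=2, pad=1)        # -> (32, 32)
convtransp_output_shape((32, 32), kernel_size=3, stride=2, pad=1)  # -> (63, 63)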

I went ahead and improved on @MicPie’s solution a little bit. Specifically, it now accepts a 4-tuple of the form ((pad_up, pad_bottom), (pad_left, pad_right)) for padding, to allow for asymmetric padding. The transposed convolution function supports dilation as well as output padding. I also added functions to compute the padding needed for a desired output shape.

Below is the full code. Note that the PyTorch Conv2d layer currently accepts only a 2-tuple for padding, so it only allows symmetric padding. To use asymmetric padding, you currently must add an explicit padding step, either using F.pad or a ZeroPad2d layer. See here for more discussion.

import math

def num2tuple(num):
    return num if isinstance(num, tuple) else (num, num)

def conv2d_output_shape(h_w, kernel_size=1, stride=1, pad=0, dilation=1):
    h_w, kernel_size, stride, pad, dilation = num2tuple(h_w), \
        num2tuple(kernel_size), num2tuple(stride), num2tuple(pad), num2tuple(dilation)
    pad = num2tuple(pad[0]), num2tuple(pad[1])
    
    h = math.floor((h_w[0] + sum(pad[0]) - dilation[0]*(kernel_size[0]-1) - 1) / stride[0] + 1)
    w = math.floor((h_w[1] + sum(pad[1]) - dilation[1]*(kernel_size[1]-1) - 1) / stride[1] + 1)
    
    return h, w

def convtransp2d_output_shape(h_w, kernel_size=1, stride=1, pad=0, dilation=1, out_pad=0):
    h_w, kernel_size, stride, pad, dilation, out_pad = num2tuple(h_w), \
        num2tuple(kernel_size), num2tuple(stride), num2tuple(pad), num2tuple(dilation), num2tuple(out_pad)
    pad = num2tuple(pad[0]), num2tuple(pad[1])
    
    h = (h_w[0] - 1)*stride[0] - sum(pad[0]) + dilation[0]*(kernel_size[0]-1) + out_pad[0] + 1
    w = (h_w[1] - 1)*stride[1] - sum(pad[1]) + dilation[1]*(kernel_size[1]-1) + out_pad[1] + 1
    
    return h, w

def conv2d_get_padding(h_w_in, h_w_out, kernel_size=1, stride=1, dilation=1):
    h_w_in, h_w_out, kernel_size, stride, dilation = num2tuple(h_w_in), num2tuple(h_w_out), \
        num2tuple(kernel_size), num2tuple(stride), num2tuple(dilation)
    
    p_h = ((h_w_out[0] - 1)*stride[0] - h_w_in[0] + dilation[0]*(kernel_size[0]-1) + 1)
    p_w = ((h_w_out[1] - 1)*stride[1] - h_w_in[1] + dilation[1]*(kernel_size[1]-1) + 1)
    
    return (math.floor(p_h/2), math.ceil(p_h/2)), (math.floor(p_w/2), math.ceil(p_w/2))

def convtransp2d_get_padding(h_w_in, h_w_out, kernel_size=1, stride=1, dilation=1, out_pad=0):
    h_w_in, h_w_out, kernel_size, stride, dilation, out_pad = num2tuple(h_w_in), num2tuple(h_w_out), \
        num2tuple(kernel_size), num2tuple(stride), num2tuple(dilation), num2tuple(out_pad)
        
    p_h = -(h_w_out[0] - 1 - out_pad[0] - dilation[0]*(kernel_size[0]-1) - (h_w_in[0] - 1)*stride[0])
    p_w = -(h_w_out[1] - 1 - out_pad[1] - dilation[1]*(kernel_size[1]-1) - (h_w_in[1] - 1)*stride[1])
    
    return (math.floor(p_h/2), math.ceil(p_h/2)), (math.floor(p_w/2), math.ceil(p_w/2))
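
For example, to find the padding that keeps a 28×28 feature map at 28×28 with a 3×3 kernel and stride 1:

conv2d_get_padding(28, 28, kernel_size=3, stride=1)  # -> ((1, 1), (1, 1))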

Hi guys, I recently jumped onto PyTorch and also noticed this problem. Automatically working out the output shape is not only needed for Conv1d/Conv2d. In TensorFlow, once the shape of the first input layer is known, TF can figure out the output shapes of the following layers automatically. Now it is 2021; just wondering if we have something like this in PyTorch too? Thanks

I agree! Having to update all shapes when you change a single layer is a major annoyance. Just moments ago I was trying to write code for a conv autoencoder in a more general way. My solution was to compute the output shape for a random tensor inside the model’s initialization method.
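
A minimal sketch of that trick (the architecture and sizes below are only placeholders):

import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, in_shape=(1, 28, 28), latent_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_shape[0], 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Push a dummy tensor through the conv stack once to learn the flattened size.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, *in_shape)).numel()
        self.fc = nn.Linear(n_flat, latent_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))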

@Joao_Pedro Would you mind sharing it?

I ended up using the “Lazy” modules, so PyTorch infers the tensor shapes when you first execute the forward method, just like Keras.
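
A small example of what that looks like (the layer sizes are arbitrary; lazy modules such as nn.LazyLinear were added around PyTorch 1.8):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(10),   # in_features is inferred on the first forward pass
)
out = model(torch.randn(1, 3, 32, 32))  # first call materializes the LazyLinear weights
print(out.shape)                        # torch.Size([1, 10])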

@Joao_Pedro Are there implementations of lazy modules in PyTorch, or did you implement the laziness yourself? I have recently implemented a lazy module myself, so I was wondering whether PyTorch maybe had support for this already, in which case I should probably switch to using that.

As the title to this thread suggests, it would be nice if PyTorch could provide a utility function for calculating the output shape of a _ConvNd layer given an input shape. Since PyTorch already must do this internally when _ConvNd.forward is called, I don’t think this should be too much of an issue to add to the API, so is this something that could be implemented? :upside_down_face:

Lazy modules are implemented as described in your other post.