How to keep the input and output shapes the same with dilated convolution?

In Keras, if padding is set to "same", the input and output shapes stay the same.
For example, with a 32x32 input,
model.add(Conv2D(256, kernel_size=3, strides=1,
                 padding='same', dilation_rate=(2, 2)))
does not change the output shape.
But in PyTorch,
nn.Conv2d(256, 256, 3, 1, 1, dilation=2, bias=False)
produces an output of size 30.
So how can I keep the input and output shapes the same with a dilated convolution?


You could visualize it with a tool like ezyang's convolution visualizer or calculate it with this formula:

  • i = input
  • o = output
  • p = padding
  • k = kernel_size
  • s = stride
  • d = dilation

o = [i + 2*p - k - (k-1)*(d-1)]/s + 1

In your case this gives o = [32 + 2 - 3 - 2*1]/1 + 1 = 29 + 1 = 30.
Now you can plug in your parameters and solve the equation for p.
You will see that p=2 gives an output size of 32.
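
As a quick check in code, a minimal sketch assuming the 32x32 input from your example:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)

# For stride = 1, solving o = i for p gives p = d*(k-1)/2, here 2*(3-1)/2 = 2
conv = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=2,
                 dilation=2, bias=False)
print(conv(x).shape)  # torch.Size([1, 256, 32, 32])
```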


Thank you, I just tried it and solved my problem based on your answer.
Another question: what is the corresponding formula for ConvTranspose2d?

The output size of a transposed convolution is given by:

o = (i - 1)*s - 2*p + k + output_padding

Note that this assumes dilation = 1 and that ConvTranspose layers come with an output_padding parameter, which defaults to 0.
The formulas are also shown in the documentation of PyTorch's convolution layers.
(I used a slightly different notation for the Conv layer output; my formula can be simplified to the one shown in the docs.)
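
A minimal sketch, assuming a 16x16 input and the common kernel_size=4, stride=2, padding=1 configuration for exact 2x upsampling:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 16, 16)

# o = (i - 1)*s - 2*p + k + output_padding
#   = (16 - 1)*2 - 2*1 + 4 + 0 = 32
deconv = nn.ConvTranspose2d(256, 256, kernel_size=4, stride=2, padding=1)
print(deconv(x).shape)  # torch.Size([1, 256, 32, 32])
```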


Nice, thank you very much! :hugs:

This might be a silly question, but I am finding it hard to visualize the above formula for 3D images.
I am working on a 3D U-Net model that takes OCT scans as input, and I want to keep the volume the same after applying a 3D conv. What modification should I make to the above formula?

If my kernel size is not cubic, i.e. suppose I use a 3x3x1 and then a 1x1x3 Conv3d on my scans, how do I calculate the padding separately for all three dimensions?
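
For reference, the formula is applied independently to each spatial dimension, so the padding can be chosen per axis. A minimal sketch with hypothetical channel counts and volume shape:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 64, 32)  # hypothetical OCT volume (D, H, W)

# With stride 1 and no dilation, p = (k-1)/2 per axis preserves that axis:
# kernel (3, 3, 1) -> padding (1, 1, 0); kernel (1, 1, 3) -> padding (0, 0, 1)
conv_a = nn.Conv3d(1, 8, kernel_size=(3, 3, 1), padding=(1, 1, 0))
conv_b = nn.Conv3d(8, 8, kernel_size=(1, 1, 3), padding=(0, 0, 1))
print(conv_b(conv_a(x)).shape)  # torch.Size([1, 8, 64, 64, 32])
```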

Sorry for digging this up again, but since the formula depends on the input size of the convolution layer, I'm not sure how to create a dilated convolution layer that preserves arbitrary input dimensions. Is that even possible? (It seems to be in Keras.)

@Time0o Yes, it's definitely possible with the help of padding.
To preserve the input dimension we need stride = 1,
because with stride > 1 the output size is roughly halved.

Next, in the above case, let's say I am using dilation = 2, kernel size = 3, and padding = 2:
o = [256 + 4 - 3 - 2]/1 + 1 = 256
So we are able to preserve the input dimension even after a dilated convolution.
Note that for stride = 1 the required padding p = d*(k-1)/2 does not depend on the input size, so the same layer works for arbitrary input dimensions.
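
A minimal sketch of this (same_dilated_conv is just a hypothetical helper name):

```python
import torch
import torch.nn as nn

def same_dilated_conv(channels, kernel_size, dilation):
    # For stride 1 and an odd kernel size, padding = d*(k-1)/2 preserves
    # the spatial size regardless of the input size.
    padding = dilation * (kernel_size - 1) // 2
    return nn.Conv2d(channels, channels, kernel_size, stride=1,
                     padding=padding, dilation=dilation, bias=False)

conv = same_dilated_conv(256, kernel_size=3, dilation=2)
for size in (32, 100, 256):
    x = torch.randn(1, 256, size, size)
    print(conv(x).shape[-2:])  # (32, 32), (100, 100), (256, 256)
```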

A somewhat related, but different question:
In a U-Net architecture, I am using Conv2d and ConvTranspose2d for the down and up paths. I wish to build a class that can take an arbitrary input image size. However, for many input image sizes, the layers at the same depth of the down and up paths end up with slightly different image sizes due to how Conv2d and ConvTranspose2d compute their output shapes. Is there a way to dynamically calculate the padding needed to make the down and up paths at the same depth produce the same image size?
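
One common workaround, not discussed further in this thread, is to skip the analytical padding calculation and instead pad or crop the upsampled feature map at runtime to match the encoder feature map before concatenation. A minimal sketch (match_size is a hypothetical helper; negative padding in F.pad crops):

```python
import torch
import torch.nn.functional as F

def match_size(up_feat, skip_feat):
    # Pad (or crop, via negative padding) the upsampled feature map so its
    # spatial size matches the skip connection before concatenation.
    dh = skip_feat.shape[-2] - up_feat.shape[-2]
    dw = skip_feat.shape[-1] - up_feat.shape[-1]
    return F.pad(up_feat, (dw // 2, dw - dw // 2, dh // 2, dh - dh // 2))

up = torch.randn(1, 64, 30, 30)
skip = torch.randn(1, 64, 32, 32)
print(torch.cat([match_size(up, skip), skip], dim=1).shape)  # [1, 128, 32, 32]
```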

Is your stride 2, or does it change during upsampling and downsampling?

What if the input has unequal height and width?