How to keep the input and output shapes the same when using a dilated conv?

micklexqg · March 4, 2018, 12:30pm

In Keras, if the padding is set to "same", the output has the same shape as the input.
For example, in Keras, if the input size is 32 and the layer is
model.add(Conv2D(256, kernel_size=3, strides=1,
padding='same', dilation_rate=(2, 2)))
the output shape will not change.
But in PyTorch, with
nn.Conv2d(256, 256, 3, 1, 1, dilation=2, bias=False),
the output shape becomes 30.
So how can I keep the input and output shapes the same when using a dilated convolution?

ptrblck · March 4, 2018, 12:49pm

You could visualize it with some tools like ezyang’s convolution visualizer or calculate it with this formula:

i = input
o = output
p = padding
k = kernel_size
s = stride
d = dilation

o = [i + 2*p - k - (k-1)*(d-1)]/s + 1

In your case this gives o = [32 + 2 - 3 - 2*1]/1 +1 = [29] + 1 = 30.
Now, you could set all your parameters and “solve” the equation for p.
You will see that p=2 gives an output size of 32.
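The formula can be sketched in plain Python to check a configuration or to solve for p at stride 1. The helper names `conv_out_size` and `same_padding` are illustrative and not part of PyTorch, and the floor division follows the version of the formula in the PyTorch docs.

```python
def conv_out_size(i, k, s=1, p=0, d=1):
    """Output size along one spatial dimension of a (dilated) convolution."""
    effective_k = k + (k - 1) * (d - 1)  # kernel extent after dilation
    return (i + 2 * p - effective_k) // s + 1


def same_padding(k, d=1):
    """Padding that keeps the size unchanged when the stride is 1.

    Assumes d * (k - 1) is even; otherwise symmetric padding cannot
    preserve the size exactly.
    """
    total = d * (k - 1)
    if total % 2:
        raise ValueError("d * (k - 1) must be even for symmetric padding")
    return total // 2


print(conv_out_size(32, k=3, s=1, p=1, d=2))  # 30
print(same_padding(k=3, d=2))                 # 2
print(conv_out_size(32, k=3, s=1, p=2, d=2))  # 32
```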

micklexqg · March 4, 2018, 1:04pm

Thank you, I just tried it and solved my problem following your answer.
Another question: what is the corresponding formula for ConvTranspose2d?

ptrblck · March 4, 2018, 8:35pm

The output size of a transposed convolution is given by:

o = (i -1)*s - 2*p + k + output_padding

Note that ConvTranspose layers come with an output_padding parameter, which defaults to 0.
The formulas are also shown in the documentation of PyTorch’s convolution layers.
(I used a slightly different notation for the Conv layer output. My formula can be simplified to the one shown in the docs, though.)
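As a quick check, the transposed-convolution formula can be written as a small function. This sketch uses the form from the PyTorch docs, which also includes dilation; with d = 1 the term d*(k-1) + 1 reduces to k and matches the formula above. The function name is illustrative.

```python
def conv_transpose_out_size(i, k, s=1, p=0, d=1, output_padding=0):
    """Output size along one spatial dimension of a transposed convolution."""
    return (i - 1) * s - 2 * p + d * (k - 1) + output_padding + 1


print(conv_transpose_out_size(24, k=2, s=2))                         # 48
print(conv_transpose_out_size(24, k=1, s=1))                         # 24
print(conv_transpose_out_size(16, k=3, s=2, p=1, output_padding=1))  # 32
```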

micklexqg · March 5, 2018, 8:57am

nice, thank you very much!

Shivam_Saboo · September 22, 2018, 2:54pm

Might be a silly question, but I am finding it hard to visualize the above formula for 3d images.
I am working on a 3D U-Net model for OCT scans and want to keep the volume the same after applying a 3D conv. What modification should I make to the above formula?

If my kernel is not cubic, i.e., suppose I use a 3x3x1 and then a 1x1x3 Conv3d on my scans, how do I calculate the padding separately for all three dimensions?

Time0o · April 22, 2019, 7:38am

Sorry for digging this up again, but as the formula depends on the input size of the convolution layer I’m not sure how to create a dilated convolution layer that will preserve arbitrary input dimensions. Is that even possible? (It seems to be in Keras).

SumanthMeenan · July 24, 2019, 9:08am

@Time0o Yes, it's definitely possible with the help of padding.
To preserve the input dimension we need to use stride = 1,
because with stride > 1 the output is downsampled (with stride = 2 it is roughly half the input size).

Next, in the above case, let's say I am using dilation = 2, kernel size = 3, padding = 2, with an input size of 256:
o = [256 + 4 - 3 - 2]/1 + 1 = 256
We are able to preserve the input dimension even after the dilated convolution.
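With stride 1 the input size drops out of the equation: setting o = i gives 2*p = d*(k-1), so the padding depends only on kernel size and dilation. A minimal sketch checking this over several input sizes and settings (the helper name is illustrative, and d*(k-1) is assumed to be even):

```python
def conv_out_size(i, k, s, p, d):
    """Conv output size along one dimension, with floor as in the docs."""
    return (i + 2 * p - k - (k - 1) * (d - 1)) // s + 1


paddings = {}
for k, d in [(3, 1), (3, 2), (5, 2), (3, 4)]:
    p = d * (k - 1) // 2  # does not depend on the input size
    paddings[(k, d)] = p
    for i in (7, 32, 100, 256):
        assert conv_out_size(i, k, s=1, p=p, d=d) == i

print(paddings)  # {(3, 1): 1, (3, 2): 2, (5, 2): 4, (3, 4): 4}
```

Newer PyTorch releases also accept padding='same' on Conv2d for stride-1 convolutions, which applies this rule for you.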

metatl · July 26, 2019, 2:27am

A somewhat related, but different question:
In a U-Net architecture, I am using Conv2d and ConvTranspose2d for the down and up paths. I want to build a class that can take an arbitrary input image size. However, for many input sizes the layers at the same depth of the down and up paths end up with slightly different image sizes because of how Conv2d and ConvTranspose2d compute their output shapes. Is there a way to dynamically calculate the padding needed so that the down and up paths have the same image size at the same depth?

SumanthMeenan · July 26, 2019, 11:52am

Is your stride fixed at 2, or does it change between upsampling and downsampling?
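Assuming stride-2 down and up layers that share kernel size and padding (dilation 1), one possible approach is to compute the output_padding that makes the transposed convolution return exactly the size the matching convolution received. The helper names below are illustrative; this only sketches the arithmetic, not a full U-Net.

```python
def conv_out_size(i, k, s, p):
    return (i + 2 * p - k) // s + 1


def conv_transpose_out_size(i, k, s, p, output_padding=0):
    return (i - 1) * s - 2 * p + k + output_padding


def matching_output_padding(target, i, k, s, p):
    """output_padding so the transposed conv maps size i back to target."""
    extra = target - conv_transpose_out_size(i, k, s, p)
    if not 0 <= extra < s:
        raise ValueError("sizes cannot be matched with output_padding alone")
    return extra


k, s, p = 3, 2, 1
results = []
for size in (32, 33, 65):
    down = conv_out_size(size, k, s, p)                  # size after the down layer
    op = matching_output_padding(size, down, k, s, p)    # needed on the way up
    results.append((size, down, op))

print(results)  # [(32, 16, 1), (33, 17, 0), (65, 33, 0)]
```

ConvTranspose2d's forward call also takes an optional output_size argument, which lets the layer resolve this ambiguity itself when you pass the shape recorded on the down path.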

LazySleeper · August 13, 2019, 7:26pm

What if the input has unequal height and width?

road · November 12, 2019, 8:54pm

You apply the formula separately on the height and the width.
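The same per-dimension rule extends to 3D inputs and non-square kernels: each axis gets its own kernel size, dilation, and padding. A sketch with illustrative helper names, assuming stride 1 for the shape-preserving padding:

```python
def same_padding(kernel_size, dilation):
    """Per-dimension padding for a stride-1 conv (2D or 3D tuples)."""
    return tuple(d * (k - 1) // 2 for k, d in zip(kernel_size, dilation))


def conv_out_shape(shape, kernel_size, stride, padding, dilation):
    """Apply the output-size formula independently to every dimension."""
    return tuple(
        (i + 2 * p - d * (k - 1) - 1) // s + 1
        for i, k, s, p, d in zip(shape, kernel_size, stride, padding, dilation)
    )


# 2D input with unequal height and width, 3x3 kernel, dilation 2
pad2d = same_padding((3, 3), (2, 2))
print(pad2d, conv_out_shape((48, 64), (3, 3), (1, 1), pad2d, (2, 2)))
# (2, 2) (48, 64)

# 3D volume with a 3x3x1 kernel followed by a 1x1x3 kernel, no dilation
ones = (1, 1, 1)
pad_a = same_padding((3, 3, 1), ones)
pad_b = same_padding((1, 1, 3), ones)
print(pad_a, pad_b)  # (1, 1, 0) (0, 0, 1)
print(conv_out_shape((64, 64, 16), (3, 3, 1), ones, pad_a, ones))  # (64, 64, 16)
```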

Marcus_Brown · November 23, 2020, 11:28am

@ptrblck Hello, how do I keep the same size in either the height or width after a transpose convolution?

ptrblck · November 23, 2020, 11:31am

You could set the kernel size and stride for this spatial dimension to 1 as seen here:

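import torch
import torch.nn as nn

x = torch.randn(1, 1, 24, 24)
# kernel size and stride are 1 along the width, so the width stays at 24
conv = nn.ConvTranspose2d(1, 1, (2, 1), (2, 1))
out = conv(x)
print(out.shape)
# torch.Size([1, 1, 48, 24])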

Kirti_Pandya · December 28, 2020, 6:07pm

In my case it is not working.
input shape : 1x28x28
nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, dilation=1, padding=0)
Output shape : 32x12x12
According to your equation it should be [28 + 0 - 5 - 0] / 2 + 1= 12.5 which is not possible.

Could you please help.

ptrblck · December 29, 2020, 2:45am

The formula in the docs uses the floor operation, which would thus yield a spatial output shape of 12.
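Plugging the numbers from the question into the formula with and without the floor shows the difference (plain Python, variable names as in the formula above):

```python
import math

i, k, s, p, d = 28, 5, 2, 0, 1
numerator = i + 2 * p - k - (k - 1) * (d - 1)  # 23

unfloored = numerator / s + 1
floored = math.floor(numerator / s) + 1

print(unfloored)  # 12.5
print(floored)    # 12
```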

Kirti_Pandya · December 30, 2020, 10:52pm

Thank you very much for the help.

Conv2d — PyTorch 1.7.0 documentation <-- Here I cannot find any float equation. Could you please provide a link if possible? I have watched the CS231n Stanford lectures but could not find any float equation.

Thanking you.

ptrblck · December 31, 2020, 4:32am

I’m not sure what “float equation” means in this context, but the formula can be found in the Conv2d docs.
Specifically, the floor operation is used to calculate the output shapes (it looks like an uppercase L on the left-hand side of the calculation and a flipped one on the right-hand side).

neel_g · February 28, 2022, 8:04pm

For anyone looking for the transposed convolution formula with dilation to copy/paste:

p = o - 1 - d*(k-1) + 2*p - [(o - 1)*s]

if they want to preserve the shape of input and output (i.e H_out, W_out = H_in, W_in)

This yields the padding with var names as pasted above - so one could just modify padding alone to achieve same sizes.

Ywandung-Lyou · August 30, 2022, 12:29am

Why is there a 'p' on both sides of the equation?
@neel_g
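One possible reading: rearranging the documented ConvTranspose2d formula with o = i and output_padding = 0 gives 0 = o - 1 - d*(k-1) + 2*p - (o - 1)*s, so the left-hand side of the expression above was presumably meant to be 0 rather than p. Solving that for p gives p = [(o - 1)*(s - 1) + d*(k-1)]/2. A sketch under that assumption, with illustrative function names:

```python
def conv_transpose_out_size(i, k, s, p, d, output_padding=0):
    return (i - 1) * s - 2 * p + d * (k - 1) + output_padding + 1


def shape_preserving_padding(o, k, s=1, d=1):
    """Padding for which a transposed conv returns the input size o."""
    total = (o - 1) * (s - 1) + d * (k - 1)  # equals 2 * p
    if total % 2:
        raise ValueError("no integer padding preserves this size")
    return total // 2


p = shape_preserving_padding(o=32, k=3, s=1, d=2)
print(p)                                                # 2
print(conv_transpose_out_size(32, k=3, s=1, p=p, d=2))  # 32
```

For stride 1 the (o - 1)*(s - 1) term vanishes and the padding is d*(k-1)/2 regardless of input size; for larger strides it grows with the input size.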