In Keras, if the padding is set to 'same', then the shape of the input and output will be the same.
For example, in Keras, if the input has a spatial size of 32,
model.add(Conv2D(256, kernel_size=3, strides=1,
          padding='same', dilation_rate=(2, 2)))
the output shape will not change.
But in PyTorch, with
nn.Conv2d(256, 256, 3, 1, 1, dilation=2, bias=False)
the output shape becomes 30.
So how do I keep the input and output shapes the same when using a dilated convolution?
You could visualize it with a tool like ezyang’s convolution visualizer or calculate it with this formula:
i = input size
o = output size
p = padding
k = kernel_size
s = stride
d = dilation
o = [i + 2*p - k - (k-1)*(d-1)]/s + 1
In your case this gives o = [32 + 2 - 3 - 2*1]/1 + 1 = [29] + 1 = 30.
Now, you could set all your parameters and "solve" the equation for p. You will see that p=2 will give you an output size of 32.
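A quick check of that result in PyTorch (using the 32x32 input from the question above):
import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)  # one 32x32 feature map with 256 channels
# padding=2 compensates for the dilated 3x3 kernel (effective size 5)
conv = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=2, dilation=2, bias=False)
print(conv(x).shape)
> torch.Size([1, 256, 32, 32])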
Thank you, I have just tried it and it solved my problem.
Another question: what is the corresponding formula for ConvTranspose2d?
The output size of a transposed convolution is given by:
o = (i - 1)*s - 2*p + k + output_padding
Note that ConvTranspose layers come with an output_padding parameter, which defaults to 0.
The formulas are also shown in the documentation of PyTorch's convolution layers. (I used a slightly different notation for the Conv layer output; my formula can be simplified to the one shown in the docs, though.)
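A quick sanity check of the transposed formula (example values of my own choosing, not from the posts above):
import torch
import torch.nn as nn

x = torch.randn(1, 1, 24, 24)
# o = (i - 1)*s - 2*p + k + output_padding
#   = (24 - 1)*2 - 2*1 + 3 + 1 = 48
conv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)
print(conv(x).shape)
> torch.Size([1, 1, 48, 48])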
Nice, thank you very much!
Might be a silly question, but I am finding it hard to visualize the above formula for 3D images.
I am working on a 3D U-Net model, passing OCT scans through it, and want to keep the volume the same after applying a 3D convolution. What modification should I make to the above formula?
If my kernel is not cubic, i.e. suppose I use 3x3x1 and then 1x1x3 Conv3D kernels on my scans, how do I calculate the padding separately for all three dimensions?
Sorry for digging this up again, but since the formula depends on the input size of the convolution layer, I'm not sure how to create a dilated convolution layer that preserves arbitrary input dimensions. Is that even possible? (It seems to be in Keras.)
@Time0o Yes, it's definitely possible with the help of padding.
To preserve the input dimension we obviously need to use stride = 1, because with stride > 1 the spatial size is roughly halved.
Next, in the above case, let's say I am using dilation = 2, kernel size = 3, and padding = 2:
o = [256 + 4 - 3 - 2]/1 + 1 = 256
We are able to preserve the input dimension even after a dilated convolution.
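Note that for stride = 1 the required padding does not depend on the input size at all: setting o = i in the formula above gives p = d*(k-1)/2, which also answers the arbitrary-input-size question. A small helper based on that rearrangement (a sketch, assuming an odd effective kernel size so the padding is an integer):
def same_padding(kernel_size: int, dilation: int = 1) -> int:
    # From o = [i + 2*p - k - (k-1)*(d-1)]/1 + 1 with o = i:
    # p = (k - 1 + (k-1)*(d-1)) / 2 = dilation * (kernel_size - 1) / 2
    return dilation * (kernel_size - 1) // 2

print(same_padding(3, dilation=2))
> 2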
A somewhat related, but different question:
In a U-Net architecture, I am using Conv2d and ConvTranspose2d for the down and up paths. I would like to build a class that can take an arbitrary input image size. However, for many input sizes, the layers at the same depth of the down and up paths end up with slightly different spatial sizes, due to how the output shapes of Conv2d and ConvTranspose2d are computed. Is there a way to dynamically calculate the padding needed so that the down and up paths at the same depth have the same image size?
Is your stride 2, or does it change during upsampling and downsampling?
What if the input has non-equal height and width?
You apply the formula separately on the height and the width.
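For example, with a non-square kernel you can pass tuples and compute the padding per dimension (illustrative values; Conv3d works the same way with 3-tuples):
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 32)  # height != width
# kernel (3, 1): height needs padding 1, width needs padding 0
conv = nn.Conv2d(1, 1, kernel_size=(3, 1), stride=1, padding=(1, 0))
print(conv(x).shape)
> torch.Size([1, 1, 28, 32])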
@ptrblck Hello, how do I keep the same size in either the height or width after a transpose convolution?
You could set the kernel size and stride for this spatial dimension to 1, as seen here:
x = torch.randn(1, 1, 24, 24)
conv = nn.ConvTranspose2d(1, 1, (2, 1), (2, 1))
out = conv(x)
print(out.shape)
> torch.Size([1, 1, 48, 24])
In my case it is not working.
Input shape: 1x28x28
nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, dilation=1, padding=0)
Output shape: 32x12x12
According to your equation it should be [28 + 0 - 5 - 0]/2 + 1 = 12.5, which is not possible.
Could you please help?
The formula in the docs uses the floor operation, which would thus yield a spatial output shape of 12.
Thank you very much for the help.
Conv2d — PyTorch 1.7.0 documentation <-- Here I cannot find any float equation. Could you please provide a link if possible? I have watched the CS231n Stanford lectures but could not find any float equation.
Thanking you.
I’m not sure what “float equation” means in this context, but the formula can be found in the Conv2d docs.
Specifically, the floor operation is used to calculate the output shapes (its symbol looks like an uppercase L on the left and a flipped one on the right-hand side of the calculation).
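To make this concrete, plugging the values from the post above into the floored formula from the docs:
import math
import torch
import torch.nn as nn

i, k, s, p, d = 28, 5, 2, 0, 1
o = math.floor((i + 2*p - d*(k - 1) - 1) / s) + 1
print(o)
> 12

conv = nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=0, dilation=1)
print(conv(torch.randn(1, 1, 28, 28)).shape)
> torch.Size([1, 32, 12, 12])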
For anyone looking for the transposed convolution formula with dilation to copy/paste:
p = o - 1 - d*(k-1) + 2*p - [(o - 1)*s]
if they want to preserve the shape of input and output (i.e. H_out, W_out = H_in, W_in). This yields the padding with the variable names as pasted above, so one could just modify the padding alone to achieve the same sizes.
@neel_g Why is there a 'p' on both sides of the equation?
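(For reference, starting from PyTorch's ConvTranspose2d output formula o = (i - 1)*s - 2*p + d*(k - 1) + output_padding + 1 and setting o = i, solving for p gives p = ((i - 1)*s - i + d*(k - 1) + output_padding + 1) / 2, with p only on the left-hand side. A sketch of this rearrangement, my own and not from the post above:)
import torch
import torch.nn as nn

def transposed_same_padding(k, s=1, d=1, output_padding=0, i=32):
    # From o = (i - 1)*s - 2*p + d*(k - 1) + output_padding + 1 with o = i:
    p = ((i - 1) * s - i + d * (k - 1) + output_padding + 1) / 2
    assert p == int(p), "these hyperparameters do not give an integer padding"
    return int(p)

p = transposed_same_padding(k=3, s=1, d=2)  # -> 2; independent of i when s=1
conv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=1, padding=p, dilation=2)
print(conv(torch.randn(1, 1, 32, 32)).shape)
> torch.Size([1, 1, 32, 32])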