I’m trying to understand the difference between nn.Upsample and upsampling via transposed convolution. I wrote an example upsampling a 2x2 identity matrix both ways - can anyone explain why they are different?

https://gist.github.com/katerakelly/3fb565f172df3a371f5178d51e8f1039

For bilinear upsampling, it seems each output value should be a function of more than one input value, for smoothing purposes. Yet this is not what nn.Upsample does (it preserves the original values) - why is this?

nn.Upsample:

```
Args:
    size (tuple, optional): a tuple of ints ([D_out], H_out, W_out) output sizes
    scale_factor (int / tuple of ints, optional): the multiplier for the image height / width / depth
    mode (string, optional): the upsampling algorithm: nearest | bilinear | trilinear. Default: nearest
```
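
A minimal usage sketch of those arguments (plain tensors, as accepted in recent PyTorch versions):

```
import torch
import torch.nn as nn

# double H and W with bilinear interpolation
up = nn.Upsample(scale_factor=2, mode='bilinear')
x = torch.randn(1, 1, 2, 2)   # N x C x H x W
print(up(x).size())           # torch.Size([1, 1, 4, 4])
```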

I don’t see how this answers my question, since I am already using “bilinear” mode?

longcw (Longchen)
August 29, 2017, 11:52am #4
You can follow the Caffe documentation if you want to use a transposed convolution as a bilinear upsampling layer:
http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1BilinearFiller.html

```
layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
```
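
A rough PyTorch equivalent of that Caffe layer, as a sketch: it assumes a bilinear-kernel builder like the `make_bilinear_weights` helper in the modified code below, and the (hypothetical) function name is mine. The weights are filled once and frozen, mirroring `lr_mult: 0`:

```
import math
import torch.nn as nn

def bilinear_upsample_layer(channels, factor):
    # same hyperparameters as the Caffe recipe above
    kernel_size = 2 * factor - factor % 2
    pad = int(math.ceil((factor - 1) / 2.0))
    layer = nn.ConvTranspose2d(channels, channels, kernel_size,
                               stride=factor, padding=pad,
                               groups=channels, bias=False)
    # fill with the fixed bilinear kernel and freeze it (lr_mult: 0)
    layer.weight.data.copy_(make_bilinear_weights(kernel_size, channels))
    layer.weight.requires_grad = False
    return layer
```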

I modified your code:

```
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def make_bilinear_weights(size, num_channels):
    ''' Make a 2D bilinear kernel suitable for upsampling
    Stack the bilinear kernel for application to tensor '''
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    filt = (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)
    print(filt)
    filt = torch.from_numpy(filt).float()
    w = torch.zeros(num_channels, 1, size, size)
    for i in range(num_channels):
        w[i, 0] = filt
    return w

# Define a toy grid
x = np.array([[1, 2], [3, 4]], dtype=np.float32)
x = Variable(torch.from_numpy(x[np.newaxis, np.newaxis, :, :]))
# Upsample using PyTorch bilinear upsampling
out1 = F.upsample(x, None, 2, 'bilinear')
# Upsample using transposed convolution;
# kernel size is 2x the upsample rate for smoothing,
# output will need to be cropped to size
c = x.size(1)
out2 = F.conv_transpose2d(x, Variable(make_bilinear_weights(4, 1)), stride=2, padding=1, groups=c)
```

Output:

```
Variable containing:
(0 ,0 ,.,.) =
  1.0000  1.3333  1.6667  2.0000
  1.6667  2.0000  2.3333  2.6667
  2.3333  2.6667  3.0000  3.3333
  3.0000  3.3333  3.6667  4.0000
[torch.FloatTensor of size 1x1x4x4]

Variable containing:
(0 ,0 ,.,.) =
  0.5625  0.9375  1.3125  1.1250
  1.1250  1.7500  2.2500  1.8750
  1.8750  2.7500  3.2500  2.6250
  1.6875  2.4375  2.8125  2.2500
[torch.FloatTensor of size 1x1x4x4]
```

Thanks, I see why I had the wrong output size now.
Can anyone comment on why `out1` and `out2` are different??

The outputs are different between `conv_transpose2d` and `F.upsample`, so the code for `make_bilinear_weights` is probably incorrect…
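
One way to check where the difference comes from is to compare against both interpolation conventions. In current PyTorch (where `F.upsample` became `F.interpolate` with an explicit `align_corners` flag), a sketch like the following suggests the kernel itself is fine: `out1` above behaved like `align_corners=True` (it keeps the corner pixels 1, 2, 3, 4 fixed), while the Caffe-style kernel implements the half-pixel convention of `align_corners=False`, and the borders differ further because the transposed convolution sees implicit zeros outside the input. `make_bilinear_weights` here is the helper from the modified code above:

```
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]]).view(1, 1, 2, 2)
w = make_bilinear_weights(4, 1)  # helper from the post above

# corner-preserving convention, like the old F.upsample(..., 'bilinear')
out_corners = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)

# half-pixel convention, which the Caffe-style bilinear kernel implements
out_half = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

out_conv = F.conv_transpose2d(x, w, stride=2, padding=1)

# The interiors agree; only the borders differ, because the transposed
# convolution sees implicit zeros outside the input.
print(out_half[0, 0, 1:3, 1:3])
print(out_conv[0, 0, 1:3, 1:3])
```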

ISMAX (Ismael EL ATIFI)
February 27, 2021, 7:49pm #7
The major difference between nn.Upsample and nn.ConvTranspose2d is that nn.ConvTranspose2d has learnable weights, because it has convolution kernels like nn.Conv2d, whereas nn.Upsample has no learnable weights and just applies a chosen interpolation algorithm (‘nearest’, ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’).
So I would say that nn.ConvTranspose2d is more powerful, because it is “learnable”.
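
A quick sketch of that difference, counting parameters (the channel count of 16 is arbitrary):

```
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode='bilinear')
deconv = nn.ConvTranspose2d(16, 16, kernel_size=4, stride=2, padding=1)

print(sum(p.numel() for p in up.parameters()))      # 0 - nothing to learn
print(sum(p.numel() for p in deconv.parameters()))  # 4112 = 16*16*4*4 weights + 16 biases
```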
