# nn.Upsample versus upsampling with transposed convolution

I’m trying to understand the difference between nn.Upsample and upsampling via transposed convolution. I wrote an example upsampling a 2x2 identity matrix both ways - can anyone explain why they are different?

https://gist.github.com/katerakelly/3fb565f172df3a371f5178d51e8f1039

For bilinear upsampling, it seems that each output should be a function of > 1 input, for smoothing purposes. Yet this is not what nn.Upsample does (it preserves original values) - why is this?


nn.Upsample:

Args:
size (tuple, optional): a tuple of ints ([D_out], H_out, W_out) output sizes
scale_factor (int / tuple of ints, optional): the multiplier for the image height / width / depth
mode (string, optional): the upsampling algorithm: nearest | bilinear | trilinear. Default: nearest
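For what it's worth, a minimal sketch of those arguments in use (note: newer PyTorch versions also accept an `align_corners` flag, which isn't in the docstring above; `align_corners=True` is what preserves the original corner values):

```python
import torch
import torch.nn as nn

# a 1x1x2x2 input tensor
x = torch.tensor([[1., 2.], [3., 4.]]).view(1, 1, 2, 2)

up_nearest = nn.Upsample(scale_factor=2, mode='nearest')
up_bilinear = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

print(up_nearest(x))   # each input pixel repeated 2x2
print(up_bilinear(x))  # corners stay 1 and 4, interior interpolated
```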

I don’t see how this answers my question, since I’m already using “bilinear” mode?

You can follow the Caffe documentation if you want to use a transposed convolution as a bilinear upsampling layer:
http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1BilinearFiller.html

```
layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "{{bottom_name}}"
  top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}}
    stride: {{factor}}
    num_output: {{C}}
    group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" }
    bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
```

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def make_bilinear_weights(size, num_channels):
    '''Make a 2D bilinear kernel suitable for upsampling,
    stacked for application to each channel of the tensor.'''
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    filt = (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)
    print(filt)
    filt = torch.from_numpy(filt).float()
    w = torch.zeros(num_channels, 1, size, size)
    for i in range(num_channels):
        w[i, 0] = filt
    return w

# Define a toy grid
x = np.array([[1, 2], [3, 4]], dtype=np.float32)
x = Variable(torch.from_numpy(x[np.newaxis, np.newaxis, :, :]))

# Upsample using PyTorch bilinear upsampling
out1 = F.upsample(x, None, 2, 'bilinear')

# Upsample using transposed convolution.
# Kernel size is 2x the upsample rate for smoothing;
# with stride 2 and padding 1 the output is already 4x4, so no cropping is needed.
c = x.size(1)
out2 = F.conv_transpose2d(x, Variable(make_bilinear_weights(4, c)), stride=2, padding=1, groups=c)
```

output:

```
Variable containing:
(0 ,0 ,.,.) =
  1.0000  1.3333  1.6667  2.0000
  1.6667  2.0000  2.3333  2.6667
  2.3333  2.6667  3.0000  3.3333
  3.0000  3.3333  3.6667  4.0000
[torch.FloatTensor of size 1x1x4x4]

Variable containing:
(0 ,0 ,.,.) =
  0.5625  0.9375  1.3125  1.1250
  1.1250  1.7500  2.2500  1.8750
  1.8750  2.7500  3.2500  2.6250
  1.6875  2.4375  2.8125  2.2500
[torch.FloatTensor of size 1x1x4x4]
```

Thanks, I see why I had the wrong output size now.
Can anyone comment on why `out1` and `out2` are different??

The outputs from `conv_transpose2d` and `F.upsample` are different, so the code in `make_bilinear_weights` is probably incorrect…
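Before concluding the kernel is wrong, it may be worth checking the coordinate convention: bilinear upsampling can either keep the corner pixels fixed or treat pixels as cell centers, and the two conventions give different numbers. In newer PyTorch this choice is exposed as `align_corners` on `F.interpolate`; a sketch:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]]).view(1, 1, 2, 2)

# align_corners=True keeps the corner pixels fixed (this matches out1 above)
out_true = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)

# align_corners=False treats pixels as cell centers, which is what the
# Caffe-style bilinear kernel computes in the interior of out2 above
# (the borders still differ because the transposed convolution sees zero padding)
out_false = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

print(out_true)
print(out_false)
```

With `align_corners=True` the corners stay 1 and 4; with `align_corners=False` the interior values (1.75, 2.25, 2.75, 3.25) match the middle 2x2 of the transposed-convolution output.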

The major difference between nn.Upsample and nn.ConvTranspose2d is that nn.ConvTranspose2d has learnable weights, because it has convolution kernels just like nn.Conv2d, whereas nn.Upsample has no learnable weights and simply applies a chosen interpolation algorithm (‘nearest’, ‘linear’, ‘bilinear’, ‘bicubic’, or ‘trilinear’).
So I would say that nn.ConvTranspose2d is more powerful because it is “learnable”.
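A quick way to see the “learnable” distinction is to count parameters (a minimal sketch; the kernel size, stride, and padding here just mirror the 4x4 bilinear setup above):

```python
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
deconv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1, bias=False)

# nn.Upsample has nothing to train; ConvTranspose2d has a full kernel
print(sum(p.numel() for p in up.parameters()))      # 0
print(sum(p.numel() for p in deconv.parameters()))  # 16 (one 4x4 kernel)
```

You can even initialize the ConvTranspose2d weights with a bilinear kernel, so it starts out as plain bilinear upsampling and then adapts during training.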
