nn.Upsample versus upsampling with transposed convolution

I’m trying to understand the difference between nn.Upsample and upsampling via transposed convolution. I wrote an example upsampling a 2x2 identity matrix both ways - can anyone explain why they are different?

https://gist.github.com/katerakelly/3fb565f172df3a371f5178d51e8f1039

For bilinear upsampling, it seems that each output value should be a function of more than one input value, for smoothing purposes. Yet this is not what nn.Upsample does (it preserves the original values) - why is this?
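
For reference, here is a minimal sketch of the behavior I mean, written against the newer F.interpolate API (align_corners=True reproduces the corner-preserving behavior I’m describing; this is not the code from the gist):

import torch
import torch.nn.functional as F

# Bilinearly upsample a 2x2 identity matrix; with align_corners=True the four
# corner values of the 4x4 output equal the original inputs exactly
eye = torch.eye(2).view(1, 1, 2, 2)
print(F.interpolate(eye, scale_factor=2, mode='bilinear', align_corners=True))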


nn.Upsample:

Args:
size (tuple, optional): a tuple of ints ([D_out], H_out, W_out) output sizes
scale_factor (int / tuple of ints, optional): the multiplier for the image height / width / depth
mode (string, optional): the upsampling algorithm: nearest | bilinear | trilinear. Default: nearest
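
For completeness, a minimal usage sketch of nn.Upsample in bilinear mode (newer PyTorch versions also accept an align_corners argument):

import torch
import torch.nn as nn

# Upsample an (N, C, H, W) tensor by a factor of 2 with bilinear interpolation
up = nn.Upsample(scale_factor=2, mode='bilinear')
x = torch.Tensor([[[[1., 2.], [3., 4.]]]])   # shape (1, 1, 2, 2)
print(up(x).size())                          # torch.Size([1, 1, 4, 4])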

I don’t see how this answers my question, since I am already using “bilinear” mode?

You can follow the Caffe documentation if you want to use a transposed convolution as a bilinear upsampling layer (a rough PyTorch translation is sketched after the prototxt below):
http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1BilinearFiller.html

layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
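
A rough PyTorch translation of that Caffe layer could look like the following sketch (bilinear_upsampler is just an illustrative name, and it reuses the make_bilinear_weights helper shown in the code below):

import math
import torch.nn as nn

# Frozen ConvTranspose2d configured like the Caffe "Deconvolution" layer above;
# C is the number of channels and factor the upsampling rate
def bilinear_upsampler(C, factor):
    k = 2 * factor - factor % 2
    pad = int(math.ceil((factor - 1) / 2.0))
    up = nn.ConvTranspose2d(C, C, kernel_size=k, stride=factor,
                            padding=pad, groups=C, bias=False)
    up.weight.data.copy_(make_bilinear_weights(k, C))  # bilinear kernel, shape (C, 1, k, k)
    up.weight.requires_grad = False  # equivalent of lr_mult: 0 / decay_mult: 0
    return up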

I modified your code:

import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def make_bilinear_weights(size, num_channels):
    ''' Make a 2D bilinear kernel suitable for upsampling.
    Stack the bilinear kernel for application to a multi-channel tensor. '''
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    filt = (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)
    print(filt)
    filt = torch.from_numpy(filt).float()
    w = torch.zeros(num_channels, 1, size, size)
    for i in range(num_channels):
        w[i, 0] = filt
    return w

# Define a toy grid
x = np.array([[1, 2], [3, 4]], dtype=np.float32)
x = Variable(torch.from_numpy(x[np.newaxis, np.newaxis, :, :]))

# Upsample using Pytorch bilinear upsampling
out1 = F.upsample(x, None, 2, 'bilinear')

# Upsample using transposed convolution
# kernel size is 2x the upsample rate for smoothing
# padding of 1 (ceil((factor - 1) / 2)) keeps the output at exactly 2x the input size
c = x.size(1)
out2 = F.conv_transpose2d(x, Variable(make_bilinear_weights(4, 1)), stride=2, padding=1, groups=c)

output:
Variable containing:
(0 ,0 ,.,.) = 
  1.0000  1.3333  1.6667  2.0000
  1.6667  2.0000  2.3333  2.6667
  2.3333  2.6667  3.0000  3.3333
  3.0000  3.3333  3.6667  4.0000
[torch.FloatTensor of size 1x1x4x4]

Variable containing:
(0 ,0 ,.,.) = 
  0.5625  0.9375  1.3125  1.1250
  1.1250  1.7500  2.2500  1.8750
  1.8750  2.7500  3.2500  2.6250
  1.6875  2.4375  2.8125  2.2500
[torch.FloatTensor of size 1x1x4x4]

Thanks, I see why I had the wrong output size now.
Can anyone comment on why out1 and out2 are different??

The outputs from conv_transpose2d and F.upsample are different, so the code for make_bilinear_weights is probably incorrect…

The major difference between nn.Upsample and nn.ConvTranspose2d is that nn.ConvTranspose2d has learnable weights, because it has convolution kernels just like nn.Conv2d, whereas nn.Upsample has no learnable weights and simply applies a chosen interpolation algorithm (‘nearest’, ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’).
So I would say that nn.ConvTranspose2d is more powerful because it is “learnable”.
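
A quick sketch of that contrast (the 3-channel shapes are only for illustration):

import torch.nn as nn

# Fixed interpolation: no learnable parameters
up_fixed = nn.Upsample(scale_factor=2, mode='bilinear')
print(sum(p.numel() for p in up_fixed.parameters()))    # 0

# Transposed convolution: the upsampling kernel itself is learned
up_learned = nn.ConvTranspose2d(3, 3, kernel_size=4, stride=2, padding=1)
print(sum(p.numel() for p in up_learned.parameters()))  # 3*3*4*4 + 3 = 147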
