Recommendations on how to upsample an image


I am new to PyTorch, and I am enjoying it so much, thanks for this project!

I have a question. Suppose I have an image of reduced size obtained through multiple layers of convolution and max-pooling. I need to upsample this image back to its original size, and was wondering what your recommendations are for doing that?

I did read the documentation and tried to use the max-unpooling layer in the decoder part of the network. However, it seems that it needs the indices produced by max-pooling in the encoding part. Do I need to save every intermediate index of the encoding part to use in the decoding part?
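For context, here is a minimal sketch of the pattern I found in the docs: MaxPool2d has to be constructed with return_indices=True, and the indices it returns are then passed to MaxUnpool2d (shapes here are just for illustration):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 3, 8, 8)
pooled, indices = pool(x)           # pooled: (1, 3, 4, 4); indices saved from pooling
restored = unpool(pooled, indices)  # restored: (1, 3, 8, 8); non-max positions become zero
```

This is why a bare MaxUnpool2d inside a Sequential fails: the indices from the matching pooling layer in the encoder have to be carried over to the decoder somehow.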

Below is an example code of what I am trying to achieve:

from torch.nn import (Module, Sequential, Conv2d, BatchNorm2d, ReLU,
                      RNN, ConvTranspose2d, MaxUnpool2d)

class MyNet(Module):
    def __init__(self, pastlen, futurelen, cspace):
        super(MyNet, self).__init__()

        # problem dimensions
        P = pastlen
        F = futurelen
        C = 3 if cspace == "RGB" else 1

        self.encoder = Sequential(
            Conv2d(C, 64, kernel_size=3, padding=1),
            BatchNorm2d(64), ReLU(),
            Conv2d(64, 128, kernel_size=5, stride=5),
            BatchNorm2d(128), ReLU(),
        )

        self.recurrence = Sequential(
            RNN(input_size=5*3*128, hidden_size=20*20*C, num_layers=1),
        )

        self.decoder = Sequential(
            ConvTranspose2d(F*C, F*C, kernel_size=3),
            BatchNorm2d(F*C), ReLU(),
            MaxUnpool2d(2),  # doesn't work, asks for indices...
        )

Alternatively, I am upsampling the images with ConvTranspose2d itself, by choosing a stride of 2 each time. Please let me know if that is not a good idea.
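For reference, a stride-2 ConvTranspose2d only doubles the spatial size exactly when the kernel size and padding are chosen to match the output-size formula. One common combination (my choice for illustration, not from the code above) is kernel_size=4, stride=2, padding=1:

```python
import torch
import torch.nn as nn

# output size = (in - 1) * stride - 2 * padding + kernel_size
# with kernel_size=4, stride=2, padding=1: (in - 1) * 2 - 2 + 4 = 2 * in
up = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 128, 16, 16)
y = up(x)
print(y.shape)  # torch.Size([1, 64, 32, 32]) -- exactly doubled
```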

At the end I will need to get the image with a specific size. What is a good operation for that?

I suppose you would like to upsample your image to the original size. If so, you could use ConvTranspose2d or Resize from torchvision.
There are mixed results with both approaches. I think the original UNet implementation used ConvTranspose2d.
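To hit an exact output size at the end of the decoder, one option (assuming a reasonably recent PyTorch) is torch.nn.functional.interpolate, which takes the target size directly:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 25, 25)  # whatever size the decoder happens to produce
y = F.interpolate(x, size=(224, 224), mode='bilinear', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 224, 224])
```

Unlike torchvision's Resize, this works on batched tensors inside the forward pass, so it can serve as a final fixed-size step after the learned layers.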


I am using ConvTranspose2d, not getting very good results at the moment, but I am still debugging things.

Are there any utilities developed by the community to, say, visualize the intermediate layers of a PyTorch Sequential object? I am doing it manually by sending summaries to TensorBoard, but if there is an existing effort somewhere that I could reuse, that would be great.