Custom upscaling of an image through a pretrained decoder network

Hello! So this is a bit of a weird one…

I have a (b, c, 7, 7) tensor. For each “pixel” in this tensor I am going to pass its (c, 1, 1) channel vector through a pretrained decoder to produce a (1, 4, 4) patch. In this way I intend to turn the (batch, channel, 7, 7) tensor into a (batch, 1, 28, 28) image. The problem is that I cannot figure out how to do this upscaling through my custom function (the pretrained decoder).

Perhaps an image will help: [diagram: each (c, 1, 1) pixel vector is decoded into a (1, 4, 4) patch, and the patches tile the (1, 28, 28) output]

To be clear, I already have the trained function/decoder; I can feed the pixels into it one by one and get the resulting 4×4 patches. What I am having trouble with is folding the resulting patches back into the larger image shape (perhaps some sort of nested fold/unfold?). If you could help out or simply point me in the right direction, I would be very grateful.

Thanks in advance!

#%%
import torch

b = 1
c = 6
x_in = 7
y_in = 7

# placeholder input tensor
x = torch.rand((b, c, x_in, y_in), requires_grad=False)  # (1, 6, 7, 7)
x = x.view((b * x_in * y_in, c))                         # (49, 6)
new = decoder(x, outputDict)                             # my pretrained decoder
print(new.shape)                                         # (49, 1, 4, 4)
new = new.view((b, 1, x_in * 4, y_in * 4))
print(new.shape)                                         # (1, 1, 28, 28)

This is where I am at now, though I am not sure whether these reshapes will keep the pixels in their respective places…
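For comparison, the slow per-pixel version I am trying to vectorize would look roughly like this (just a sketch, starting from the original (b, c, 7, 7) tensor before the view above, and assuming the decoder also accepts a (b, c) batch):

# Reference implementation: decode each pixel separately and paste its
# 4x4 patch into the matching spot of the output image.
out = torch.zeros(b, 1, x_in * 4, y_in * 4)
for i in range(x_in):
    for j in range(y_in):
        pixel = x[:, :, i, j]                      # (b, c) channel vector at (i, j)
        patch = decoder(pixel, outputDict)         # (b, 1, 4, 4)
        out[:, :, i * 4:(i + 1) * 4, j * 4:(j + 1) * 4] = patch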

The first view operation might interleave the values, so you should permute the channel dimension to the last dimension first:

x = torch.rand((b, c, x_in, y_in), requires_grad=False)
# permute makes the tensor non-contiguous, so use reshape (or .contiguous().view())
x = x.permute(0, 2, 3, 1).reshape(b * x_in * y_in, c)

I’m not sure how the output is calculated, so it’s hard to tell if the last view is working correctly.
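If it helps, here is a rough sketch of how I would fold the patches back so that each one lands where its source pixel was (untested, and assuming the decoder maps an (n, c) batch to (n, 1, 4, 4) as in your code):

x = torch.rand(b, c, x_in, y_in)                           # the (b, c, 7, 7) feature map
flat = x.permute(0, 2, 3, 1).reshape(b * x_in * y_in, c)   # (49, 6), one row per pixel
patches = decoder(flat, outputDict)                        # (49, 1, 4, 4)

# Split the batch back into the 7x7 grid, move each patch dim next to its
# grid dim, then merge: (b, 1, 7, 4, 7, 4) -> (b, 1, 28, 28)
out = (patches.view(b, x_in, y_in, 1, 4, 4)
              .permute(0, 3, 1, 4, 2, 5)
              .reshape(b, 1, x_in * 4, y_in * 4))

You could sanity-check the placement by comparing the result against the per-pixel loop from the original post.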
