Input with different dimensions

I’m working with multiple matrix-like inputs that have different heights and widths. For example, assume that we have two inputs with shapes x_1: (400, 300) and x_2: (500, 250). My goal is to be able to concatenate the variables as x_c = concatenate((x_1, x_2)) for further processing in the network.

So my problem lies in bringing these two variables to the same shape. To give x_1 the same shape as x_2, we need to upsample the first axis and downsample the second axis. I have found the following solution, which uses different kernel sizes. One can also chain several of these operations instead of downsampling in a single step, e.g. to adjust the width one can use two stages, one kernel of size (1, 21) and another of size (1, 31). A stride-1 kernel of width k removes k - 1 columns, so the two stages remove 20 + 30 = 50 columns, the same as a single (1, 51) kernel. This may give a smoother downsampling.

My question is whether this is a valid approach. Will I lose too much information? Do you have any experience with this?

import torch
import torch.nn as nn


class convDimAdjust(nn.Module):
    """Maps a (400, 300) input to (500, 250) with learnable stride-1 kernels."""

    def __init__(self):
        super().__init__()
        self.input_channel = 1

        # Width: a (1, 51) kernel with stride 1 and no padding gives 300 - 51 + 1 = 250.
        self.convWidth = nn.Conv2d(in_channels=self.input_channel,
                                   out_channels=self.input_channel,
                                   stride=1,
                                   kernel_size=(1, 51))

        # Height: a (101, 1) transposed kernel with stride 1 gives 400 + 101 - 1 = 500.
        self.convHeight = nn.ConvTranspose2d(in_channels=self.input_channel,
                                             out_channels=self.input_channel,
                                             kernel_size=(101, 1))

    def forward(self, x):
        x = self.convWidth(x)
        x = self.convHeight(x)
        return x


def main():
    # Map from (400, 300) to (500, 250) using CNN layers
    m = convDimAdjust()

    # Batch, channel, h, w
    x = torch.ones((1, 1, 400, 300))

    out = m(x)
    print(out.shape)
    # Prints: torch.Size([1, 1, 500, 250])


if __name__ == "__main__":
    main()
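
The shape arithmetic behind these kernel sizes can be checked without building any layers. The sketch below assumes stride 1, no padding and no dilation, which matches the layers in the post. The helper names conv_out and conv_transpose_out are made up for illustration and are not part of PyTorch.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output length of a stride-1, unpadded, undilated convolution."""
    return size - kernel + 1


def conv_transpose_out(size: int, kernel: int) -> int:
    """Output length of a stride-1, unpadded, undilated transposed convolution."""
    return size + kernel - 1


# Width in one step: a kernel of width 51 maps 300 columns to 250.
width_single = conv_out(300, 51)

# Width in two steps: kernels of width 21 and 31 map 300 -> 280 -> 250.
width_two_stage = conv_out(conv_out(300, 21), 31)

# Height: a transposed kernel of height 101 maps 400 rows to 500.
height = conv_transpose_out(400, 101)

print(width_single, width_two_stage, height)
```

For a chain of stride-1 convolutions, the kernel widths minus one add up, so any split whose widths sum to 52 reaches the same target width of 250.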


If you want the upsampling/downsampling to be learned, I think it makes sense to do it this way.
If you want classic upsampling/downsampling (with no learnable parameters), there are functions like upsample (torch.nn.functional.interpolate in current PyTorch) that use things like linear or bilinear interpolation.
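
As a rough sketch of the non-learnable route, torch.nn.functional.interpolate can resize x_1 straight to the target size. The random tensors stand in for the real inputs, and concatenating along the channel dimension is an assumption, since the question does not say which axis is meant.

```python
import torch
import torch.nn.functional as F

# Batch, channel, h, w; random data stands in for the real inputs.
x_1 = torch.rand(1, 1, 400, 300)
x_2 = torch.rand(1, 1, 500, 250)

# Resize x_1 to the spatial size of x_2 with bilinear interpolation.
# No parameters are learned here.
x_1_resized = F.interpolate(x_1, size=(500, 250), mode="bilinear",
                            align_corners=False)

# Assumption: the two inputs are stacked along the channel dimension.
x_c = torch.cat((x_1_resized, x_2), dim=1)

print(x_1_resized.shape)
print(x_c.shape)
```

Whether this loses less information than the learned kernels depends on the data, so it is probably worth comparing both on a validation set.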
