For a given input of size (batch, channels, width, height) I would like to apply a 2-strided convolution with a single fixed 2D-filter to each channel of each batch, resulting in an output of size (batch, channels, width/2, height/2).
Using the groups parameter of nn.functional.conv2d, I came up with the following solution.
I would like to apply the filter
fil = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
to my input
X = torch.rand(32, 2048, 128, 128)
To this end, I add two dummy dimensions (out_channels and in_channels/groups) to my filter and expand the 0th dimension of the filter tensor to match the number of input channels (2048 in this case). The 1st dimension stays at size 1, since in_channels/groups equals 1 when groups=in_channels is passed to nn.functional.conv2d.
fil_tensor = fil[None, None, :, :].expand(X.size(1), -1, -1, -1)
res = torch.nn.functional.conv2d(X, fil_tensor, stride=2, groups=X.size(1))
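For reference, here is the whole thing as a self-contained snippet. I've used smaller, made-up sizes here (4 batches, 8 channels, 16×16 spatial) purely so it runs cheaply; the logic is identical to the shapes above:

```python
import torch
import torch.nn.functional as F

# The fixed 2x2 filter to apply to every channel
fil = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])

# Smaller dummy input than in my real case, just for demonstration
X = torch.rand(4, 8, 16, 16)

# Add out_channels and in_channels/groups dims, then expand dim 0
# to the number of input channels
fil_tensor = fil[None, None, :, :].expand(X.size(1), -1, -1, -1)

# Depthwise convolution: one group per channel, stride 2
res = F.conv2d(X, fil_tensor, stride=2, groups=X.size(1))

print(res.shape)  # torch.Size([4, 8, 8, 8]) — spatial dims halved
```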
but I’m worried about the step where I expand my filter, which seems to create 2048 redundant copies of the same 2×2 kernel. Is there a better way to do this?
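To make the memory concern concrete, this is the diagnostic I used to check whether the expanded tensor actually occupies extra storage (a minimal sketch; the 2048 here just mirrors my channel count above):

```python
import torch

fil = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
fil_tensor = fil[None, None, :, :].expand(2048, -1, -1, -1)

# expand() returns a view: the expanded dimension gets stride 0,
# and the underlying storage is shared with the original filter
print(fil_tensor.stride())                        # first entry is 0
print(fil_tensor.data_ptr() == fil.data_ptr())    # True — same storage
```

So it looks like no physical copies are made here, but I'm not sure whether conv2d materializes the weight internally, hence the question.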