Convolution with several kernels on the same input image

Hi all,

I’m performing cross-correlations between images and kernels, and it’s common that I need to cross-correlate a single image with many different kernels.

All kernels have the same shape, so a simple solution is to batch them and duplicate the image along the batch dimension. Doing so, I end up with the following tensors:

images batch of shape [n_kernels, n_channels, img_h, img_w]
kernels batch of shape [n_kernels, n_channels, krn_h, krn_w]

The cross-correlation operation works as expected, but I quickly run out of memory: if my images are 200x200px with 128 channels and I have 50 kernels, I need to store 200x200x128x50 floats in memory… which is huge.
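For reference, a quick back-of-the-envelope of that footprint, assuming float32 (4 bytes per element):

	# memory needed just for the duplicated input batch, assuming float32
	n_floats = 200 * 200 * 128 * 50      # 256,000,000 elements
	print(n_floats * 4 / 1024**3)        # ~0.95 GiB for the input tensor alone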

Is there any way to perform my cross-correlation operations without duplicating my input image?

Thank you 🙂

I’m not sure I understand the use case correctly and don’t know why repeating the image in the batch dimension would be necessary. Could you post a code snippet showing the desired results, please?

Sure, here is the code snippet I use:

	import torch
	import torch.nn.functional as F

	batch = 3
	channels = 2

	# pattern is a small 3x3 pattern I'm looking for with my cross-correlation;
	# its content and position are arbitrary for the tests
	pattern = torch.randn(3, 3)

	# Define a sample image (the **same** image is batched 3 times)
	f_img = torch.zeros((batch, channels, 5, 10))
	f_img[:, :, 2:5, 6:9] = pattern
	f_img[:, :, 1:4, 1:4] = pattern * 2

	# Define 3 **different** kernels
	f_krn = torch.zeros((batch, channels, 3, 3))
	f_krn[0, :, :, :] = pattern
	f_krn[1, :, :, :] = pattern * 2
	f_krn[2, :, :, :] = pattern * 3

	print(f_img.shape, f_krn.shape)
	# Image shape: [3, 2, 5, 10] / Kernel shape: [3, 2, 3, 3]

	# Perform the cross-correlation as a grouped convolution:
	# fold the batch into the channel dimension and use one group per kernel
	f_img = f_img.view(1, batch * channels, f_img.shape[2], f_img.shape[3])
	f_krn = f_krn.view(batch, channels, 3, 3)  # weight shape: [out_ch, in_ch/groups, kH, kW]

	result = F.conv2d(f_img, f_krn, groups=batch)

	print(result)  # shape: [1, 3, 3, 8]

Thanks

Thanks for the code snippet.
Your approach looks valid assuming you are using different images in f_img.
However, based on your previous description, it seems that f_img contains only a single unique image and you are repeating it in the batch dimension to use the view operation with the grouped conv approach.
Your grouped conv would create 3 groups (basically splitting the channels into the “images”) and would then apply each corresponding conv kernel to the input.
If so, you should be able to use a plain convolution to get the same result, if I’m not mistaken.
This workflow would use a single image and each filter would still be applied to it.
To do so, I’ve changed your code a bit to really repeat the input image in the batch dimension and initialized the filter kernels randomly.
Could you check, if I understand your use case correctly?

import torch
import torch.nn.functional as F

batch = 3
channels = 2

# Define a sample image (the **same** image is batched 3 times)
f_img = torch.randn((1, channels, 5, 10)).repeat(batch, 1, 1, 1)

pattern = 1.
# Define 3 **different** kernels
f_krn = torch.randn((batch, channels, 3, 3))
f_krn[0, :, :, :] = pattern
f_krn[1, :, :, :] = pattern * 2
f_krn[2, :, :, :] = pattern * 3

print(f_img.shape, f_krn.shape)
# Image shape: [3, 2, 5, 10] / Kernel shape: [3, 2, 3, 3]

# Perform cross-correlation via the grouped conv approach
f_img_ = f_img.view(1, batch * channels, f_img.shape[2], f_img.shape[3])
f_krn = f_krn.view(batch, channels, 3, 3)

result = F.conv2d(f_img_, f_krn, groups=batch)

# plain convolution on the single image: all 3 kernels applied at once
out = F.conv2d(f_img[0:1], f_krn)
print((out == result).all())
> tensor(True)

You understood the question perfectly!

In fact, I didn’t know a plain convolution could be used when the image and the kernel tensors have different batch sizes. I was pretty sure that, since my kernel tensor has a batch size of 3, my image needed to have a batch size of 3 too.

(Just in case, your last line should be print((abs(out - result) < 1e-6).all()), presumably because of floating-point precision.)

Thanks for your help, your answers are always useful, whatever the subject!

Ah OK, yeah, your explanation is right. The filter shape is defined only by the number of kernels, the number of input channels, and the spatial size of the kernels, and is thus independent of the batch size. This allows you to create models accepting different batch sizes (e.g. during training you could use a larger batch size while the deployed model could accept a single image).
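A minimal sketch of this, reusing the shapes from your first post (the tensor names here are just for illustration):

import torch
import torch.nn.functional as F

# 50 kernels, 128 input channels, 3x3 spatial size
weight = torch.randn(50, 128, 3, 3)

single = torch.randn(1, 128, 200, 200)    # one image
batched = torch.randn(16, 128, 200, 200)  # a training batch

# the same weight tensor works for any batch size
print(F.conv2d(single, weight).shape)   # torch.Size([1, 50, 198, 198])
print(F.conv2d(batched, weight).shape)  # torch.Size([16, 50, 198, 198])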

Also yes, the better approach would be to use a small eps value or torch.allclose.
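For example, on the two outputs from the snippet above:

print(torch.allclose(out, result))             # default rtol/atol
print(torch.allclose(out, result, atol=1e-6))  # or with an explicit tolerance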