Convolution for a batch of images

I was wondering how the convolution operation is implemented for a batch of N images, i.e. for a tensor of shape (N, C, H, W). Is it fully vectorized or is there a for-loop over N images? Also, if we have more filters than one, does it use one additional for-loop over the filters?


There are a variety of convolution algorithms.
Some apply an (implicit) im2col and use a matrix multiplication afterwards; others might use an FFT- or Winograd-based approach, etc.
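As a rough illustration of the FFT idea (a minimal sketch with assumed shapes, not how cuDNN actually implements it): pointwise multiplication in the frequency domain corresponds to circular convolution, and the non-wrapped region of the result matches the direct convolution.

```python
import torch
import torch.nn.functional as F

# Sketch for a single image and a single filter, in double precision
H = W = 8
x = torch.randn(H, W, dtype=torch.double)
k = torch.randn(3, 3, dtype=torch.double)

# Flip the kernel (true convolution vs. PyTorch's cross-correlation)
# and zero-pad it to the image size.
k_pad = F.pad(torch.flip(k, dims=(0, 1)), (0, W - 3, 0, H - 3))

# Pointwise product in the frequency domain = circular convolution
out = torch.fft.ifft2(torch.fft.fft2(x) * torch.fft.fft2(k_pad)).real

# The non-wrapped region matches F.conv2d up to floating point error
ref = F.conv2d(x[None, None], k[None, None])[0, 0]
print(torch.allclose(out[2:, 2:], ref, atol=1e-10))
```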

Also, if you are using torch.backends.cudnn.benchmark = True, the first iteration for a new input shape will run some benchmarking and select the fastest kernel.
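For reference, enabling the autotuner is a single flag:

```python
import torch

# Enable cuDNN autotuning: the first iteration for each new input shape
# benchmarks the available algorithms and caches the fastest one.
torch.backends.cudnn.benchmark = True
```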

Thanks ptrblck! In the case of im2col (I couldn’t find the source code, but even if I found it, I’m afraid it is implemented in C++ and I wouldn’t be able to understand it): if we pass a batch of N images, is it run N times, once per image, in a for-loop? And if we have more than one filter, does that require another for-loop? Or is im2col cleverly vectorized so that it can handle tensors of shape (N, C, H, W) without for-loops (with just a single matrix multiplication)?

im2col can be applied via unfold. There should be no loops in the matrix multiplication, and in the internal implementation the unfolding should be implicit for performance reasons.
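To make the loop-free claim concrete, here is a minimal sketch (with assumed shapes and no padding/stride) showing that one unfold plus one batched matrix multiplication covers the whole batch N and all K filters at once:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 4, 3, 10, 10   # batch, channels, height, width
K, kh, kw = 6, 3, 3         # number of filters, kernel size

x = torch.randn(N, C, H, W)
weight = torch.randn(K, C, kh, kw)

# im2col: gather all sliding windows -> (N, C*kh*kw, L)
cols = F.unfold(x, kernel_size=(kh, kw))

# One matmul for all images and all filters: (K, C*kh*kw) @ (N, C*kh*kw, L)
# broadcasts over the batch dimension -> (N, K, L)
out = weight.view(K, -1) @ cols
out = out.view(N, K, H - kh + 1, W - kw + 1)

# Matches the builtin convolution up to floating point error
ref = F.conv2d(x, weight)
print(torch.allclose(out, ref, atol=1e-4))
```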

Using loops would slow down the execution in many use cases, but there are cases where it can make sense to use a loop instead of a dense operation.
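For comparison (an illustrative sketch with assumed shapes), a Python loop over the batch produces the same result as the single batched call, it is just typically much slower, especially on the GPU:

```python
import torch
import torch.nn.functional as F

N, C, H, W, K = 4, 3, 16, 16, 8
x = torch.randn(N, C, H, W)
w = torch.randn(K, C, 3, 3)

# One vectorized call over the whole batch
batched = F.conv2d(x, w, padding=1)

# Equivalent explicit loop over the N images
looped = torch.cat([F.conv2d(img.unsqueeze(0), w, padding=1) for img in x])

print(torch.allclose(batched, looped, atol=1e-5))
```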