Can conv2d work with more than 4-dimensional input?


As of now conv2d expects input of shape [batch x channel x height x width], but how would I process a list of images (say of size M)? Right now my input tensor has shape [M x batch x channel x height x width], and pushing it through a conv2d layer causes errors.

I was able to work around it by un-stacking it into a Python list of size M, calling conv2d on each element in a for-loop, and re-stacking the results. Let me know if there’s a more “torch-esque” way of doing it
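One common alternative to the for-loop is to fold the extra M dimension into the batch dimension, run conv2d once, and reshape back. A minimal sketch (the sizes `M, B, C, H, W` and the conv hyperparameters here are hypothetical, just for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
M, B, C, H, W = 3, 2, 3, 8, 8
conv = nn.Conv2d(in_channels=C, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(M, B, C, H, W)

# Fold M into the batch dimension, apply conv2d in one call,
# then split the leading dimension back out.
y = conv(x.view(M * B, C, H, W))   # [M*B, 16, H, W]
y = y.view(M, B, *y.shape[1:])     # [M, B, 16, H, W]
```

Since conv2d treats batch elements independently, this is numerically equivalent to the for-loop but runs as a single batched kernel call.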

much thanks ~


Do you want to model dependencies/correlations between these M images?

You can do it using conv3d; your case is like video processing. If you still want to use conv2d, you could convert it to [batch x channel x height x (M x width)] using the view function
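A sketch of the conv3d route, assuming you treat M as the depth dimension (all sizes here are hypothetical). A kernel depth of 1 processes each image independently; a depth > 1 would mix information across the M images:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
M, B, C, H, W = 3, 2, 3, 8, 8
x = torch.randn(M, B, C, H, W)

# conv3d expects [batch, channel, depth, height, width],
# so move M into the depth slot.
x3d = x.permute(1, 2, 0, 3, 4)  # [B, C, M, H, W]

# Kernel depth 1 keeps the M images independent; use e.g. 3 to
# model correlations across neighboring images.
conv3 = nn.Conv3d(C, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1))
y = conv3(x3d)                  # [B, 16, M, H, W]
```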

I want to compute attention over these M images.

essentially I’d like to convert [img1, img2, …, imgM] into latent representations [lat1, lat2, …, latM]

I originally thought I could just create a tensor [M x batch_image], which is essentially [M x batch x chan x W x H] once you expand it out,

and pass it through conv2d, which would result in [M x batch x latent]

no luck though xD

You could try it with 3d convolutions, maybe something like this.

Hi @evanthebouncy, I have a similar use case where I want to feed input of shape B x N x C x H x W, where B is batch size, N is the number of images, and the rest is the same. I am also trying to implement an attention mechanism. I was wondering, were you able to find a way to pass more-than-4d input to conv2d? Or some other solution maybe?
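For the B x N x C x H x W case, the same fold-into-batch trick applies; `flatten`/`unflatten` make it a bit more readable than raw `view` calls. A sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
B, N, C, H, W = 2, 4, 3, 8, 8
conv = nn.Conv2d(in_channels=C, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(B, N, C, H, W)

# Merge B and N into one batch axis, convolve, then split them back.
y = conv(x.flatten(0, 1))   # [B*N, 16, H, W]
y = y.unflatten(0, (B, N))  # [B, N, 16, H, W]
```

The resulting [B, N, 16, H, W] tensor can then be flattened per image into per-image latents before applying attention across the N dimension.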