As of now, conv2d expects input of shape [batch x channel x width x height], but how would I process a list of images (say of size M)? My input tensor has shape [M x batch x channel x width x height], and pushing it through a conv2d layer causes errors.
I was able to work around it by un-stacking it into a python list of size M, calling conv2d on each element in a for-loop, and re-stacking the results. Let me know if there’s a more “torch-esque” way of doing this
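For reference, a minimal sketch of that workaround (the layer sizes and tensor dimensions here are hypothetical, just for illustration):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

# hypothetical sizes: M images, batch B, channels C, height H, width W
M, B, C, H, W = 5, 4, 3, 32, 32
x = torch.randn(M, B, C, H, W)

# un-stack along M, apply conv2d to each [B, C, H, W] slice, re-stack
outs = [conv(img) for img in x.unbind(0)]
y = torch.stack(outs, dim=0)  # [M, B, 16, H, W]
```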
much thanks ~
Do you want to model dependencies/correlations between these M images?
You can do it using conv3d; your case is like video processing. If you still want to use conv2d, you can convert the input to [batch x channel x height x (M x width)].
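One way to do that reshape, as a sketch (the sizes below are made up; `permute` moves M next to the width axis so `reshape` can merge them):

```python
import torch

# hypothetical sizes: M images, batch B, channels C, height H, width W
M, B, C, H, W = 5, 4, 3, 32, 32
x = torch.randn(M, B, C, H, W)

# [M, B, C, H, W] -> [B, C, H, M, W] -> [B, C, H, M*W]
x_wide = x.permute(1, 2, 3, 0, 4).reshape(B, C, H, M * W)
```

After this, `x_wide` is a valid 4d input for conv2d, with the M images laid side by side along the width axis.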
I want to compute attention over these M images.
essentially I’d like to convert [img1, img2, …, imgM] into latent representations [lat1, lat2, …, latM]
I originally thought I could just create a tensor [M x batch_image], which is essentially [M x batch x chan x W x H] once you expand it out,
and pass it through conv2d, which would result in [M x batch x latent]
no luck though xD
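A common vectorized version of this is to fold M into the batch dimension, run conv2d once, and split the result back (sizes below are hypothetical; a pooling/flatten step at the end would get you to [M x batch x latent]):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

# hypothetical sizes: M images, batch B, channels C, height H, width W
M, B, C, H, W = 5, 4, 3, 32, 32
x = torch.randn(M, B, C, H, W)

# merge M into the batch dimension, run conv2d once, then split back
y = conv(x.view(M * B, C, H, W))   # [M*B, 16, H, W]
y = y.view(M, B, *y.shape[1:])     # [M, B, 16, H, W]
```

Since conv2d treats every batch element independently, this is equivalent to the for-loop version but runs in a single kernel launch.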
You could try it with 3d convolutions, maybe something like this.
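A sketch of the conv3d route (sizes hypothetical): Conv3d expects [batch, channel, depth, H, W], so M can be treated as the depth axis.

```python
import torch
import torch.nn as nn

# hypothetical sizes: M images, batch B, channels C, height H, width W
M, B, C, H, W = 5, 4, 3, 32, 32
x = torch.randn(M, B, C, H, W)

# Conv3d wants [B, C, depth, H, W]; use M as the depth dimension
x3d = x.permute(1, 2, 0, 3, 4)                          # [B, C, M, H, W]
conv3 = nn.Conv3d(C, 16, kernel_size=(3, 3, 3), padding=1)
y = conv3(x3d)                                          # [B, 16, M, H, W]
```

Unlike the conv2d-per-image approach, the 3x3x3 kernel mixes information across neighboring images along M, which matters if you want the convolution itself to model dependencies between them.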
Hi @evanthebouncy, I have a similar use case where I want to feed input of shape B x N x C x H x W, where B is the batch size, N is the number of images, and the rest are the same. I am also trying to implement an attention mechanism. Were you able to find a way to pass more-than-4d input to conv2d, or some other solution maybe?