As of now, conv2d expects input of dimension [batch x channel x height x width], but how would I process a list of images (say of size M)? Currently my input tensor has shape [M x batch x channel x height x width], and pushing it through a conv2d layer causes an error.
I was able to work around it by unstacking it into a Python list of size M, calling conv2d on each element in a for-loop, and re-stacking the results. Let me know if there's a more "torch-esque" way of doing this.
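For reference, a minimal sketch of the loop workaround described above (the layer sizes and shapes here are made-up examples, not from the original post):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# [M, batch, channel, height, width] with M = 5
x = torch.randn(5, 2, 3, 32, 32)

# Apply conv2d to each of the M slices, then re-stack along dim 0.
out = torch.stack([conv(x[i]) for i in range(x.size(0))], dim=0)
# out has shape [5, 2, 8, 32, 32]
```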
You can do it using conv3d; your case is like video processing. If you still want to use conv2d, convert the input to [batch x channel x height x (M x width)] using the view function.
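A sketch of the conv3d route, treating the M images as the depth dimension the way a video's frames would be (the kernel shape and channel counts are assumptions for illustration):

```python
import torch
import torch.nn as nn

# [M, batch, channel, height, width]
x = torch.randn(5, 2, 3, 32, 32)

# conv3d expects [batch, channel, depth, height, width],
# so move M into the depth position.
x3d = x.permute(1, 2, 0, 3, 4)  # [2, 3, 5, 32, 32]

# A (1, 3, 3) kernel convolves each image spatially without
# mixing across the M dimension.
conv3 = nn.Conv3d(3, 8, kernel_size=(1, 3, 3), padding=(0, 1, 1))
out = conv3(x3d)  # [2, 8, 5, 32, 32]
```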
Hi @evanthebouncy, I have a similar use case where I want to feed input of shape B x N x C x H x W, where B is the batch size, N is the number of images, and the rest are the same as above. I am also trying to implement an attention mechanism. Were you able to find a way to pass more-than-4d input to conv2d, or perhaps some other solution?
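One common trick for this (a suggestion, not something confirmed in this thread) is to fold N into the batch dimension with view, run conv2d on the flattened 4d tensor, and then unfold:

```python
import torch
import torch.nn as nn

B, N, C, H, W = 2, 5, 3, 32, 32
x = torch.randn(B, N, C, H, W)

conv = nn.Conv2d(C, 8, kernel_size=3, padding=1)

# Fold N into the batch dimension, convolve, then restore the shape.
# conv2d is applied independently to each of the B * N images.
out = conv(x.view(B * N, C, H, W)).view(B, N, 8, H, W)
# out has shape [2, 5, 8, 32, 32]
```

This avoids the Python-level for-loop, since the single conv2d call processes all B * N images at once.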