Load more than three channels

youssef_oumate · May 7, 2017, 10:13am

Can I load two images from my dataset and process them together through the network
at the same time? So instead of using conv3d with 3 input channels (R, G, B), could I use it with
9 input channels from the two images?
If yes, can you please give me some hints about how to do that?
Thank you

trypag · May 7, 2017, 12:58pm

If I understand your question correctly, you want to concatenate two 2D RGB images and process them at the same time.

You are talking about using conv3d, but unless you have a 3D volume as input, I don’t see how it’s going to work.
Merging 2 RGB images, you would end up with one tensor of 6 channels.
Doing this only makes sense if the combination can be interpreted in real life. For example, in medical imaging it is possible to use multiple images coming from ultrasound, MRI, CT, etc. if they come from the same patient and view; there is a real meaning in doing this (contrast, resolution, etc.).
To conclude, unless the two images represent a sequence (video, audio), I don’t think it’s interesting to do this. Think of how you are going to predict something during inference.
If you want to do this anyway, I would suggest using torch.cat to concatenate the tensors along a given dimension.

http://pytorch.org/docs/torch.html#torch.cat
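
As a minimal sketch of the torch.cat suggestion (the tensor names and the 64x64 image size are made up for illustration), two RGB images in channels-first layout can be joined along the channel dimension. The second part shows the alternative for a sequence: stacking the frames along a new depth axis, which matches the (channels, depth, height, width) per-sample layout that nn.Conv3d takes.

```python
import torch

# Two hypothetical RGB images in (channels, height, width) layout.
img_a = torch.rand(3, 64, 64)
img_b = torch.rand(3, 64, 64)

# Concatenate along dim 0 (channels): 3 + 3 = 6 channels.
merged = torch.cat([img_a, img_b], dim=0)
print(merged.size())  # torch.Size([6, 64, 64])

# For a sequence of frames, stack along a new depth axis instead,
# keeping 3 channels: (channels, depth, height, width).
clip = torch.stack([img_a, img_b], dim=1)
print(clip.size())  # torch.Size([3, 2, 64, 64])
```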

youssef_oumate · May 7, 2017, 1:45pm

Thank you for your prompt reply.
Actually I want to do this for a video sequence,
because when I pass one single frame through the network,
after the third 3D max-pooling layer, one of the dimensions becomes
zero, so I get this kind of error:
"output size is too small"
So I thought that if I add more input channels, the dimension will not reach 0.

trypag · May 7, 2017, 2:12pm

OK, if you are working with a video sequence, then it makes sense to work with 3D convolution. However, I have no experience with this kind of problem.
About your error: it is usually a good start to write down your network on paper. Start from the input volume and work out the output size of each operation, so you can better understand where you are going. This is usually how I do it.
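
The paper exercise can also be sketched in a few lines of plain Python. This is only an illustration: the layer stack below is hypothetical (not the network from the question), and the formula assumes dilation 1 and the default floor rounding. It traces the depth axis only.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length along one axis of a conv/pool layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_depth(depth, layers):
    """Return the depth after each (kernel, stride, padding) layer."""
    sizes = []
    for kernel, stride, padding in layers:
        depth = out_size(depth, kernel, stride, padding)
        sizes.append(depth)
    return sizes


# Hypothetical network: each stage is a conv with kernel 3 and padding 1,
# followed by a max pool with kernel 2 and stride 2.
conv = (3, 1, 1)
pool = (2, 2, 0)
layers = [conv, pool, conv, pool, conv, pool]

print(trace_depth(4, layers))   # [4, 2, 2, 1, 1, 0]
print(trace_depth(16, layers))  # [16, 8, 8, 4, 4, 2]
```

With 4 frames the depth reaches 0 at the third pooling layer in this made-up stack, while 16 frames survive. Note that the number of input channels does not appear in the formula, so adding channels would not change these sizes; only a longer clip or smaller pooling along the depth axis would.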

If you want to understand how an nn.Module transforms your data, wrap your input inside a Variable and forward it through the Module; you will then see the dimensions of the output with size().
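
A rough sketch of that check, assuming a recent PyTorch release (0.4 or later), where a plain tensor can be passed directly and the Variable wrapper is no longer needed. The two layers and the input size are placeholders, not the network from the question.

```python
import torch
import torch.nn as nn

# Hypothetical layers; substitute the ones from your own network.
layers = [
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.MaxPool3d(kernel_size=2),
]

# One made-up clip: (batch, channels, depth, height, width).
x = torch.rand(1, 3, 16, 32, 32)

shapes = []
for layer in layers:
    x = layer(x)  # forward the data through one module
    shapes.append(tuple(x.size()))
    print(layer.__class__.__name__, x.size())
```

Running the same loop over every layer of the real network shows which operation shrinks a dimension to zero.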