[HELP] Could I process sequence of 2D images with nn.Conv3d?

MariosOreo · September 6, 2019, 8:58am

Hello,

I have sequences of 2d images, each image is a single frame from a video.

With the length of each sequence given, the shape of the input is [batch_size, seq_length, c, h, w], so I want to process them with nn.Conv3d(). But I met some unexpected problem that I had to transpose the input data into the shape of [batch_size, c, seq_length, h, w], and finally the performance is poor (i.e. segmentation task of sequences of medical images).

So, Could I consider the sequences of images as 3D task and process them with conv3d?

Thanks in advance!

JuanFMontesinos · September 6, 2019, 9:47am

You are doing it right. Proper order is BxCxSXHxW.
3Dconv are really data greedy and they tend to overfit. You should have a really huge dataset to obtain good results. You can have a look at this paper.

They create networks modeling motion with 3dconvs at the beginning and 2d conv in the end.

BTW image2image translation is performed with encoder-decoder architectures, what are you exactly using?