[HELP] Could I process sequence of 2D images with nn.Conv3d?

Hello,

I have sequences of 2d images, each image is a single frame from a video.

With the length of each sequence given, the shape of the input is [batch_size, seq_length, c, h, w], so I want to process them with nn.Conv3d(). But I met some unexpected problem that I had to transpose the input data into the shape of [batch_size, c, seq_length, h, w], and finally the performance is poor (i.e. segmentation task of sequences of medical images).

So, Could I consider the sequences of images as 3D task and process them with conv3d?

Thanks in advance!

You are doing it right. Proper order is BxCxSXHxW.
3Dconv are really data greedy and they tend to overfit. You should have a really huge dataset to obtain good results. You can have a look at this paper.


They create networks modeling motion with 3dconvs at the beginning and 2d conv in the end.

BTW image2image translation is performed with encoder-decoder architectures, what are you exactly using?