The Conv3d — PyTorch 1.7.1 documentation states that the input to a 3D convolution has shape (N, Cin, D, H, W). Imagine I have a sequence of images that I want to pass to a 3D CNN. Am I right that:
- N → number of sequences (mini batch)
- Cin → number of channels (3 for RGB)
- D → Number of images in a sequence
- H → Height of one image in the sequence
- W → Width of one image in the sequence
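To check the interpretation above, here is a minimal sketch (with made-up batch size, channel count, and kernel size) that passes a tensor of shape (N, Cin, D, H, W) through nn.Conv3d:

```python
import torch
import torch.nn as nn

# Hypothetical input: 2 sequences, 3 RGB channels, 5 frames, 396x247 images
x = torch.randn(2, 3, 5, 396, 247)  # (N, Cin, D, H, W)

# in_channels must match Cin; out_channels and kernel_size are arbitrary here
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

out = conv(x)
print(out.shape)  # with padding=1, D/H/W are preserved: (2, 8, 5, 396, 247)
```

The layer only complains if the channel dimension (dim 1) does not equal in_channels, which is why the dimension order matters.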
The reason I am asking is that when I stack image tensors with a = torch.stack([img1, img2, img3, img4, img5]), I get a tensor of shape torch.Size([5, 3, 396, 247]). Is it necessary to reshape it to torch.Size([3, 5, 396, 247]) so that the channel dimension comes first, or does the order not matter inside the DataLoader?
Note that the DataLoader adds one more dimension automatically, which corresponds to N.
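A small sketch of the reshaping in question, assuming five random RGB frames of the stated size stand in for img1…img5:

```python
import torch

# Five hypothetical RGB frames, each (C, H, W) = (3, 396, 247)
imgs = [torch.randn(3, 396, 247) for _ in range(5)]

# Stacking along a new dim 0 gives (D, C, H, W)
a = torch.stack(imgs)
print(a.shape)  # torch.Size([5, 3, 396, 247])

# Conv3d expects (Cin, D, H, W) per sample, so swap the first two dims
a = a.permute(1, 0, 2, 3)
print(a.shape)  # torch.Size([3, 5, 396, 247])

# Equivalently, stack along dim=1 in the first place:
b = torch.stack(imgs, dim=1)
print(b.shape)  # torch.Size([3, 5, 396, 247])

# The DataLoader then prepends the batch dimension N → (N, Cin, D, H, W)
```

Note that permute returns a non-contiguous view; calling .contiguous() afterwards is sometimes needed for ops that require contiguous memory, though Conv3d handles it fine.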