Surprisingly, I have not found an answer to this question after looking around the internet. I am specifically interested in a 3d tensor. From my own experiments, I have found that when I create a tensor:
And then put a convolutional layer on it defined as follows:
The output is a (5,48,5) tensor. So, am I correct in assuming that for a 3d tensor in PyTorch the middle number represents the number of channels?
Edit: It seems that when running a conv2d, the input dimension is the first entry in the tensor, and I need to make it a 4d tensor (1,48,5,5) for example. Now I am very confused…
Any help is much appreciated! (I also posted this question on SO, but no one has answered and it's sort of urgent.)
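The first number represents the batch size (N). For tensors holding data with one or more spatial dimensions, the next dimension is usually referred to as the channel dimension (C). The remaining dimensions are the spatial ones: length (L) for 1d data, height and width (H, W) for 2d data, and depth, height and width (D, H, W) for 3d data. So yes, for a 3d N x C x L tensor fed into a 1d conv layer, the middle number is the number of channels.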
So for 2d data (images) you have a 4d tensor of NxCxHxW which you feed into a 2d conv layer.
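A minimal sketch of these shape conventions. The original snippet isn't shown, so the input below is hypothetical: 5 samples with 3 channels and length 5 passed through an `nn.Conv1d` with 48 output channels reproduces the (5, 48, 5) result from the question. The channel count 3, kernel size 3 and padding 1 are assumptions chosen to keep the length at 5.

```python
import torch
import torch.nn as nn

# Hypothetical 1d data: N=5 samples, C=3 channels, L=5 positions.
x1d = torch.randn(5, 3, 5)
conv1d = nn.Conv1d(in_channels=3, out_channels=48, kernel_size=3, padding=1)
out1d = conv1d(x1d)
print(out1d.shape)  # torch.Size([5, 48, 5]): N, C_out, L

# 2d data (images) needs a 4d N x C x H x W tensor for nn.Conv2d.
x2d = torch.randn(1, 48, 5, 5)  # N=1, C=48, H=5, W=5
conv2d = nn.Conv2d(in_channels=48, out_channels=16, kernel_size=3, padding=1)
out2d = conv2d(x2d)
print(out2d.shape)  # torch.Size([1, 16, 5, 5]): N, C_out, H, W
```

In both cases `in_channels` has to match dimension 1 of the batched input. Newer PyTorch releases (1.11 onward, as far as I know) also accept unbatched input (C x L or C x H x W), in which case the channel dimension is the first entry, which may explain the observation in the edit above.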
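Note that channels are specific to convolutional layers (and layers built to work alongside them, such as pooling and batch norm). Linear layers, for example, expect an input of shape N x num_features (more precisely, any number of leading dimensions followed by num_features).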
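A small sketch of how `nn.Linear` treats its input, using a hypothetical conv output of shape 5 x 48 x 5 (the sizes and `out_features` values are arbitrary). `nn.Linear` only acts on the last dimension, so to classify each sample you would typically flatten everything except the batch dimension first.

```python
import torch
import torch.nn as nn

# Hypothetical conv output: N=5, C=48, L=5.
features = torch.randn(5, 48, 5)

# Flatten all non-batch dimensions into a single feature axis: N x (C*L).
flat = torch.flatten(features, start_dim=1)  # shape (5, 240)
fc = nn.Linear(in_features=48 * 5, out_features=10)
logits = fc(flat)
print(logits.shape)  # torch.Size([5, 10])

# Without flattening, nn.Linear is applied to the last dimension only,
# and all leading dimensions are kept as-is.
fc_last = nn.Linear(in_features=5, out_features=7)
print(fc_last(features).shape)  # torch.Size([5, 48, 7])
```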
Thank you for your reply. What if I want to apply a one-dimensional convolution on a 2d image? Should I then, instead of a 1d kernel of size 3, define a 2d kernel of size (3,1)?
You can either reshape your tensor to N x C x (H*W), apply a 1d convolution and reshape the result back, or use a (1,3) kernel in a 2d convolution (a (1,3) kernel slides along the width; a (3,1) kernel would slide along the height instead). Mathematically these approaches should be equivalent, but I would recommend the 2d conv approach since you don't have to pay attention to the memory order while reshaping.
Edit: I just noticed they are not completely equal. If you reshape the image, the outputs at the border of each row pick up values from the end of the previous row or the start of the next row.
This is not the case for the 2d convolutional approach, where those positions see zero padding instead.
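A sketch comparing the two approaches, assuming a (1,3) kernel along the width and arbitrary sizes (N=2, C=3, H=4, W=6, 8 output channels). The 1d conv is given the same weights as the 2d conv, so any difference comes only from the reshape: interior columns match, while row borders differ because the flattened sequence runs straight from one row into the next.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, H, W = 2, 3, 4, 6
x = torch.randn(N, C, H, W)

# Approach 1: 2d conv with a (1, 3) kernel, i.e. a 1d kernel sliding along the width.
conv2d = nn.Conv2d(C, 8, kernel_size=(1, 3), padding=(0, 1))

# Approach 2: flatten H and W into one axis and use a 1d conv with the same weights.
conv1d = nn.Conv1d(C, 8, kernel_size=3, padding=1)
with torch.no_grad():
    conv1d.weight.copy_(conv2d.weight.squeeze(2))  # (8, C, 1, 3) -> (8, C, 3)
    conv1d.bias.copy_(conv2d.bias)

out2d = conv2d(x)                                           # (N, 8, H, W)
out1d = conv1d(x.reshape(N, C, H * W)).reshape(N, 8, H, W)  # (N, 8, H, W)

# Interior columns agree ...
print(torch.allclose(out2d[..., 1:-1], out1d[..., 1:-1], atol=1e-5))  # True
# ... but the first column of rows 1..H-1 differs: the flattened 1d conv
# sees the last value of the previous row there instead of zero padding.
print(torch.allclose(out2d[..., 0], out1d[..., 0], atol=1e-5))  # False
```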
To keep the desired output dimensions, I need to apply padding only to the left and right sides of the image (not the top and bottom). Is there an easy way to do this in PyTorch?
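Sure, PyTorch's conv layers have a padding argument. If you pass an integer, that padding is applied to every side, but you can also pass a tuple with one value per spatial dimension, e.g. (padH, padW) for nn.Conv2d, where each value is applied to both sides of that dimension. For truly asymmetric padding (e.g. only on the left), pad the input with torch.nn.functional.pad or nn.ZeroPad2d before the convolution.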
Setting padding=(0,1) worked! Thank you so much for your help.
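A short sketch of both padding options, with a hypothetical 1 x 3 x 5 x 5 input and arbitrary channel counts. The first part is the `padding=(0,1)` setting that worked above; the second uses `torch.nn.functional.pad` for cases where the left and right amounts should differ (the 2/0 split is only an illustration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 5, 5)  # hypothetical N x C x H x W input

# padding=(padH, padW): each value is applied to BOTH sides of that dimension.
# (0, 1) adds one zero column on the left and one on the right, and no rows.
conv = nn.Conv2d(3, 16, kernel_size=(1, 3), padding=(0, 1))
out = conv(x)
print(out.shape)  # torch.Size([1, 16, 5, 5])

# Per-side (asymmetric) padding: pad the input yourself.
# F.pad takes (left, right, top, bottom) for the last two dimensions.
x_padded = F.pad(x, (2, 0, 0, 0))  # two zero columns on the left only
print(x_padded.shape)  # torch.Size([1, 3, 5, 7])
conv_nopad = nn.Conv2d(3, 16, kernel_size=(1, 3))
out_asym = conv_nopad(x_padded)
print(out_asym.shape)  # torch.Size([1, 16, 5, 5])
```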
For vision models, prefer a channels-last memory format to get the most out of your PyTorch models.
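To tie this back to the memory-order discussion: channels-last changes only how an N x C x H x W tensor is laid out in memory, not its shape or the meaning of its dimensions. A small sketch with arbitrary sizes; whether it actually speeds things up depends on the model, hardware and backend.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 4, 5)  # N x C x H x W
x_cl = x.to(memory_format=torch.channels_last)

# The logical shape (and therefore the meaning of each dimension) is unchanged ...
print(x_cl.shape)  # torch.Size([2, 3, 4, 5])
# ... only the strides change: channels become the fastest-moving dimension in memory.
print(x.stride())     # (60, 20, 5, 1)
print(x_cl.stride())  # (60, 1, 15, 3)

# Modules can be converted as well, so weights use the same layout.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(memory_format=torch.channels_last)
out = model(x_cl)
print(out.shape)  # torch.Size([2, 8, 4, 5])
```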